Node Scheduling
Taint
A taint is nothing more than a property we add to a cluster node to keep pods that should not run there away from it. For example, every master node in the cluster is marked so that it does not receive pods outside of cluster management: the master carries the NoSchedule taint, so the Kubernetes scheduler will not place pods on it and will instead look for other nodes in the cluster that do not carry that mark.
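For reference, the general form of the command for adding a taint is shown below (a sketch; the angle-bracket placeholders are ours, and the effect is one of NoSchedule, PreferNoSchedule or NoExecute):
$ kubectl taint nodes <node-name> <key>=<value>:<effect>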
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
elliot-01 Ready master 7d14h v1.18.2
elliot-02 Ready <none> 7d14h v1.18.2
elliot-03 Ready <none> 7d14h v1.18.2
$ kubectl describe node elliot-01 | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule
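If a pod really does need to land on a tainted node, the taint does not have to be removed: the pod can declare a matching toleration instead. A minimal sketch of the fragment that would go under the pod's spec to tolerate the default master taint shown above (the rest of the manifest is omitted):
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule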
We are going to test a few things and eventually allow the master node to run other pods as well. First, let's run 3 replicas of nginx.
$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 1/1 1 1 5s
$ kubectl scale deployment nginx --replicas=3
deployment.apps/nginx scaled
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 3/3 3 3 1m5s
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
limit-pod 1/1 Running 0 3m44s 10.32.0.4 elliot-02 <none> <none>
nginx 1/1 Running 0 25m 10.46.0.1 elliot-03 <none> <none>
nginx-85f7fb6b45-9bzwc 1/1 Running 0 6m7s 10.32.0.3 elliot-02 <none> <none>
nginx-85f7fb6b45-cbmtr 1/1 Running 0 6m7s 10.46.0.2 elliot-03 <none> <none>
nginx-85f7fb6b45-rprz5 1/1 Running 0 6m7s 10.32.0.2 elliot-02 <none> <none>
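If you only care about which node each pod landed on, a custom-columns query gives a more compact view (a sketch):
$ kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName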
Let's add a NoSchedule taint to the worker nodes as well and see how they behave.
$ kubectl taint node elliot-02 key1=value1:NoSchedule
node/elliot-02 tainted
$ kubectl describe node elliot-02 | grep -i taint
Taints: key1=value1:NoSchedule
$ kubectl taint node elliot-03 key1=value1:NoSchedule
node/elliot-03 tainted
$ kubectl describe node elliot-03 | grep -i taint
Taints: key1=value1:NoSchedule
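Instead of describing each node separately, the taints on every node can be listed in one go (a sketch using kubectl's jsonpath output):
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'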
Now let's increase the number of replicas:
$ kubectl scale deployment nginx --replicas=5
deployment.apps/nginx scaled
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
limit-pod 1/1 Running 0 5m23s 10.32.0.4 elliot-02 <none> <none>
nginx 1/1 Running 0 27m 10.46.0.1 elliot-03 <none> <none>
nginx-85f7fb6b45-9bzwc 1/1 Running 0 7m46s 10.32.0.3 elliot-02 <none> <none>
nginx-85f7fb6b45-cbmtr 1/1 Running 0 7m46s 10.46.0.2 elliot-03 <none> <none>
nginx-85f7fb6b45-qnhtl 0/1 Pending 0 18s <none> <none> <none> <none>
nginx-85f7fb6b45-qsvpp 0/1 Pending 0 18s <none> <none> <none> <none>
nginx-85f7fb6b45-rprz5 1/1 Running 0 7m46s 10.32.0.2 elliot-02 <none> <none>
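To confirm why the two new replicas cannot be scheduled, we can describe one of them; the Events section should report that the available nodes carry a taint the pod does not tolerate (output omitted here):
$ kubectl describe pod nginx-85f7fb6b45-qnhtl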
As we can see, the new replicas are stuck in Pending, waiting for a node the scheduler is allowed to place them on. Let's remove the taint from our worker nodes.
$ kubectl taint node elliot-02 key1:NoSchedule-
node/elliot-02 untainted
$ kubectl taint node elliot-03 key1:NoSchedule-
node/elliot-03 untainted
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
limit-pod 1/1 Running 0 6m17s 10.32.0.4 elliot-02 <none> <none>
nginx 1/1 Running 0 27m 10.46.0.1 elliot-03 <none> <none>
nginx-85f7fb6b45-9bzwc 1/1 Running 0 8m40s 10.32.0.3 elliot-02 <none> <none>
nginx-85f7fb6b45-cbmtr 1/1 Running 0 8m40s 10.46.0.2 elliot-03 <none> <none>
nginx-85f7fb6b45-qnhtl 1/1 Running 0 72s 10.46.0.5 elliot-03 <none> <none>
nginx-85f7fb6b45-qsvpp 1/1 Running 0 72s 10.46.0.4 elliot-03 <none> <none>
nginx-85f7fb6b45-rprz5 1/1 Running 0 8m40s 10.32.0.2 elliot-02 <none> <none>
There are several taint effects we can use to classify our nodes. Let's test another one, NoExecute, which not only prevents the scheduler from placing new pods on these nodes but also evicts the pods that are already running on them.
$ kubectl taint node elliot-02 key1=value1:NoExecute
node/elliot-02 tainted
$ kubectl taint node elliot-03 key1=value1:NoExecute
node/elliot-03 tainted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85f7fb6b45-87sq5 0/1 Pending 0 20s
nginx-85f7fb6b45-8q99g 0/1 Pending 0 20s
nginx-85f7fb6b45-drmzz 0/1 Pending 0 20s
nginx-85f7fb6b45-hb4dp 0/1 Pending 0 20s
nginx-85f7fb6b45-l6zln 0/1 Pending 0 20s
As we can see, all of the pods are now Pending: the master node still carries Kubernetes' default NoSchedule taint, and the worker nodes now carry the NoExecute taint. Let's reduce the number of replicas and see what happens.
$ kubectl scale deployment nginx --replicas=1
deployment.apps/nginx scaled
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85f7fb6b45-drmzz 0/1 Pending 0 43s
$ kubectl taint node elliot-02 key1:NoExecute-
node/elliot-02 untainted
$ kubectl taint node elliot-03 key1:NoExecute-
node/elliot-03 untainted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85f7fb6b45-drmzz 1/1 Running 0 76s
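As an aside, instead of removing the NoExecute taint we could have taught the Deployment to tolerate it; with tolerationSeconds the pods are only evicted after a grace period once the taint appears. A minimal sketch of the fragment that would go into the pod template (the values here are illustrative):
      tolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoExecute
        tolerationSeconds: 60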
We are back to a single pod running normally. But what if our worker nodes become unavailable, can we run pods on the master node? Of course we can. Let's configure our master node so the scheduler is able to place pods on it.
$ kubectl taint nodes --all node-role.kubernetes.io/master-
node/elliot-01 untainted
$ kubectl describe node elliot-01 | grep -i taint
Taints: <none>
$ kubectl scale deployment nginx --replicas=4
deployment.apps/nginx scaled
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-85f7fb6b45-2c6dm 1/1 Running 0 9s 10.32.0.2 elliot-02 <none> <none>
nginx-85f7fb6b45-4jzcn 1/1 Running 0 9s 10.32.0.3 elliot-02 <none> <none>
nginx-85f7fb6b45-drmzz 1/1 Running 0 114s 10.46.0.1 elliot-03 <none> <none>
nginx-85f7fb6b45-rstvq 1/1 Running 0 9s 10.46.0.2 elliot-03 <none> <none>
Now let's add the NoExecute taint to the worker nodes and see what happens.
$ kubectl taint node elliot-02 key1=value1:NoExecute
node/elliot-02 tainted
$ kubectl taint node elliot-03 key1=value1:NoExecute
node/elliot-03 tainted
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-85f7fb6b45-49knz 1/1 Running 0 14s 10.40.0.5 elliot-01 <none> <none>
nginx-85f7fb6b45-4cm9x 1/1 Running 0 14s 10.40.0.4 elliot-01 <none> <none>
nginx-85f7fb6b45-kppnd 1/1 Running 0 14s 10.40.0.6 elliot-01 <none> <none>
nginx-85f7fb6b45-rjlmj 1/1 Running 0 14s 10.40.0.3 elliot-01 <none> <none>
The scheduler assigned everything to the master node. As we can see, taints can be used to fine-tune which pods are allowed to land on which nodes. Now let's allow the scheduler to place and run pods on all nodes again by removing the key1 taints from every node in the cluster.
$ kubectl taint node --all key1:NoSchedule-
node/elliot-01 untainted
node/elliot-02 untainted
node/elliot-03 untainted
$ kubectl taint node --all key1:NoExecute-
node/elliot-02 untainted
node/elliot-03 untainted
error: taint "key1:NoExecute" not found
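The error simply tells us that one of the nodes, the master in our case, did not have the key1:NoExecute taint in the first place, which is harmless. And should you ever want to restore the default behaviour of the master node, the taint we removed earlier can be added back (a sketch; note the empty value before the colon):
$ kubectl taint node elliot-01 node-role.kubernetes.io/master=:NoSchedule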
Putting a Node into Maintenance Mode
To put a node into maintenance mode, we use cordon.
$ kubectl cordon elliot-02
node/elliot-02 cordoned
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
elliot-01 Ready master 7d14h v1.18.2
elliot-02 Ready,SchedulingDisabled <none> 7d14h v1.18.2
elliot-03 Ready <none> 7d14h v1.18.2
Notice that node elliot-02 now has the status Ready,SchedulingDisabled; you can now perform maintenance on the node without new pods being scheduled onto it. To take a node out of maintenance mode, we use uncordon.
$ kubectl uncordon elliot-02
node/elliot-02 uncordoned
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
elliot-01 Ready master 7d14h v1.18.2
elliot-02 Ready <none> 7d14h v1.18.2
elliot-03 Ready <none> 7d14h v1.18.2
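Keep in mind that cordon only blocks new pods from being scheduled; pods already running on the node stay where they are. For a real maintenance window you would normally also drain the node, which cordons it and then evicts its pods (a sketch; DaemonSet pods have to be explicitly ignored, and extra flags may be needed depending on your workloads):
$ kubectl drain elliot-02 --ignore-daemonsets
$ kubectl uncordon elliot-02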
Selecting Nodes by Label
The node selector is a way of classifying our nodes. For example, our node elliot-02 has an SSD disk and sits in the UK data center, while node elliot-03 has an HDD disk and sits in the Netherlands data center. With this information in hand, let's create these labels on the nodes so we can use them with nodeSelector.
$ kubectl label node elliot-02 disk=SSD
$ kubectl label node elliot-02 dc=UK
$ kubectl label node elliot-03 dc=Netherlands
$ kubectl label nodes elliot-03 disk=hdd
$ kubectl label nodes elliot-03 disk=HDD --overwrite
To see which labels are configured on each node, just run the following commands.
$ kubectl label nodes elliot-02 --list
dc=UK
disk=SSD
kubernetes.io/hostname=elliot-02
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
$ kubectl label nodes elliot-03 --list
beta.kubernetes.io/os=linux
dc=Netherlands
disk=HDD
kubernetes.io/hostname=elliot-03
beta.kubernetes.io/arch=amd64
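These labels can also be used directly as selectors when listing nodes, which is a quick way to check which machines match a given classification (a sketch):
$ kubectl get nodes -l disk=SSD
$ kubectl get nodes -l dc=Netherlands,disk=HDD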
Now we just run the deployment again, but first we'll add two new options to the YAML and watch the magic happen: our pod will be created on node elliot-02, which has the disk=SSD label.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx
  name: terceiro-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: nginx
        dc: Netherlands
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx2
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      nodeSelector:
        disk: SSD
Create the deployment from the manifest:
$ kubectl create -f terceiro-deployment.yaml
deployment.apps/terceiro-deployment created
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
primeiro-deployment-56d9... 1/1 Running 0 14m 172.17.0.4 elliot-03
segundo-deployment-869f... 1/1 Running 0 14m 172.17.0.5 elliot-03
terceiro-deployment-59cd... 1/1 Running 0 22s 172.17.0.6 elliot-02
We can remove the labels as follows:
$ kubectl label nodes elliot-02 dc-
$ kubectl label nodes --all dc-
Now imagine the possibilities this gives you: whether a node is meant for production, whether the workload consumes a lot of CPU or a lot of memory, whether it needs to sit in a particular rack, and so on.
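For instance, purely hypothetical labels such as environment=production or rack=rack-42 (the keys and values here are only illustrative) could be applied in exactly the same way:
$ kubectl label node elliot-02 environment=production rack=rack-42
and then referenced from the Deployment's nodeSelector, just as we did with disk: SSD above:
  nodeSelector:
    environment: production
    rack: rack-42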