Install Scheduler-plugins #

Create a Kubernetes Cluster #

First, you need a Kubernetes cluster and a kubectl command-line tool configured to communicate with it.

The Kubernetes version must be equal to or greater than v1.23.0. To check the version, run kubectl version --short.
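
The version requirement can also be checked in a script. The following is a minimal sketch, not part of scheduler-plugins: version_ge is a hypothetical helper name, the sample version string is illustrative, and the comparison assumes versions of the form x.y.z. On newer kubectl releases the --short flag may no longer be accepted, in which case plain kubectl version should print the same concise output; verify this against your own kubectl.

```shell
#!/bin/sh
# version_ge A B: succeed (exit 0) when version A >= version B.
# Hypothetical helper; expects x.y.z versions, with an optional leading "v"
# and an optional vendor suffix such as "-gke.100".
# The body runs in a subshell "( ... )" so its variables stay local.
version_ge() (
  a=$(printf '%s\n' "${1#v}" | sed 's/[^0-9.].*$//')
  b=$(printf '%s\n' "${2#v}" | sed 's/[^0-9.].*$//')
  # Sort numerically on major, minor, patch; the first line is the lowest.
  lowest=$(printf '%s\n%s\n' "$a" "$b" |
    sort -t . -k 1,1n -k 2,2n -k 3,3n | head -n 1)
  [ "$lowest" = "$b" ]
)

# Example with a hard-coded server version; substitute the server version
# that "kubectl version" reports for your cluster.
if version_ge "v1.28.9" "v1.23.0"; then
  echo "server version is new enough"
else
  echo "server version is too old"
fi
```

The helper only compares the numeric major.minor.patch fields, which is enough for a "v1.23.0 or newer" gate.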

If you do not have a cluster yet, create one with a provisioning tool such as kind, kubeadm, or minikube.

Install release v0.31.8 and use Coscheduling #

Note: we provide two ways to install the scheduler-plugins artifacts: as a second scheduler and as a single scheduler. Their pros and cons are as follows:

second scheduler:
- pro: it’s easy to install by deploying the Helm chart
- con: running multiple schedulers will inevitably cause resource conflicts when the cluster is short of resources.
  
  Consider the scenario where multiple schedulers simultaneously try to assign their pods to a node that can fit only one of them. The pod that arrives later will be evicted by the kubelet and hang there (without its .spec.nodeName cleared) until resources are released on the node.
  
  Running multiple schedulers is therefore not recommended in production. However, it’s a good starting point for playing with the scheduler framework and practicing plugin development, whether you’re on a managed or an on-premise Kubernetes cluster.

single scheduler:
- pro: you use a single, unified scheduler, which keeps resource assignment conflict-free. This is the recommended setup for production.
- con: you need privileges to modify the control plane, and at the moment the installation is not fully automated (no Helm chart yet).
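
The conflict described for the second-scheduler setup can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the node size, pod names, and the admit function are hypothetical, and real schedulers and kubelets track far more state than a single CPU counter.

```shell
#!/bin/sh
# Toy model of two schedulers racing for the same node.
node_free=2   # free CPU cores on the node (hypothetical number)

# admit NAME CPU: kubelet-side admission, evaluated in arrival order.
admit() {
  if [ "$2" -le "$node_free" ]; then
    node_free=$((node_free - $2))
    echo "$1: admitted"
  else
    echo "$1: rejected (needs $2, free $node_free)"
  fi
}

# Both schedulers took their snapshot before either pod was bound.
snapshot=$node_free
for pod in pod-a:2 pod-b:2; do
  name=${pod%%:*}
  cpu=${pod##*:}
  # Each scheduler sees enough room in its stale snapshot, so it binds.
  if [ "$cpu" -le "$snapshot" ]; then
    admit "$name" "$cpu"
  fi
done
```

Both pods pass the scheduling check because each scheduler decides on its own stale view, but only the first one fits once the kubelet looks at the actual free capacity. A single scheduler avoids this because every decision is made against one consistent view.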

As a second scheduler #

The quickest way to try scheduler-plugins is to install it as a second scheduler using the Helm chart. You can find the demo chart in manifests/install/charts. In a production environment, however, it is recommended to replace the default scheduler manually (as described in the next section).

Install using Helm Chart

As a single scheduler (replacing the vanilla default-scheduler) #

Unlike the automated installation above, running scheduler-plugins as a single scheduler requires some manual steps.

The main obstacle is that the vanilla scheduler has to be reconfigured, and this is hard to automate because the way it is deployed varies a lot (e.g., as a deployment, a static pod, or an executable binary managed by systemd). Moreover, managed Kubernetes offerings may need provider-specific configuration, which makes a uniform installation pipeline difficult.

In this section, we will walk you through how to replace the default scheduler with the scheduler-plugins image. As the new image is built on top of the default scheduler, you won’t lose any vanilla Kubernetes scheduling capability. Instead, you gain a number of extra out-of-the-box features (implemented by the plugins in this repo), such as coscheduling.

The following steps are based on a Kubernetes cluster created by Kind.

Log into the control plane node

sudo docker exec -it $(sudo docker ps | grep control-plane | awk '{print $1}') bash

Back up kube-scheduler.yaml

cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/kube-scheduler.yaml
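
Note that the copy is placed outside the manifests directory; the kubelet treats files in its static pod directory as manifests, so a backup left there could be picked up as well. A slightly more defensive variant of the backup step is sketched below. backup_manifest is a hypothetical helper, and the demo runs on throwaway files rather than on /etc/kubernetes.

```shell
#!/bin/sh
# backup_manifest SRC DEST_DIR: copy SRC into DEST_DIR, but refuse to
# overwrite an existing backup of the same name. Hypothetical helper.
backup_manifest() {
  dest="$2/$(basename "$1")"
  if [ -e "$dest" ]; then
    echo "backup already exists: $dest" >&2
    return 1
  fi
  cp -p "$1" "$dest"
}

# On the control plane node this would be:
#   backup_manifest /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes
# Demo on scratch files so the sketch can be run anywhere:
demo=$(mktemp -d)
mkdir "$demo/manifests"
echo "kind: Pod" > "$demo/manifests/kube-scheduler.yaml"
backup_manifest "$demo/manifests/kube-scheduler.yaml" "$demo" && echo "backed up"
```

Refusing to overwrite protects the original manifest if the step is accidentally repeated after the file has already been modified.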

Create /etc/kubernetes/sched-cc.yaml

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  # (Optional) Change true to false if you are not running a HA control-plane.
  leaderElect: true
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: Coscheduling
      disabled:
      - name: PrioritySort
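
If you prefer to create the file without opening an editor, a here-document works. The sketch below writes exactly the configuration shown above; the SCHED_CC variable is an assumption introduced here, and it defaults to a scratch file so the snippet is safe to try outside the control plane node.

```shell
#!/bin/sh
# On the control plane node, run with:
#   SCHED_CC=/etc/kubernetes/sched-cc.yaml
# SCHED_CC is a variable introduced for this sketch only.
SCHED_CC=${SCHED_CC:-$(mktemp)}

# The quoted 'EOF' keeps the shell from expanding anything in the YAML.
cat > "$SCHED_CC" <<'EOF'
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  # (Optional) Change true to false if you are not running a HA control-plane.
  leaderElect: true
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: Coscheduling
      disabled:
      - name: PrioritySort
EOF
echo "wrote $SCHED_CC"
```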

❗IMPORTANT❗ Starting with release v0.19, several plugins (e.g., coscheduling) introduced CRDs to optimize their design and implementation. Hence, an extra step is needed to:
- grant extra RBAC privileges to the user system:kube-scheduler so that the scheduler binary can manipulate the custom resource objects
- install a controller binary that manages the custom resource objects

Next, we apply the compiled yaml located at manifests/install/all-in-one.yaml.
```
$ kubectl apply -f all-in-one.yaml
```
After this step, a deployment called scheduler-plugins-controller is expected to run in namespace scheduler-plugins:
```
$ kubectl get deploy -n scheduler-plugins
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
scheduler-plugins-controller   1/1     1            1           19h
```
❗IMPORTANT❗ Install the CRDs your workloads depend on.

You can refer to each folder under manifests/crds to obtain the CRD yaml for each plugin. Here we install the coscheduling CRD:
```
$ kubectl apply -f manifests/crds/scheduling.x-k8s.io_podgroups.yaml
```

Modify /etc/kubernetes/manifests/kube-scheduler.yaml to run scheduler-plugins with coscheduling

Generally, we need to make a few changes:

- pass in the composed scheduler-config file via the --config argument
- (optional) remove duplicated CLI parameters (e.g., --leader-elect), as they may already be defined in the config file
- replace the vanilla Kubernetes scheduler image with the scheduler-plugins image
- mount the scheduler-config file so that it is readable when the scheduler starts

Here is a diff between the modified manifest (lines marked <) and the original backup (lines marked >):

16d15
<     - --config=/etc/kubernetes/sched-cc.yaml
17a17,18
>     - --kubeconfig=/etc/kubernetes/scheduler.conf
>     - --leader-elect=true
19,20c20
<     image: registry.k8s.io/scheduler-plugins/kube-scheduler:v0.31.8
---
>     image: registry.k8s.io/kube-scheduler:v1.28.9
50,52d49
<     - mountPath: /etc/kubernetes/sched-cc.yaml
<       name: sched-cc
<       readOnly: true
60,63d56
<   - hostPath:
<       path: /etc/kubernetes/sched-cc.yaml
<       type: FileOrCreate
<     name: sched-cc
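
The image replacement from the diff can be scripted; the other changes (the --config argument and the volume mount) are easier to make by hand. The following is a minimal sketch: swap_image is a hypothetical helper, it rewrites every image: line in the file, and it therefore assumes a manifest with a single container, as in the kube-scheduler static pod. The demo runs on a scratch file.

```shell
#!/bin/sh
# swap_image FILE NEW_IMAGE: replace the value of each "image:" line in FILE.
# Hypothetical helper; assumes FILE describes a single container.
swap_image() {
  tmp=$(mktemp)
  # Keep the indentation and the "image:" key, replace only the value.
  sed "s#^\([[:space:]]*image:\).*#\1 $2#" "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Demo on a scratch file that mimics the relevant part of the manifest:
demo=$(mktemp)
printf '%s\n' \
  'spec:' \
  '  containers:' \
  '  - command:' \
  '    - kube-scheduler' \
  '    image: registry.k8s.io/kube-scheduler:v1.28.9' > "$demo"
swap_image "$demo" registry.k8s.io/scheduler-plugins/kube-scheduler:v0.31.8
grep 'image:' "$demo"
```

Writing the result back with cat rather than mv keeps the original file's ownership and permissions, which is a deliberate choice for a file the kubelet reads.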

Verify that kube-scheduler pod is running properly with a correct image: registry.k8s.io/scheduler-plugins/kube-scheduler:v0.31.8

$ kubectl get pod -n kube-system | grep kube-scheduler
kube-scheduler-kind-control-plane            1/1     Running   0          3m27s

$ kubectl get pods -l component=kube-scheduler -n kube-system -o=jsonpath="{.items[0].spec.containers[0].image}{'\n'}"
registry.k8s.io/scheduler-plugins/kube-scheduler:v0.31.8

⚠️Troubleshooting: If kube-scheduler is not up, you may need to restart the kubelet service inside the kind control plane node (systemctl restart kubelet.service)

Test Coscheduling #

Now, we’re able to verify how the coscheduling plugin works.

Create a PodGroup custom object called pg1:

# podgroup.yaml
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: pg1
spec:
  scheduleTimeoutSeconds: 10
  minMember: 3

$ kubectl apply -f podgroup.yaml

Create a deployment labelled scheduling.x-k8s.io/pod-group: pg1 to associate it with the PodGroup pg1 created in the previous step.

# deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
        scheduling.x-k8s.io/pod-group: pg1
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.6

⚠️ Note: If you are running scheduler-plugins as a second scheduler, you should explicitly specify .spec.schedulerName to match the secondary scheduler name:
# deploy.yaml
...
spec:
  ...
  template:
    spec:
      schedulerName: scheduler-plugins-scheduler

As PodGroup pg1 requires at least 3 pods to be scheduled together and there are only 2 pods so far, all pause pods are expected to stay Pending, since they cannot be co-scheduled:
```
$ kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
pause-646dbcfb64-4zvt6   0/1     Pending   0          9s
pause-646dbcfb64-8kpg4   0/1     Pending   0          9s
```
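
The all-or-nothing rule behind this behavior can be restated as a one-line check. This is a toy restatement only, not the plugin's implementation: group_phase is a hypothetical function, and the real plugin also takes timeouts and cluster resources into account.

```shell
#!/bin/sh
# group_phase PODS MIN_MEMBER: toy version of the minMember gate.
# Pods of a group are only let through once at least MIN_MEMBER exist.
group_phase() {
  if [ "$1" -ge "$2" ]; then
    echo "schedulable"
  else
    echo "pending"
  fi
}

group_phase 2 3   # 2 replicas against minMember: 3
group_phase 3 3   # 3 replicas against minMember: 3
```

The first call corresponds to the state shown above, where 2 pods wait for a third member of the group.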

Now let’s scale the deployment up to 3 replicas to satisfy minMember (i.e., 3) of the associated PodGroup:

$ kubectl scale deploy pause --replicas=3
deployment.apps/pause scaled

Wait a couple of seconds; all Pods are then expected to reach the Running state:

$ kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
pause-646dbcfb64-4zvt6   1/1     Running   0          42s
pause-646dbcfb64-8kpg4   1/1     Running   0          42s
pause-646dbcfb64-npzcf   1/1     Running   0          8s

You can also get the PodGroup’s spec and status via:

$ kubectl get podgroup pg1 -o yaml
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.x-k8s.io/v1alpha1","kind":"PodGroup","metadata":{"annotations":{},"name":"pg1","namespace":"default"},"spec":{"minMember":3,"scheduleTimeoutSeconds":10}}
  creationTimestamp: "2022-02-08T19:55:24Z"
  generation: 8
  name: pg1
  namespace: default
  resourceVersion: "6142"
  uid: b4ac3562-54ab-4c1e-89bb-541a81c6acce
spec:
  minMember: 3
  scheduleTimeoutSeconds: 10
status:
  phase: Running
  running: 3
  scheduleStartTime: "2022-02-08T19:55:24Z"
  scheduled: 3

⚠ NOTE: Some UX issues still need to be addressed on the controller side - #166.

Install old-version releases #

If you’re running v0.18.9, which doesn’t depend on the PodGroup CRD, you should refer to the install doc in branch release-1.18 for detailed installation instructions.

Uninstall scheduler-plugins #

Delete the deployment

$ kubectl delete deploy pause -n default

Recover kube-scheduler.yaml and delete sched-cc.yaml

If the cluster was created by kubeadm or minikube, log into the control plane node:

$ mv /etc/kubernetes/kube-scheduler.yaml /etc/kubernetes/manifests/
$ rm /etc/kubernetes/sched-cc.yaml

If the cluster was created by kind, enter the control plane node’s container:

$ sudo docker exec -it $(sudo docker ps | grep control-plane | awk '{print $1}') bash
$ mv /etc/kubernetes/kube-scheduler.yaml /etc/kubernetes/manifests/
$ rm /etc/kubernetes/sched-cc.yaml
exit
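
The two recovery commands can be wrapped in a small function that checks for the backup first, so that a missing backup does not leave the node without a scheduler manifest. This is a sketch under assumptions: restore_scheduler is a hypothetical helper, and the demo runs on a scratch directory that mimics the layout of /etc/kubernetes.

```shell
#!/bin/sh
# restore_scheduler ETC_DIR: move the backed-up manifest back into
# ETC_DIR/manifests and remove the custom scheduler config.
# Hypothetical helper; does nothing if no backup is present.
restore_scheduler() {
  if [ ! -f "$1/kube-scheduler.yaml" ]; then
    echo "no backup found in $1" >&2
    return 1
  fi
  mv "$1/kube-scheduler.yaml" "$1/manifests/" && rm -f "$1/sched-cc.yaml"
}

# Real use on the control plane node: restore_scheduler /etc/kubernetes
# Demo on a scratch directory:
demo=$(mktemp -d)
mkdir "$demo/manifests"
echo "backup" > "$demo/kube-scheduler.yaml"
echo "modified" > "$demo/manifests/kube-scheduler.yaml"
touch "$demo/sched-cc.yaml"
restore_scheduler "$demo" && cat "$demo/manifests/kube-scheduler.yaml"
```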

Check the state of the default scheduler

$ kubectl get pod -n kube-system | grep kube-scheduler
kube-scheduler-kind-control-plane            1/1     Running   0          91s