Improvement Request: Add toleration on kubernetes job executor (OD-892)
matias blanco opened 3 years ago

Hi,

I think it would be very useful to support tolerations when configuring the Kubernetes job executor. In my case, we have one node_pool with all self-hosted services and another node_pool that is preemptible, so we configure taints on that node_pool to prevent service pods from being scheduled on it.

In this case I would like to run the pipeline jobs in the preemptible node_pool, which is more powerful and cheaper than the others.
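
For illustration, the taint described above might look like this on a node in the preemptible pool. This is only a minimal sketch: the node name is hypothetical, and the taint key and value are assumed to match the toleration example posted later in this thread.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: preemptible-node-1   # hypothetical node name
spec:
  taints:
  # Pods without a matching toleration are not scheduled on this node.
  - key: pipelines-jobs-executor   # assumed key, taken from the toleration example in this thread
    value: "true"
    effect: NoSchedule
```

With such a taint in place, only pods that declare a matching toleration can be scheduled on the pool.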

  • matias blanco changed title 3 years ago
    Previous Value: Improvment Request: Add toleration on kubernetes job executor
    Current Value: Improvement Request: Add toleration on kubernetes job executor
  • Robin Shen commented 3 years ago

    How about setting up the job executor to only match the preemptible node pool and instructing pipeline jobs to only use that job executor?

  • Zhou You commented 3 years ago

    This can be achieved with nodeSelector or tolerations. We could specify a set of key-values in the pipeline that are used when rendering job pods.

    apiVersion: v1
    kind: Pod
    metadata:
      name: with-node-affinity
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - antarctica-east1
                - antarctica-west1
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: another-node-label-key
                operator: In
                values:
                - another-node-label-value
      containers:
      - name: with-node-affinity
        image: registry.k8s.io/pause:2.0
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        env: test
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
      tolerations:
      - key: "example-key"
        operator: "Exists"
        effect: "PreferNoSchedule"
    

    nodeselector taint-and-toleration

  • Robin Shen commented 3 years ago

    Adding this is possible. However, if the problem itself can be solved with an existing feature, I'd prefer not to add extra settings. Also, the job definition is common to all executors (k8s, docker, shell etc.), and I am trying to avoid adding executor-specific settings there.

  • matias blanco commented 3 years ago

    Hi guys,

    Yes, I can use node_selector. It's not ideal, but it can work. However, I see another problem: in this case the preemptible node_pool can scale to 0, because it's only used to run scheduled workflows, and if a build starts while the node_pool has no node ready, the build fails.

    I would expect the build to wait up to 5 minutes for a node to become available to handle jobs. In GCP, node autoscaling adds nodes to a node_pool in 1 minute.
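
    As a rough sketch of the node_selector approach, a job pod could target the preemptible pool as shown below. It assumes GKE's `cloud.google.com/gke-preemptible` node label; the pod name, container name, and image are placeholders, not what the executor actually generates.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: pipeline-job            # placeholder name
    spec:
      # Only nodes carrying this label are eligible for the pod.
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"   # assumed GKE label for preemptible nodes
      restartPolicy: Never
      containers:
      - name: job                   # placeholder container
        image: busybox              # placeholder image
        command: ["sh", "-c", "echo build step"]
    ```

    Note that a nodeSelector only restricts where the pod may run. If the pool is tainted, the pod still needs a matching toleration to be scheduled there.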

    imagen_2.png

  • Robin Shen changed state to 'Closed' 3 years ago
    Previous Value: Open
    Current Value: Closed
  • Robin Shen commented 3 years ago

    Please upgrade to build #2937, which will wait for k8s to scale up nodes before failing the build. I am also closing this issue. Feel free to reopen if you feel toleration is necessary later.

  • matias blanco commented 3 years ago

    Hi Robin,

    Thank you very much. I updated the docker image, and now it runs perfectly.

    About the toleration: I think it should be in the job executor configuration, because it's not related to specific jobs but to how build pods are deployed to the selected node.

    Tolerations are also very useful here because, together with the taint, they keep other pods (including system pods that aren't DaemonSets) from being scheduled on the node pool, so it can scale to 0 when it isn't in use.

    The example that @zzzhouuu posted shows what the pod needs in order to tolerate the taint:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        env: test
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
      tolerations:
      - key: pipelines-jobs-executor
        operator: Equal
        value: 'true'
        effect: "NoSchedule"
    

    I take this opportunity to congratulate all those who contribute to this project!

  • Robin Shen commented 3 years ago

    Adding toleration is not complicated, but I just want to keep things simple...

    In case other pods should not be scheduled to a certain node pool, just define another k8s executor and configure those jobs to use that executor instead.

    Let me know if this approach works on your side.

  • Robin Shen commented 3 years ago

    Also, even if toleration is added to the executor, you will still need to define multiple executors if different jobs need to use different tolerations.

Type: Improvement
Priority: Normal
Assignee
Issue Votes (0)
Watchers (4)
Reference: OD-892