Support other Kubernetes CRIs like containerd (OD-1402)
artemis opened 3 years ago

First of all: thanks for this awesome project. I've been looking for a VCS like OneDev for a long time.

It would be really helpful to add compatibility for Kubernetes CRIs other than Docker, such as containerd or CRI-O. At the moment I am limited to normal builds, because the Kubernetes executor cannot mount unix:///var/run/docker.sock (in my case it would need something like unix:///run/containerd/containerd.sock or similar). Would it be possible to make this configurable, or does it require more significant changes?

  • artemis commented 3 years ago

    I just took a look at the code and saw that, at least in theory, this is already implemented:

    containerdSock = "/run/containerd/containerd.sock";
    

    Should I change the Issue Type to Bug?

  • Robin Shen commented 3 years ago

    @artemis thanks for using OneDev. I tested with containerd on k8s before and it worked. Maybe it broke for some reason. Will look into this.

  • Robin Shen changed fields 3 years ago
    Type: New Feature → Bug
    Affected Versions: empty → <=8.3.1
  • Robin Shen changed fields 3 years ago
    Type: Bug → Support Request
  • Robin Shen commented 3 years ago

    Just launched a GKE cluster with the containerd runtime, and OneDev can use its container runtime to build images. Note that you need to enable the "mount container sock" option in the Kubernetes executor for this. It is disabled by default for security reasons.

  • artemis commented 3 years ago

    Hi @robin, thanks for the prompt reply and for looking into it. I really appreciate it. Where does the Kubernetes executor get its information about the underlying CRI? When I deploy a pod to the same cluster that explicitly mounts the /run/containerd/containerd.sock path, it works fine. However, the executor still tries to mount unix:///var/run/docker.sock instead (the mount option is enabled):

    Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
    

    socket-test.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pd
    spec:
      containers:
      - image: docker
        securityContext:
          privileged: true
        name: test-container
        volumeMounts:
        - mountPath: /run/containerd/containerd.sock 
          name: docker-sock-volume
      volumes:
      - name: docker-sock-volume
        hostPath:
          # location on host
          path: /run/containerd/containerd.sock
    
  • Robin Shen commented 3 years ago

    The executor does not know which CRI is being used; it mounts both /var/run/docker.sock and /run/containerd/containerd.sock to their respective locations in the container. If the Docker runtime is not available on the node, the Docker socket will certainly not work, but the containerd socket should still be usable as long as you operate it with containerd-aware tools.

    PS: The reason the mounted Docker socket worked on GKE with the containerd runtime in my previous test is that the node still has the Docker runtime installed, even though it uses the containerd runtime.
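Since both sockets are mounted regardless of the runtime, a job step may want to check which one actually exists before picking a CLI. A minimal sketch, assuming the default paths mentioned above; the function `detect_runtime` and the `DOCKER_SOCK`/`CONTAINERD_SOCK` variables are hypothetical and not part of OneDev. Note that `[ -S ... ]` only checks that a socket file exists, not that a daemon is answering on it.

```shell
# Hypothetical helper, not part of OneDev: report which mounted runtime
# socket exists in the job container.
# Defaults follow the paths the Kubernetes executor mounts.
DOCKER_SOCK=${DOCKER_SOCK:-/var/run/docker.sock}
CONTAINERD_SOCK=${CONTAINERD_SOCK:-/run/containerd/containerd.sock}

detect_runtime() {
  docker_sock=${1:-$DOCKER_SOCK}
  containerd_sock=${2:-$CONTAINERD_SOCK}
  # -S is true only for a socket file; a missing path or a plain
  # directory is not treated as a usable runtime.
  if [ -S "$docker_sock" ]; then
    echo docker       # the docker CLI can be used
  elif [ -S "$containerd_sock" ]; then
    echo containerd   # needs a containerd-aware CLI such as nerdctl or ctr
  else
    echo none
  fi
}
```

A step could then branch with `case "$(detect_runtime)" in ... esac` and fail early with a clear message when the result is `none`, instead of hitting "Cannot connect to the Docker daemon".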

  • artemis commented 3 years ago

    That does explain a lot 😃. I guess I can work with that.

  • artemis commented 3 years ago

    If I add a registry login to the Kubernetes executor, I no longer get the unix:///var/run/docker.sock not found message. Now, however, the Build Docker Image step shows:

    Get "https://registry-1.docker.io/v2/": unauthorized: incorrect username or password
    /onedev-build/command/step-2.sh: 4: exit: Illegal number: /b
    

    When I add https://registry-1.docker.io/v2/ to the registry logins, I see a Login Succeeded message, but the same error as above appears right below it, which makes the pipeline fail again.

    Is the Build Docker Image step not usable at all without the Docker socket/runtime on the node? Is there a convenient alternative? I saw in the code that the k8s-helper container has nerdctl installed. Would it be possible to use this CLI instead?
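The thread does not settle whether nerdctl could replace the docker CLI here. As a hedged sketch only: nerdctl can talk to containerd directly through its global `--address` flag, and Kubernetes keeps its containers and images in the containerd namespace `k8s.io`. The wrapper name `nerdctl_k8s` is hypothetical. Also note that `nerdctl build` additionally needs a running BuildKit daemon (buildkitd), which containerd alone does not provide.

```shell
# Hypothetical wrapper: run nerdctl against the node's containerd socket
# (mounted into the job container) in the namespace used by Kubernetes.
nerdctl_k8s() {
  nerdctl --address "${CONTAINERD_SOCK:-/run/containerd/containerd.sock}" \
    --namespace k8s.io "$@"
}

# Example (not executed here): list the images the kubelet has pulled.
# nerdctl_k8s images
```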

  • artemis commented 3 years ago

    I did some more research and opened a shell to the socket-test.yaml pod from my previous comment:

    • docker version
    Client:
     Version:           24.0.2
     API version:       1.43
     Go version:        go1.20.4
     Git commit:        cb74dfc
     Built:             Thu May 25 21:50:49 2023
     OS/Arch:           linux/amd64
     Context:           default
    
    Server: Docker Engine - Community
     Engine:
      Version:          24.0.2
      API version:      1.43 (minimum version 1.12)
      Go version:       go1.20.4
      Git commit:       659604f
      Built:            Thu May 25 21:35:04 2023
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.6.14
      GitCommit:        9ba4b250366a5ddde94bb7c9d1def331423aa323
     runc:
      Version:          1.1.7
      GitCommit:        v1.1.7-0-g860f061
     docker-init:
      Version:          0.19.0
      GitCommit:        de40ad0
    
    • docker login
    Login Succeeded
    
    • docker build -t ubuntu .
    Dockerfile:
    FROM docker-proxy/ubuntu:latest
    RUN apt update -y && apt upgrade -y
    

    stdout

    => [internal] load .dockerignore  0.0s
    => => transferring context: 2B  0.0s
    => [internal] load build definition from Dockerfile  0.0s
    => => transferring dockerfile: 109B  0.0s
    => [internal] load metadata for docker-proxy/ubuntu:latest  0.0s
    => [auth] sharing credentials for docker-proxy  0.0s
    => CACHED [1/2] FROM docker-proxy/ubuntu:latest@sha256:ac58ff7fe25edc58bdf0067ca99df00014dbd032e2246d30a722fa348fd799a5  0.0s
    => [2/2] RUN apt update -y && apt upgrade -y  6.9s
    => exporting to image  0.2s
    => => exporting layers  0.2s
    => => writing image sha256:dba20052e438e1ad2c957824e8263470e02c09acd44e5aae9bad099019ebf6f9  0.0s
    => => naming to docker.io/library/ubuntu
    

    So it seems that the Docker CLI and containerd work together correctly by default.

  • Robin Shen commented 3 years ago

    How did you specify the tag of the image being built in OneDev?

  • artemis commented 3 years ago

    @robin like this: private-registry.local/some-image-name:latest

  • Robin Shen commented 3 years ago

    Then the URL of the registry login in the k8s executor should be specified as private-registry.local
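The point here is that the registry login URL has to match the registry host in the image tag, not a full API endpoint such as https://registry-1.docker.io/v2/. A minimal sketch of deriving that host from a tag, following a simplified form of Docker's reference-parsing rule: the first path component is treated as a registry host only if it contains a `.` or `:` or is `localhost`; otherwise the image lives on Docker Hub. The function name `registry_of` is hypothetical.

```shell
# Hypothetical helper: print the registry host that a registry login
# would need to match for a given image reference (simplified rule).
registry_of() {
  ref=$1
  case $ref in
    */*) first=${ref%%/*} ;;          # text before the first "/"
    *) echo docker.io; return 0 ;;    # e.g. "ubuntu:latest"
  esac
  case $first in
    *.*|*:*|localhost) echo "$first" ;;  # looks like host[:port]
    *) echo docker.io ;;                 # e.g. "library/ubuntu"
  esac
}

# registry_of private-registry.local/some-image-name:latest
# prints: private-registry.local
```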

  • Robin Shen commented 3 years ago

    Please make sure the following works on a Linux host with Docker installed:

    docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock 1dev/k8s-helper-linux:2.10.11 sh
    docker login <your private registry url>
    docker push <your private registry url>/yourrepo/yourimage:latest
    
  • artemis commented 3 years ago

    That works without issues. I guess the problem also has something to do with the Docker socket being unavailable. I tried some more things and ended up changing the Kubernetes executor code: I removed the docker-socket references and set the container securityContext to privileged: true. I built and deployed OneDev in a new namespace and started a command-execution pipeline with dockerd </dev/null &>/dev/null & to start a Docker daemon that connected directly to the unix:///run/containerd/containerd.sock socket. After that, everything worked like a charm.

    I guess GKE, EKS etc. do something "special" to preserve compatibility, so that docker.sock can be mounted directly while the underlying containerd is used. It would be awesome if OneDev offered a way to configure this part, e.g. in the executor settings. @robin, what do you think?
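A hedged sketch of the workaround described above, written for POSIX sh: `&>` is a bash extension, so under a plain sh the portable form is `>/dev/null 2>&1`. It assumes a privileged job container and interprets "connected directly" as pointing dockerd at the node's containerd via dockerd's `--containerd` flag; that interpretation, and the names `wait_for_socket` and `start_dockerd`, are assumptions. As Robin's GKE log below shows, this setup may still fail depending on the node.

```shell
# Hypothetical helper: wait until a Unix socket appears, up to N seconds.
wait_for_socket() {
  sock=$1
  timeout=${2:-30}
  elapsed=0
  while [ ! -S "$sock" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $sock" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Hypothetical step body: start dockerd in the background against the
# node's containerd (requires a privileged container), then wait for
# its API socket before running any docker command.
start_dockerd() {
  dockerd --containerd="${CONTAINERD_SOCK:-/run/containerd/containerd.sock}" \
    </dev/null >/dev/null 2>&1 &
  wait_for_socket /var/run/docker.sock 30
}
```

Waiting for the socket avoids racing the daemon's startup; running `docker info` afterwards is a stronger check that the API actually answers.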

  • Robin Shen commented 3 years ago

    Trying to reproduce your procedure on GKE. Dockerd starts successfully and listens on /var/run/docker.sock, but a subsequent attempt to run the hello-world image fails:

    11:17:22 Checking cluster access...
    11:17:24 Preparing job (executor: executor, namespace: executor-7-11-0)...
    11:17:28 Running job on node gke-mycluster-default-pool-7db4e371-32n2...
    11:17:28 Starting job containers...
    11:17:30 Retrieving job data from https://87de-58-41-5-137.ngrok-free.app...
    11:17:30 Setting up job cache...
    11:17:30 Generating command scripts...
    11:17:30 Downloading job dependencies from https://87de-58-41-5-137.ngrok-free.app...
    11:17:31 Job workspace initialized
    11:17:35 Running step "test"...
    11:17:35 time="2023-06-04T03:17:32.987646774Z" level=info msg="Starting up"
    11:17:35 time="2023-06-04T03:17:32.989691663Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
    11:17:35 time="2023-06-04T03:17:33.029670851Z" level=info msg="Loading containers: start."
    11:17:35 time="2023-06-04T03:17:33.161559688Z" level=info msg="Loading containers: done."
    11:17:35 time="2023-06-04T03:17:33.176312534Z" level=info msg="Docker daemon" commit=659604f graphdriver=overlay2 version=24.0.2
    11:17:35 time="2023-06-04T03:17:33.176747913Z" level=info msg="Daemon has completed initialization"
    11:17:35 time="2023-06-04T03:17:33.215510513Z" level=info msg="API listen on /var/run/docker.sock"
    11:17:47 Unable to find image 'hello-world:latest' locally
    11:17:48 latest: Pulling from library/hello-world
    11:17:48 719385e32844: Pulling fs layer
    11:17:48 719385e32844: Verifying Checksum
    11:17:48 719385e32844: Download complete
    11:17:48 719385e32844: Pull complete
    11:17:48 Digest: sha256:fc6cf906cbfa013e80938cdf0bb199fbdbb86d6e3e013783e5a766f50f5dbce0
    11:17:48 Status: Downloaded newer image for hello-world:latest
    11:17:48 time="2023-06-04T03:17:48.862499393Z" level=error msg="stream copy error: reading from a closed fifo"
    11:17:48 time="2023-06-04T03:17:48.864466844Z" level=error msg="stream copy error: reading from a closed fifo"
    11:17:48 docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: invalid rootfs: stat /var/lib/docker/overlay2/4ecdbb27cdafdc185131bd112ef7a648f1f5b3ddd8bd94a470acc3aa74b1bd1c/merged: no such file or directory: unknown.
    11:17:48 time="2023-06-04T03:17:48Z" level=error msg="error waiting for container: "
    11:17:51 Step "test is failed: Command failed with exit code 127
    

    It seems this is fragile and highly dependent on the node environment. I'd suggest using Kaniko to build Docker images inside k8s, and I will add a step for it.
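A sketch of the Kaniko approach under stated assumptions: the Kaniko step OneDev later added (8.3.4) is not shown in this thread, so this only illustrates running the Kaniko executor by hand in a job container based on `gcr.io/kaniko-project/executor:debug` (the debug variant ships a busybox shell). Kaniko needs no Docker socket; it reads registry credentials from `/kaniko/.docker/config.json`. The function names and the `REGISTRY`, `REGISTRY_USER`, `REGISTRY_PASSWORD` and `IMAGE` variables are hypothetical.

```shell
# Hypothetical helper: print a Docker-style config.json with basic-auth
# credentials for one registry (auth = base64 of "user:password").
kaniko_auth_json() {
  registry=$1
  user=$2
  password=$3
  auth=$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth"
}

# Hypothetical step body, run inside the Kaniko debug image from the
# directory that contains the Dockerfile.
build_with_kaniko() {
  mkdir -p /kaniko/.docker
  kaniko_auth_json "$REGISTRY" "$REGISTRY_USER" "$REGISTRY_PASSWORD" \
    > /kaniko/.docker/config.json
  /kaniko/executor --context "dir://$PWD" --dockerfile "$PWD/Dockerfile" \
    --destination "$IMAGE"
}
```

For a private registry, the key in `auths` is the registry host (here `REGISTRY=private-registry.local`); Kaniko's own examples use `https://index.docker.io/v1/` as the key for Docker Hub.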

  • artemis commented 3 years ago

    @robin ya, I guess it strongly depends on the respective environment. Kaniko sounds like a sufficient solution. Anything I can do to help?

  • Robin Shen commented 3 years ago

    It should be rather easy to add a step that runs the Kaniko container. Will get it into the next patch release.

  • Robin Shen commented 3 years ago

    I filed an improvement request for this as #1406

  • artemis commented 3 years ago

    Sounds great. Really appreciate your efforts. Thx

  • Robin Shen changed state to 'Closed' 3 years ago
    State: Open → Closed
  • Robin Shen commented 3 years ago

    Kaniko image build step has been added in 8.3.4. Closing this now.

  • artemis referenced from other issue 3 years ago
Type: Question
Priority: Major
Issue Votes: 0
Watchers: 4
Reference: OD-1402