#1402  Support other Kubernetes CRIs like containerd
Closed
artemis opened 11 months ago

First of all: Thx for that awesome project. I've been looking a long time for a VCS like OneDev.

It would be really helpful if OneDev supported other Kubernetes CRIs besides Docker, such as containerd or CRI-O. At the moment I am limited to normal builds because the Kubernetes Executor cannot mount unix:///var/run/docker.sock (in my case it would need something like unix:///run/containerd/containerd.sock or similar). Would it be possible to make that configurable, or does this require more significant changes?

artemis commented 11 months ago

I just took a look into the code, where I saw that, at least in theory, this is already implemented.

containerdSock = "/run/containerd/containerd.sock";

Should I change the Issue Type to Bug?

Robin Shen commented 11 months ago

@artemis thanks for using OneDev. I tested with containerd on k8s before and it worked. It may have broken for some reason. Will look into this.

Robin Shen changed fields 11 months ago

Type: New Feature -> Bug
Affected Versions: empty -> <=8.3.1

Robin Shen changed fields 11 months ago

Type: Bug -> Support Request

Robin Shen commented 11 months ago

I just launched a GKE cluster with the containerd runtime, and OneDev can use its container runtime to build images. Note that you need to enable the "mount container sock" option in the Kubernetes executor for this. It is disabled by default for security reasons.

artemis commented 11 months ago

Hi @robin thanks for the prompt reply and for looking into it. I really appreciate it. Where does the Kubernetes executor get its information about the underlying CRI? When I deploy a pod to the same cluster that explicitly mounts the /run/containerd/containerd.sock path, it works fine. However, the executor still tries to use unix:///var/run/docker.sock instead (the mount option is activated):

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

socket-test.yaml

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: docker
    securityContext:
      privileged: true
    name: test-container
    volumeMounts:
    - mountPath: /run/containerd/containerd.sock 
      name: docker-sock-volume
  volumes:
  - name: docker-sock-volume
    hostPath:
      # location on host
      path: /run/containerd/containerd.sock

Robin Shen commented 11 months ago

The executor does not know which CRI is being used. It mounts both /var/run/docker.sock and /run/containerd/containerd.sock to their respective locations in the container. If the Docker runtime is not available on the node, the Docker sock will certainly not work, but the containerd sock should be usable as long as you operate it with containerd-aware tools.

PS: The reason the mounted Docker sock worked on GKE with the containerd runtime in my previous test is that the node still has the Docker runtime installed, although it is using the containerd runtime.
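
Since both paths are mounted regardless of what the node actually runs, a build script may want to check which one is a live socket before picking a tool. The following is only a sketch: `pick_cli` is a hypothetical helper, the two default paths are the ones named in this thread, and it assumes the chosen CLI (docker or nerdctl) is present in the job image. A host path that does not exist on the node may appear in the container as something other than a socket, which is why the check uses `-S` rather than `-e`.

```shell
#!/bin/sh
# Hypothetical helper: report which container CLI matches the socket
# that is actually available inside the job container.
# Arguments are optional and default to the paths named in this thread.
pick_cli() {
  docker_sock="${1:-/var/run/docker.sock}"
  containerd_sock="${2:-/run/containerd/containerd.sock}"

  if [ -S "$docker_sock" ]; then
    # A real unix socket: the docker CLI can talk to it.
    echo docker
  elif [ -S "$containerd_sock" ]; then
    # Only containerd is reachable: use a containerd-aware tool.
    echo nerdctl
  else
    echo none
  fi
}
```

A step could then run `cli=$(pick_cli)` and fail early with a clear message when the result is `none`, instead of failing later with "Cannot connect to the Docker daemon".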

artemis commented 11 months ago

That does explain a lot 😃. I guess I can work with that.

artemis commented 11 months ago

If I add a registry login to the Kubernetes executor, I no longer get the unix:///var/run/docker.sock not found message. Now, however, the Build Docker Image step shows me:

Get "https://registry-1.docker.io/v2/": unauthorized: incorrect username or password
/onedev-build/command/step-2.sh: 4: exit: Illegal number: /b

When I try to add https://registry-1.docker.io/v2/ to the registry logins, I see a Login Succeeded message, followed immediately by the same error as above, which makes the pipeline fail again.

Is the Build Docker Image step not usable at all without the Docker socket/runtime on the node? Is there a convenient alternative? I saw in the code that the k8s-helper container has nerdctl installed. Would it be possible to use this CLI instead?

artemis commented 11 months ago

I did some more research and opened a shell in the socket-test Pod from my earlier comment:

  • docker version
Client:
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:50:49 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:35:04 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.14
  GitCommit:        9ba4b250366a5ddde94bb7c9d1def331423aa323
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • docker login
Login Succeeded
  • docker build -t ubuntu .

Dockerfile

FROM docker-proxy/ubuntu:latest
RUN apt update -y && apt upgrade -y

stdout

=> [internal] load .dockerignore                                                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                                                    0.0s
 => [internal] load build definition from Dockerfile                                                                                                                               0.0s
 => => transferring dockerfile: 109B                                                                                                                                               0.0s
 => [internal] load metadata for docker-proxy/ubuntu:latest                                                                                                                    0.0s
 => [auth] sharing credentials for docker-proxy                                                                                                                  0.0s
 => CACHED [1/2] FROM docker-proxy/ubuntu:latest@sha256:ac58ff7fe25edc58bdf0067ca99df00014dbd032e2246d30a722fa348fd799a5                                                       0.0s
 => [2/2] RUN apt update -y && apt upgrade -y                                                                                                                                      6.9s
 => exporting to image                                                                                                                                                             0.2s
 => => exporting layers                                                                                                                                                            0.2s
 => => writing image sha256:dba20052e438e1ad2c957824e8263470e02c09acd44e5aae9bad099019ebf6f9                                                                                       0.0s
 => => naming to docker.io/library/ubuntu

So it seems that docker-cli and containerd work together correctly by "default".

Robin Shen commented 11 months ago

How did you specify the tag of the image being built in OneDev?

artemis commented 11 months ago

@robin like this private-registry.local/some-image-name:latest

Robin Shen commented 11 months ago

Then the URL of the registry login in the k8s executor should be specified as private-registry.local
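
To see which registry a given tag points at, the host part can be split off the image reference. The sketch below approximates how the Docker CLI decides this: the first path component counts as a registry host only if it contains a dot or a colon, or is `localhost`; otherwise the image is assumed to live on Docker Hub. `registry_host` is a hypothetical helper name, and whether OneDev matches registry logins in exactly this way is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the registry host implied by an image reference.
registry_host() {
  ref="$1"
  case "$ref" in
    */*) first="${ref%%/*}" ;;          # text before the first slash
    *)   echo "docker.io"; return ;;    # no slash: a Docker Hub image
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;; # looks like a host (dot, port, localhost)
    *)                 echo "docker.io" ;; # e.g. "library/ubuntu"
  esac
}

registry_host "private-registry.local/some-image-name:latest"
```

For the tag quoted above this prints `private-registry.local`, which is the value suggested for the registry login URL.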

Robin Shen commented 11 months ago

Please make sure the commands below work on a Linux host with Docker installed:

docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock 1dev/k8s-helper-linux:2.10.11 sh
docker login <your private registry url>
docker push <your private registry url>/yourrepo/yourimage:latest

artemis commented 11 months ago

That works without issues. I guess the problem also has something to do with docker.sock being unavailable. I tried a few more things and ended up changing the Kubernetes Executor code: I removed the docker socket references and set the container securityContext to privileged: true. I then built and deployed OneDev in a new namespace and started a command execution pipeline with dockerd </dev/null &>/dev/null & to start a Docker daemon that connected directly to the unix:///run/containerd/containerd.sock socket. After that, everything works like a charm. I guess GKE, EKS, etc. do something "special" to stay compatible with mounting docker.sock directly while using the underlying containerd. It would be awesome if OneDev offered a way to configure that part, e.g. in the Executor settings. @robin what do you think?
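
When dockerd is started in the background like this, later commands can race with its startup. A minimal sketch of a retry loop is shown below, written in POSIX sh so it also runs where `&>` is not supported. `wait_for` is a hypothetical helper, and the 30-attempt limit in the usage comment is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts is used up.
# Usage: wait_for <attempts> <command> [args...]
wait_for() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Intended use inside a privileged step container (not executed here):
#   dockerd </dev/null >/dev/null 2>&1 &
#   wait_for 30 docker info || { echo "dockerd did not come up" >&2; exit 1; }
```

This only confirms that the daemon answers API calls. It does not guarantee that containers can actually be started on a given node.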

Robin Shen commented 11 months ago

I tried to reproduce your procedure on GKE. dockerd starts successfully and listens on /var/run/docker.sock, but a subsequent attempt to run the hello-world image fails.

11:17:22 Checking cluster access...
11:17:24 Preparing job (executor: executor, namespace: executor-7-11-0)...
11:17:28 Running job on node gke-mycluster-default-pool-7db4e371-32n2...
11:17:28 Starting job containers...
11:17:30 Retrieving job data from https://87de-58-41-5-137.ngrok-free.app...
11:17:30 Setting up job cache...
11:17:30 Generating command scripts...
11:17:30 Downloading job dependencies from https://87de-58-41-5-137.ngrok-free.app...
11:17:31 Job workspace initialized
11:17:35 Running step "test"...
11:17:35 time="2023-06-04T03:17:32.987646774Z" level=info msg="Starting up"
11:17:35 time="2023-06-04T03:17:32.989691663Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
11:17:35 time="2023-06-04T03:17:33.029670851Z" level=info msg="Loading containers: start."
11:17:35 time="2023-06-04T03:17:33.161559688Z" level=info msg="Loading containers: done."
11:17:35 time="2023-06-04T03:17:33.176312534Z" level=info msg="Docker daemon" commit=659604f graphdriver=overlay2 version=24.0.2
11:17:35 time="2023-06-04T03:17:33.176747913Z" level=info msg="Daemon has completed initialization"
11:17:35 time="2023-06-04T03:17:33.215510513Z" level=info msg="API listen on /var/run/docker.sock"
11:17:47 Unable to find image 'hello-world:latest' locally
11:17:48 latest: Pulling from library/hello-world
11:17:48 719385e32844: Pulling fs layer
11:17:48 719385e32844: Verifying Checksum
11:17:48 719385e32844: Download complete
11:17:48 719385e32844: Pull complete
11:17:48 Digest: sha256:fc6cf906cbfa013e80938cdf0bb199fbdbb86d6e3e013783e5a766f50f5dbce0
11:17:48 Status: Downloaded newer image for hello-world:latest
11:17:48 time="2023-06-04T03:17:48.862499393Z" level=error msg="stream copy error: reading from a closed fifo"
11:17:48 time="2023-06-04T03:17:48.864466844Z" level=error msg="stream copy error: reading from a closed fifo"
11:17:48 docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: invalid rootfs: stat /var/lib/docker/overlay2/4ecdbb27cdafdc185131bd112ef7a648f1f5b3ddd8bd94a470acc3aa74b1bd1c/merged: no such file or directory: unknown.
11:17:48 time="2023-06-04T03:17:48Z" level=error msg="error waiting for container: "
11:17:51 Step "test is failed: Command failed with exit code 127

It seems that this is fragile and depends heavily on the node environment. I'd suggest using kaniko to build Docker images inside k8s, and I will add a step for it.
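
For context, kaniko builds images without a Docker daemon and reads registry credentials from a Docker-style config file, by default /kaniko/.docker/config.json. The sketch below shows how such a file could be generated. `write_kaniko_auth` and the `REGISTRY_USER` / `REGISTRY_PASSWORD` variables are hypothetical, the function assumes the values contain no characters that need JSON escaping, and it does not describe how the OneDev step mentioned here is implemented.

```shell
#!/bin/sh
# Hypothetical helper: print a Docker-style config.json holding basic-auth
# credentials for one registry.
# Usage: write_kaniko_auth <registry> <user> <password>
write_kaniko_auth() {
  registry="$1"; user="$2"; pass="$3"
  # The "auth" field is base64("user:password") on a single line.
  auth=$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth"
}

# Intended use (not executed here). The plain kaniko executor image has no
# shell, so the file would be written by an earlier step and shared with it:
#   write_kaniko_auth private-registry.local "$REGISTRY_USER" "$REGISTRY_PASSWORD" \
#     > /kaniko/.docker/config.json
#   /kaniko/executor --context "$PWD" --dockerfile Dockerfile \
#     --destination private-registry.local/some-image-name:latest
```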

artemis commented 11 months ago

@robin ya I guess that it strongly depends on the respective environment. Kaniko would be a sufficient solution I guess. Anything I can do to help?

Robin Shen commented 11 months ago

It should be rather easy to add a step that runs a Kaniko container. Will get it into the next patch release.

Robin Shen commented 11 months ago

I filed an improvement request for this as #1406

artemis commented 11 months ago

Sounds great. Really appreciate your efforts. Thx

Robin Shen changed state to 'Closed' 11 months ago
State: Open -> Closed

Robin Shen commented 11 months ago

Kaniko image build step has been added in 8.3.4. Closing this now.

artemis referenced from other issue 11 months ago

Type: Question
Priority: Major
Reference: onedev/server#1402