#532  Job Executor: server docker executor failed to run jobs on docker swarm
Released
jbauer opened 2 years ago

We have a docker-compose.yml file which defines all our server-side development services. That compose file is then used with docker stack deploy -c docker-compose.yml dev-environment so that all services are deployed to Docker Swarm.

services:

  onedev:
    hostname: "onedev"
    image: 1dev/server
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    volumes:
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
      - type: volume
        source: onedev
        target: /opt/onedev
    networks:
      - dev
    ports:
      - target: 6610
        published: 6610
        protocol: tcp
        mode: ingress
      - target: 6611
        published: 6611
        protocol: tcp
        mode: ingress

Once we run the above via docker stack deploy, we get a functional OneDev instance in Docker Swarm. However, configuring Docker as the job executor fails with the following exception:

17:02:21 No job executor defined, auto-discovering...
17:02:21 Checking if there is a Kubernetes cluster...
17:02:21 Checking if there is docker facility...
17:02:22 Discovered job executor type: Server Docker Executor
17:02:22 Waiting for resources...
17:02:22 Executing job (executor: auto-discovered, network: auto-discovered-1-1-0)...
17:02:22 Allocating job caches...
17:02:22 Copying job dependencies...
17:02:22 Running step "helloworld"...
17:02:23 Error running job: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Unable to get container information
    	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    	at io.onedev.server.buildspec.job.JobExecution.check(JobExecution.java:52)
    	at io.onedev.server.buildspec.job.DefaultJobManager$10.run(DefaultJobManager.java:1188)
    	at io.onedev.server.persistence.DefaultTransactionManager$2.call(DefaultTransactionManager.java:103)
    	at io.onedev.server.persistence.DefaultTransactionManager$2.call(DefaultTransactionManager.java:99)
    	at io.onedev.server.persistence.DefaultTransactionManager$1.call(DefaultTransactionManager.java:72)
    	at io.onedev.server.persistence.DefaultSessionManager.call(DefaultSessionManager.java:79)
    	at io.onedev.server.persistence.DefaultTransactionManager.call(DefaultTransactionManager.java:60)
    	at io.onedev.server.persistence.DefaultTransactionManager.run(DefaultTransactionManager.java:99)
    	at io.onedev.server.buildspec.job.DefaultJobManager.run(DefaultJobManager.java:1137)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: Unable to get container information
    	at io.onedev.server.plugin.executor.serverdocker.ServerDockerExecutor.getOuterPath(ServerDockerExecutor.java:525)
    	at io.onedev.server.plugin.executor.serverdocker.ServerDockerExecutor.access$300(ServerDockerExecutor.java:93)
    	at io.onedev.server.plugin.executor.serverdocker.ServerDockerExecutor$1$2.runStepContainer(ServerDockerExecutor.java:230)
    	at io.onedev.server.plugin.executor.serverdocker.ServerDockerExecutor$1$2.execute(ServerDockerExecutor.java:316)
    	at io.onedev.k8shelper.LeafFacade.execute(LeafFacade.java:11)
    	at io.onedev.k8shelper.CompositeFacade.execute(CompositeFacade.java:34)
    	at io.onedev.server.plugin.executor.serverdocker.ServerDockerExecutor$1.run(ServerDockerExecutor.java:219)
    	at io.onedev.server.job.resource.DefaultResourceManager.run(DefaultResourceManager.java:148)
    	at io.onedev.server.plugin.executor.serverdocker.ServerDockerExecutor.execute(ServerDockerExecutor.java:163)
    	at io.onedev.server.buildspec.job.DefaultJobManager$4.run(DefaultJobManager.java:704)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at io.onedev.server.security.SecurityUtils$1.run(SecurityUtils.java:338)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	... 1 more

Looking briefly into the code, I think the reason is that you are trying to inspect the container OneDev is running in. To do so you either use the hostname (which is onedev in our case) or try onedev as a fallback. Given that we deployed OneDev to Docker Swarm as a service, there is no container with such a name.
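
Presumably the failing lookup is roughly equivalent to the following (a guess from reading the code, not the exact implementation):

# hostname resolves to "onedev" here, and under swarm there is no container
# named "onedev" either, so both inspect calls fail
docker container inspect "$(hostname)" \
    || docker container inspect onedev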

Calling docker ps within the onedev container results in:

root@onedev:/# docker ps
CONTAINER ID   IMAGE                                       COMMAND                  CREATED          STATUS          PORTS                                              NAMES
252d42474124   1dev/server:latest                          "/root/bin/entrypoin…"   49 minutes ago   Up 49 minutes   6610-6611/tcp                                      dev-environment_onedev.1.gk2giempre4u9xq1q5oegyydc

So you would need to call docker container inspect dev-environment_onedev.1.gk2giempre4u9xq1q5oegyydc or docker container inspect 252d42474124. But given that we have defined a custom hostname, and given that you do not know the Docker stack name (dev-environment), neither command can be generated from the available information.

As a workaround we could comment out the hostname in the docker-compose file, so that the container hostname would be its container id again. However, we would like to have a more stable hostname.

So I think it would be useful to have an environment variable in the OneDev Docker image to give OneDev more information:

  1. Provide ONEDEV_SERVICE_NAME: You could execute docker service inspect <ONEDEV_SERVICE_NAME>, find the mounts, and check whether the one in question is a bind mount (already handled today) or a volume mount. If it is a volume mount, you need to follow up with docker volume inspect <VOLUME_NAME> to find out the current mountpoint of that volume on the host. (A sketch follows after this list.)
  2. Provide ONEDEV_VOLUME_NAME and ONEDEV_HOST_BIND_DIR: If the volume name is provided, you again call docker volume inspect <ONEDEV_VOLUME_NAME>. If a bind mount is used, you simply use the provided path directly.
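
A rough sketch of suggestion 1 (ONEDEV_SERVICE_NAME is the variable proposed above; jq is assumed to be available in the image):

# read the service's mount list and pick the one targeting /opt/onedev
MOUNTS=$(docker service inspect "$ONEDEV_SERVICE_NAME" \
    --format '{{json .Spec.TaskTemplate.ContainerSpec.Mounts}}')
TYPE=$(echo "$MOUNTS" | jq -r '.[] | select(.Target == "/opt/onedev") | .Type')
SOURCE=$(echo "$MOUNTS" | jq -r '.[] | select(.Target == "/opt/onedev") | .Source')
if [ "$TYPE" = "volume" ]; then
    # named volume: ask docker where it lives on the host
    HOST_DIR=$(docker volume inspect "$SOURCE" --format '{{.Mountpoint}}')
else
    # bind mount: the source already is a host path
    HOST_DIR="$SOURCE"
fi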

Maybe there are other solutions, but the current logic in the code is a little too simplistic to be fully compatible with Docker Swarm.

Robin Shen commented 2 years ago

This seems too complicated. I am thinking of adding an environment variable so that the user can pass in the OneDev installation directory on the host directly. Will that work for your case?

If that environment variable is not set, OneDev continues with the current logic to find the host installation directory, which works for most cases.
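
For illustration, if the variable were called ONEDEV_HOST_INSTALL_DIR (a made-up name, not anything OneDev currently reads), setting it on the running swarm service could look like:

# ONEDEV_HOST_INSTALL_DIR is hypothetical; service name is from the stack above
docker service update \
    --env-add ONEDEV_HOST_INSTALL_DIR=/var/lib/docker/volumes/onedev/_data \
    dev-environment_onedev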

jbauer commented 2 years ago

That would only work if you assume that the data directory of the Docker daemon is the same on all hosts in a Docker Swarm cluster. However, that might not be the case. Different Linux distributions might install Docker into different locations, or a node might be intentionally configured differently by changing dockerd's data-root configuration parameter, see: https://docs.docker.com/engine/reference/commandline/dockerd/
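
For example, a single node could be configured with a non-default data root, in which case volume mountpoints on that node differ from the rest of the swarm (the path below is made up):

# illustrative daemon.json on one node
cat /etc/docker/daemon.json
# { "data-root": "/mnt/storage/docker" }

docker volume inspect onedev --format '{{.Mountpoint}}'
# /mnt/storage/docker/volumes/onedev/_data instead of
# /var/lib/docker/volumes/onedev/_data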

Can you briefly describe how you use the OneDev host installation directory when running job executors? Then I can better understand the situation.

Robin Shen commented 2 years ago

When running a CI job using the server docker executor, OneDev creates a temp directory under its installation directory (/opt/onedev/temp/jobuuid), clones the repository into this temp directory, and then mounts it into the job container as /onedev-build to serve as the job workspace. Since OneDev running in a container uses the option -v /var/run/docker.sock:/var/run/docker.sock (the DooD approach, to avoid DinD headaches), the volume mount -v /opt/onedev/temp/jobuuid:/onedev-build when creating the job container will actually be handled by the docker daemon on the host running the OneDev container, and the mount will fail, as the path /opt/onedev/temp/jobuuid does not exist on the host. To solve the issue, OneDev has to find out which host path is mounted as /opt/onedev in the OneDev container; from that host path, OneDev deduces the actual host path corresponding to /opt/onedev/temp/jobuuid and uses it as the mount source when creating the job container.
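
To make the translation concrete (default local volume driver layout; jobuuid is a placeholder):

# inside the OneDev container the workspace is:
#   /opt/onedev/temp/jobuuid
# the same data on the host is:
#   /var/lib/docker/volumes/onedev/_data/temp/jobuuid
# since the host daemon resolves -v paths, the job container must be created as:
docker run -v /var/lib/docker/volumes/onedev/_data/temp/jobuuid:/onedev-build ...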

Robin Shen commented 2 years ago

It turns out that the command below can be used to determine the container id of OneDev:

docker ps -f volume=/opt/onedev

With the container id, it is easy to determine the host installation directory simply by inspecting the container, even if the mount is not a bind one. docker inspect will always output the source directory of the mount in the Mounts section, like below:

"Type": "volume",
"Name": "onedev",
 "Source": "/var/lib/docker/volumes/onedev/_data",
 "Destination": "/opt/onedev",
 "Driver": "local",
 "Mode": "z",
 "RW": true,
 "Propagation": ""

Can you please verify if this works on your side?

OneDev changed state from 'Open' to 'Closed' 2 years ago
OneDev commented 2 years ago

State changed as code fixing the issue is committed

OneDev changed state from 'Closed' to 'Released' 2 years ago
OneDev commented 2 years ago

State changed as build #2237 is successful

Robin Shen changed title 2 years ago from "Job Executor: Server Docker executor failed to create" to "Job Executor: server docker executor failed to run jobs on docker swarm"
jbauer commented 2 years ago

Sorry for the late reply. Yes, the above command works in a docker service task.

An alternative solution could be to remove the dependency on the host path completely by using an onedev-build volume.

For example, you could execute the following commands inside the onedev container to populate a docker volume with the build data and then run the build as usual:

Preparation:

  • docker volume create onedev-build-20220101120000
  • docker create --name onedev-build-volume-helper -v onedev-build-20220101120000:/onedev-build busybox
  • docker cp /opt/onedev/temp/jobuuid/. onedev-build-volume-helper:/onedev-build (note the trailing /., so the directory contents land at the volume root rather than in a nested jobuuid directory)
  • docker rm onedev-build-volume-helper

After these commands you have a volume with the required data to execute the build. That volume can then be mounted into the real target image that the user has defined for the build, just as you already do.

Run the docker job

  • docker run -v onedev-build-20220101120000:/onedev-build ......

That way you don't have to know the host path of OneDev at all.
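
Put together as one sketch (volume name from the example above; <job-image> stands for whatever image the user configured):

VOL=onedev-build-20220101120000
docker volume create "$VOL"
docker create --name onedev-build-volume-helper -v "$VOL":/onedev-build busybox
docker cp /opt/onedev/temp/jobuuid/. onedev-build-volume-helper:/onedev-build
docker rm onedev-build-volume-helper
docker run --rm -v "$VOL":/onedev-build <job-image>
docker volume rm "$VOL"   # reclaim the space once the job is done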

jbauer commented 2 years ago

Hmm, on the other hand, using the volume strategy also means occupying additional space on the host, and maybe that host only has a small local disk and uses a NAS/SAN for storage. So using docker ps -f volume=/opt/onedev is the easier solution.

Robin Shen commented 2 years ago

Thanks for the idea. Just as you've mentioned, placing the source directory of /onedev-build under OneDev's data directory makes it easier to control disk allocation.

Also there are two other reasons:

  1. Using a bind mount saves an extra copy of cloned git repositories.
  2. Cache directory mounting: under the directory identified by the cache key, OneDev has to determine which cache is not occupied, and then mount the free cache sub directory into the container.

All of this is much easier to handle with bind mounts once the host directory of OneDev is known.
