#351  Docker error: dial tcp: lookup production.cloudflare.docker.com on 127.0.1.1:53: read udp 127.0.0.1:33674->127.0.1.1:53: read: connection refused
Closed
Artur opened 3 years ago

The error above seems to happen when two or more builds are running concurrently. From my investigation and what I found online, it is most likely caused by port access conflicts between the concurrent builds.

When the failed build is rerun, while no other builds are running on the same server, the build is successful.

A possible solution is described here: https://stackoverflow.com/a/41071390/825587. It suggests running Docker with the --bip parameter. As I understand it, the containers then run in bridge mode, which avoids port conflicts.

I was looking for a Docker configuration in the 1dev container to experiment with these settings but could not find any.

How can we adjust the parameters of the build containers?

Robin Shen commented 3 years ago

Each build runs in a separate custom NAT network, so ports should not conflict when multiple builds run at the same time. Nevertheless, you may modify the source to use a custom subnet. The code that creates the network is here:

https://code.onedev.io/projects/onedev-server/blob/main/server-plugin/server-plugin-executor-docker/src/main/java/io/onedev/server/plugin/docker/DockerExecutor.java?position=source-164.15-164.28-1

You may substitute part of the subnet with the build number to avoid using the same subnet for different builds. For instance, add the option below when creating the network:

String subnetOption = "--subnet=172.28." + (jobContext.getBuildNumber()%100) + ".0/24";

Subnets may still overlap, for instance when build 2 and build 102 happen to run at the same time. But for experimental purposes, it should be sufficient.
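To make the overlap concrete, here is a minimal standalone sketch of the same expression. The class name SubnetSketch and the helper subnetOption are hypothetical and exist only for illustration, and the build number is passed in as a plain long rather than read from jobContext, whose exact return type is assumed here.

```java
public class SubnetSketch {

    // Hypothetical helper: same expression as in the comment above,
    // with the build number passed in directly (type assumed to be long).
    static String subnetOption(long buildNumber) {
        return "--subnet=172.28." + (buildNumber % 100) + ".0/24";
    }

    public static void main(String[] args) {
        // Builds 2 and 102 map to the same third octet because of the modulo.
        System.out.println(subnetOption(2));    // --subnet=172.28.2.0/24
        System.out.println(subnetOption(102));  // --subnet=172.28.2.0/24
        System.out.println(subnetOption(2).equals(subnetOption(102))); // true
    }
}
```

Any two builds whose numbers differ by a multiple of 100 get the same subnet, which is the overlap mentioned above.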

Robin Shen commented 3 years ago

PS: When using a custom network, --subnet is the equivalent of --bip on the default bridge network.

Artur commented 3 years ago

You are correct that --subnet is equivalent to --bip. I am not sure what the suggestion on Stack Overflow meant.

However, I am pretty sure that the occasional error I am facing is related to running multiple builds at the same time. Why the error happens is only a guess on my part at the moment.

From my own experience with Docker on my NAS, I know that if I run multiple containers in "NAT" mode, the ports of each container are mapped to ports on the Docker host. The benefit is that all containers are accessible at the same IP address from outside, each on a different port. This sometimes causes problems and port conflicts if apps from different containers want access to the same host port.

Therefore, I prefer running each container in "bridge" mode, so it is visible at a different IP address from outside. This way apps in each container can use whatever ports they want, even the same port number can be exposed on each container. I just have to access them using their own IP address.

So, based on this experience, my guess is that something similar may be happening when 1dev builds run concurrently. Maybe the solution would be to run them in "bridge" mode.

Robin Shen commented 3 years ago

Sorry, I was wrong previously. OneDev builds use a bridge network on the Linux platform. It only uses bridge on Windows since there are issues creating multiple bridge networks on Windows for docker.

I tried to reproduce this by running multiple builds that constantly download big files from outside, and things work fine. If you can help me reproduce the issue, it will be a lot easier to find the problem.

Robin Shen commented 3 years ago

Correction to my previous comment. The sentence:

It only uses bridge on Windows since there are issues creating multiple bridge networks on Windows for docker.

should read:

It only uses NAT on Windows since there are issues creating multiple bridge networks on Windows for docker.

Robin Shen commented 3 years ago

Please check the content of /etc/resolv.conf both on the host and in the created step container. To check the content of this file in the step container, just add the command cat /etc/resolv.conf at the very start of the step commands, and then run the build.
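If it helps to compare the two files side by side, the following is a rough sketch that prints only the nameserver entries from a resolv.conf-style file, so they can be checked against the 127.0.1.1:53 address in the error message. The class name ResolvConfCheck and the helper nameservers are hypothetical, and the parsing is deliberately simplified (it ignores comment lines and everything except nameserver lines).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ResolvConfCheck {

    // Hypothetical helper: collects the address from every "nameserver" line.
    static List<String> nameservers(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip comment lines, which start with '#' or ';'.
            if (trimmed.startsWith("#") || trimmed.startsWith(";")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2 && parts[0].equals("nameserver")) {
                result.add(parts[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to /etc/resolv.conf; another path may be given as the first argument.
        String path = args.length > 0 ? args[0] : "/etc/resolv.conf";
        for (String server : nameservers(Files.readAllLines(Paths.get(path)))) {
            System.out.println(server);
        }
    }
}
```

Running it once on the host and once against a copy of the file taken from the step container gives two short lists that are easy to compare.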

Robin Shen changed state to 'Closed' 3 years ago
Previous Value: Open
Current Value: Closed

Robin Shen commented 3 years ago

Closing this now. Feel free to reopen if you have further info.

Type: Bug
Priority: Normal
Assignee: (none)
Affected Versions: Not Found
Issue Votes: 0
Watchers: 3
Reference: onedev/server#351