#602  OneDev Agent quirks after becoming more compatible with Docker Swarm
Closed
jbauer opened 2 years ago

Hi,

thanks for the fast update on the agent! I tried it and now all agents appear in OneDev Server. However, I found some quirks:

  1. When deploying the agent service with 2 replicas, I can see 2 agents in OneDev server. If I select one of them and then choose Operations -> Remove Selected Agents, ALL agents are removed instead of just the selected one. I guess that is because all agents use the same auth token and the token is removed together with the selected agent. So I had to generate a new token and redeploy the agent service (see the deployment sketch after this list).
  2. When deploying the agent service with 2 replicas, the agents register with their container IP addresses. When I scale the service down from 2 replicas to 1, one agent is marked as down. That is fine. However, when I scale back up to 2 replicas, the newly created second replica gets a new container IP address. OneDev server marks the second agent as online again (since the name is the same) but still shows the old IP address. So maybe the second agent is now unusable? At least the user cannot trust the UI anymore.
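For reference, a deployment along these lines reproduces the scenario. The image and environment variable names (1dev/agent, serverUrl, agentToken) follow the OneDev agent docs as I understand them; the service name, network, and URL are placeholders for my setup:

```
docker service create --name onedev-agent --replicas 2 \
  --network onedev-net \
  -e serverUrl=http://onedev-server:6610 \
  -e agentToken=<shared-token> \
  1dev/agent
```

Both replicas start from the same token, which is exactly what triggers quirk 1.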

Agent removal might need to be done by name now? Also, the token should perhaps only be deleted if it is unused. However, maybe it makes more sense to ask the user whether they really want to delete the now-unused token. Maybe the user just wants to remove all registrations to force a new registration of all agents (while keeping their current auth token). Alternatively, the token UI could add a visual "unused" label next to the token, so that a user knows they can either reuse the token or delete it. In that case OneDev server would not delete tokens automatically at all.

Robin Shen commented 2 years ago

Removing an agent at the server side actually means deleting the token used by the agent. Removing by name is meaningless, as the agent can always connect again and reappear in the agent list. If agents share the same token, they will all be removed. This is unfortunately a side effect of sharing tokens.

jbauer commented 2 years ago

Obviously I would only delete agents that won't come back online at all or anytime soon.

There are two cases:

  • Scaling down the service for whatever reason for a longer period of time (or forever).
  • Renaming the hostnames within the service because of a typo or a new naming convention inside the cluster.

In both cases we now have offline agents that cannot be deleted without deleting the whole agent cluster.

So if I reconfigure the agent docker service twice with a different hostname scheme, I end up with:

[Screenshot 2022-03-01: agent list showing the offline agents left over from the old hostname schemes]

and the offline agents will stay there forever because I cannot delete them. If I want to get rid of them, I have to recreate the whole thing using a new token.

It is okay-ish for me, but at least a warning should pop up when deleting an agent that uses a shared token. Otherwise people, like me, accidentally delete the whole thing and have to recreate the agent cluster.

Shouldn't there be some DB table which simply stores registered agents, so that we could just delete some entries from it? Maybe they reappear if the agent is still alive, but maybe the agent is indeed dead.

Robin Shen commented 2 years ago

Thanks for the detailed explanation. I understand your situation now.

I plan to add a separate action called unauthorize to remove the token used by an agent. This is useful, for instance, when a token is leaked or an agent should no longer be trusted.

The current remove action will be modified to merely remove the named agent entry from the database, so that offline agents can be cleaned up.

Also, a new environment variable -e temporalAgent=true will be added to auto-remove agent entries when they go offline. This will work better for agent scale up/down.
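Once available, enabling it on an existing Swarm service could look something like this (a sketch; flag name as stated above, though the released spelling may differ, and onedev-agent is a placeholder service name):

```
docker service update --env-add temporalAgent=true onedev-agent
```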

Robin Shen commented 2 years ago

Also, regarding the incorrect IP address: OneDev gets the IP address from the RemoteAddress header of the HTTP request sent by the agent. This IP address is updated each time the agent connects to the server.

I tested locally by running the agent and server in the same docker network dev. The first time the agent connected to the server, the IP address displayed correctly. Then I took the agent down, ran a separate container in network dev to occupy the agent's original IP address, and started the agent again to force it onto a new IP address. When it connected back, the IP address was updated correctly. I am not sure if there is any network magic inside Swarm; if you can help investigate the issue, it would help a lot.
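Roughly, the local test looked like this (a sketch; container names and the alpine image are illustrative, and the agent's serverUrl/agentToken settings are omitted):

```
docker network create dev
docker run -d --name agent --network dev 1dev/agent
docker inspect -f '{{.NetworkSettings.Networks.dev.IPAddress}}' agent   # note the IP
docker stop agent                                               # frees the IP
docker run -d --name squatter --network dev alpine sleep 3600   # occupies the old IP
docker start agent                                              # comes back with a new IP
```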

jbauer commented 2 years ago

I think the issue with the IP address is actually not an issue.

When using Docker Swarm with an overlay network (so that containers can talk to each other across different physical hosts), each physical docker host has a virtual IP on that overlay network which is used for load balancing. These IPs can be seen when calling docker inspect <ID of overlay> on the different physical hosts; the entry is named with an "lb-" prefix (lb = load balancer). That virtual IP only changes when you restart the docker daemon on a given physical host.
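For example, inspecting the overlay network on one node shows the load-balancer endpoint among the containers (output trimmed; network name is illustrative):

```
$ docker network inspect my-overlay
...
"Containers": {
    "lb-my-overlay": {
        "Name": "my-overlay-endpoint",
        "IPv4Address": "10.0.1.5/24",
        ...
    }
}
```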

OneDev server only sees this virtual IP and not the container IP, which is what confused me. As a consequence, two agents with different names but the same IP could be registered if docker swarm decides to deploy two agent instances on the same physical host. When OneDev server talks to that IP, docker will choose one of the running agents on that host in round-robin fashion.

In case you are interested, the two links below describe how docker swarm networking works:

https://blog.revolve.team/2017/04/25/deep-dive-into-docker-overlay-networks-part-1/
https://blog.revolve.team/2017/05/09/deep-dive-into-docker-overlay-networks-part-2/

Robin Shen commented 2 years ago

Thanks for helping me understand the issue. Will change OneDev to detect the IP address on the agent side instead of relying on the RemoteAddress header.

Robin Shen changed state from 'Open' to 'Closed' 2 years ago
Type: Bug
Priority: Normal
Assignee: (none)
Affected Versions: Not Found
Issue Votes: 0
Watchers: 4
Reference: onedev/server#602