jbauer opened 8 months ago
|
|||||||
Since 8.1.0, a different token should be used for each agent. See this incompatibility note: https://code.onedev.io/~help/incompatibilities#810 Sharing the same token across different agents is not safe and is not maintainable. For instance, if you want to remove an agent, it will remove all agents sharing the same token. So the temporal agent feature is flawed, as it always uses the same token. In future versions, a secure and flexible approach will be introduced to launch agents on demand, so that it is possible to launch agents in EC2 or your swarm cluster. |
|||||||
Why is it not safe? If one agent has been compromised it is already an unsafe situation.
Well, now it is not maintainable anymore, because scaling agents has to be done manually by defining new agent services with their own tokens and deploying them.
Yes, that was already clear two years ago when sharing a token was introduced. See issue #601. To mitigate that, in issue #602 a distinction was made between deleting a token (which would remove all agents) and simply removing an agent by name, for agents that have become outdated because of scaling changes or hostname changes. In addition, temporal agents were introduced, which delete themselves automatically. So now we are back in the same situation as two years ago.
I guess there is no concrete time frame? |
|||||||
Calling it unsafe might be inaccurate. I mean that a leaked token will affect all agents sharing the token, and you would need to re-assign tokens for all of them. Also, shared tokens mean that there is no reliable way to distinguish different agents, and this may cause other trouble in the future.
With the on-demand agent launch feature, it will be much more maintainable and scalable. OneDev will launch new agents if the system is busy, and terminate them when it is idle. However, this will be an EE feature. If you want to use on-demand agents now and for free, please consider k8s, which is obviously the mainstream. |
|||||||
The agent token is basically the same as an api token. It is usually up to the user to use one token per application or to share the token between multiple applications. It is the equivalent of a login. If it is compromised and you revoke it, sure, everyone with that "login" loses access.
I don't think authentication information should be used to distinguish agents. Each agent could compute a unique id and store it in its own working directory. Or simply use the hostname, which is already used to create the working directory and thus should be unique.
I don't know how k8s would solve this, as you still need to somehow have a single env variable with unique content per pod. Also, k8s is usually way too complex to maintain to use it during development, unless you are fine with being stuck if something stops working and you have no idea how to fix it. So I don't think it is mainstream unless you reach a certain size and have dedicated people to maintain it.

Currently I don't care that much about auto scaling / on-demand creation of agents during development. It is fine in development to have a fixed number of agents and occasionally increase that number if too many builds queue up. However, with that step backwards I simply cannot configure the agents in the same way as before. It loses features and configuration is more complex.

Before, I had a single service with three replicas and I could tell Docker to never run multiple agents on the same physical host at the same time. Now I would need to create three services without replicas (replicas=1), because each service needs its own unique agent token as env variable. Now two or more agents can potentially run on the same physical host, because the services are independent. I don't want that, because I want maximum parallel computing power. Now my only option is to tell each of the three services on which physical hosts they are allowed to run.

Consider 5 physical hosts where I want 3 agents to be alive all the time, while being able to lose 2 physical hosts (e.g. one is in maintenance and the other crashes unexpectedly). That would mean each agent service would need to be assigned to three hosts (since two can go down), but then you have overlapping assignments and multiple agents would likely run on the same host as soon as you shut down some hosts. With the previous solution, Docker swarm would simply rearrange the 3 containers on the available hosts. If two hosts are down, no problem, I still have three. |
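The pre-8.1.0 single-service setup described above can be sketched as a Docker stack file. This is a hypothetical fragment: the image name, server URL, and token value are placeholders, and `max_replicas_per_node` (available since compose file format 3.8) is one way to express the "never two agents on the same physical host" constraint:

```yaml
# Hypothetical stack file; image name, URL and token are placeholders.
version: "3.8"
services:
  agent:
    image: 1dev/agent                  # assumed OneDev agent image name
    environment:
      - serverUrl=http://onedev.example.com:6610
      - agentToken=<shared-token>      # the pre-8.1.0 shared token
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1       # at most one agent task per node
```

With per-agent tokens this single service has to be split into three separate services, each pinned by placement constraints, which is what loses the automatic rebalancing described above.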
|||||||
Robin Shen changed fields 8 months ago
|
|||||||
An agent is different from ad hoc applications using api tokens. It connects to the OneDev server and is part of the build grid, and the OneDev server has to manage it.
An authoritative identification, not something generated at the agent side. Shared tokens continued to bite me while I was developing the build grid feature, and I cannot remember some of them exactly. This is why I decided to remove them in 8.1.0. For the majority of cases, starting fixed agents is enough. For cases where dynamic agents are required, looking at the EE feature is fair, I think. |
|||||||
Also, agent work files will no longer be put under a directory identified by host name, as this causes side effects for the majority of cases (cache missing if the host name changes, etc). Each agent should mount a local directory as its work volume. Mounting NFS as the work directory will make the cache extremely slow, as the cache typically contains very many small files. |
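Mounting a local directory as the work volume might look like the following fragment; the image name and the container-side work path are assumptions for illustration, not taken from the OneDev docs:

```yaml
# Hypothetical service definition; image name and container-side
# work path are assumptions.
services:
  agent:
    image: 1dev/agent
    volumes:
      # Bind-mount a directory on local disk, not an NFS share,
      # so cache reads/writes of many small files stay fast.
      - /var/lib/onedev-agent:/agent/work
```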
|||||||
Yes, I moved away from NFS for the agent. It was too slow. Agents now use a local volume, even if that means losing the whole cache when the agent is moved to a different physical host by the orchestrator. But this only happens rarely and did not justify the longer build times when using NFS. Do you think OneDev will stay compatible and usable with Docker / Docker swarm, or will it become a k8s-only product sooner or later? |
|||||||
More than 90% of OneDev deployments (based on download metrics) run directly in Docker environments without k8s, so OneDev will not become a k8s-only product. As for Docker swarm support, do you know if it has a convenient api to launch containers programmatically? |
|||||||
For HTTP examples see https://docs.docker.com/engine/api/sdk/examples/ (switch the examples to HTTP). The Engine API itself is documented at https://docs.docker.com/engine/api/ and also allows creating swarm clusters, managing swarm services, etc. For example: https://docs.docker.com/engine/api/v1.43/#tag/Service/operation/ServiceCreate |
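As a rough sketch of what launching an agent via that ServiceCreate endpoint involves: the request body is plain JSON that can be assembled and POSTed to the daemon socket. The image name, environment variables, and socket path below are assumptions for illustration; the field names follow the Engine API reference.

```python
import json

def build_service_spec(name: str, image: str, env: dict) -> dict:
    """Build a minimal ServiceCreate payload for the Docker Engine API
    (POST /v1.43/services/create)."""
    return {
        "Name": name,
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": image,
                "Env": [f"{k}={v}" for k, v in env.items()],
            },
        },
        "Mode": {"Replicated": {"Replicas": 1}},
    }

# Hypothetical values; a real token would come from the OneDev server.
spec = build_service_spec(
    "onedev-agent",
    "1dev/agent",  # assumed agent image name
    {"serverUrl": "http://onedev.example.com:6610", "agentToken": "<token>"},
)
body = json.dumps(spec)

# The payload would then be POSTed to the daemon, e.g.:
#   curl --unix-socket /var/run/docker.sock \
#        -H "Content-Type: application/json" -d @spec.json \
#        http://localhost/v1.43/services/create
```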
|||||||
Thanks. Will check them when implementing the on-demand agent feature (issue #1545) |
|||||||
In Docker swarm there are only swarm services, and Docker swarm manages the running containers (which are called tasks within a swarm service) of each service. However, just because Docker has been configured as a swarm does not mean it forbids running normal containers that are not managed by the swarm. |
|||||||
jbauer changed state to 'Closed' 8 months ago
|
|||||||
Closing this for now. I have updated OneDev to the latest version and reconfigured the agent deployment. It is less optimal now, but at least there are multiple agents online again. |
Type |
Question
|
Priority |
Major
|
Assignee | |
Labels |
No labels
|
I have upgraded from 8.6.10 to 9.1.2 and now temporal agents do not seem to work correctly anymore.
I am using Docker swarm to scale agents, and thus the agents use the properties serverUrl, agentTokenFile and temporalAgent. I had 2 agents running and after the upgrade the OneDev UI shows both, but one was always offline. I stopped both docker containers and OneDev then showed both as offline. This is already weird, because temporal agents should just disappear from the OneDev UI instead of being listed as offline.
So I deleted both agents in OneDev, generated a new agent token, reconfigured the docker service, pulled the newest agent image manually and scaled the service to 1 to start a single agent. It showed up in OneDev as a temporal agent with name onedev-agent-1 (the hostname) and the agent log in OneDev is
Scaling the agent service to 2, in order to start a second instance of the agent with hostname onedev-agent-2, does not change anything in the OneDev UI. It still shows a single agent. The container log of the second agent is
So two issues: