#799  Unable to create cache directory on Kubernetes executor
Released
Maciej Grabowski opened 2 years ago

OneDev is deployed on a Kubernetes cluster via Helm charts. I'm getting the following error when trying to check out code on the same Kubernetes cluster:

22:45:01 2022-07-05T22:44:57.280901419+02:00 Setting up job cache...
22:45:01 2022-07-05T22:44:57.296313838+02:00 java.lang.RuntimeException: Unable to create directory: /onedev-build/cache/kubernetes

No cache is configured in this pipeline, so I assume this is the default one.

Robin Shen commented 2 years ago

OneDev tries to initialize the cache home at the start of a job. It works fine on a Google Kubernetes cluster. Can you please let me know your detailed setup?

Maciej Grabowski commented 2 years ago

Sure.

Kubernetes 1.24 deployed via kubeadm 1.24 on 3 instances of Fedora 36. CRI-O 1.24 as the container runtime, MetalLB as the load balancer for services, HAProxy as the load balancer for the API, Kadalu as the CSI plugin. I have several other apps deployed on this cluster, so the general health of the installation is fine.

What else would you like to know?

Maciej Grabowski changed title 2 years ago
Previous Value: Unable to create cache direcotry on Kubernetes executor
Current Value: Unable to create cache directory on Kubernetes executor
Robin Shen commented 2 years ago

OneDev mounts the directory /var/cache/onedev-build on the k8s node as /onedev-build/cache inside the container. This error happens because the process running in the container cannot create the subdirectory kubernetes under this mounted directory. Does your cluster have any policy preventing containers from writing into mounted host paths?
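
A minimal sketch of the failing step, assuming it amounts to a recursive mkdir under the mounted path. The helper name try_create_subdir is hypothetical and this is not OneDev's actual code; it only illustrates how a write denial on the mounted host path would surface as the error reported above, and it could be run inside a container to check whether the mount is writable.

```python
import errno
import os
import tempfile
from typing import Tuple


def try_create_subdir(mount_point: str, name: str) -> Tuple[bool, str]:
    """Try to create `name` under `mount_point`.

    Returns (True, created_path) on success, or (False, message) when the
    operating system refuses the mkdir (for example EACCES or ENOTDIR).
    """
    target = os.path.join(mount_point, name)
    try:
        os.makedirs(target, exist_ok=True)
    except OSError as exc:
        # Translate the numeric errno into its symbolic name for the report.
        code = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, f"Unable to create directory: {target} ({code})"
    return True, target


if __name__ == "__main__":
    # Demo against a scratch directory; inside a job container the first
    # argument would be the mounted path, e.g. /onedev-build/cache.
    with tempfile.TemporaryDirectory() as scratch:
        print(try_create_subdir(scratch, "kubernetes"))
```

If the call fails with EACCES even though the process runs as root, a host-level policy rather than plain file permissions is the likely cause.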

Maciej Grabowski commented 2 years ago

Yes. Like most OS paths, it's protected by SELinux, which is enabled by default on Fedora and many other modern distros. Cache directory creation works with SELinux disabled, but that's an insecure setup and should be avoided.

Please also note that in most professional setups, the OS volume of a Kubernetes worker is provisioned from a snapshot or template with minimal capacity (just a few GB of disk space), on a storage system with low performance and low write endurance. Using the OS volume for potentially capacity- and performance-intensive operations can lead to a host of serious issues: 100% disk usage, etcd timeouts, workers being marked as unschedulable... Simply put, it shouldn't be done.

Since ephemeral volumes are barely supported and you already create a throwaway namespace for each job, I would suggest extending the Kubernetes executor with a StorageClass field to be used in the PVC that creates the cache volume. If the PVC is created in the same namespace as the job, it is bound to that namespace and is deleted along with it. If the StorageClass is configured with its reclaim policy set to "Delete", deleting the namespace also deletes the cache content.

If no StorageClass is specified in the PVC, it falls back to the default StorageClass, so the StorageClass field in the Kubernetes executor can be optional, and an empty value is still a valid value.

Using a PVC instead of HostPath allows the cache structure to be created on a storage system that the admin can configure and that is designed to provide proper performance, capacity and endurance.
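
The proposal above could be sketched roughly as follows. This is only an illustration of the idea, not anything OneDev implements: the function name build_cache_pvc, the claim name job-cache, the ReadWriteOnce access mode and the 10Gi default size are all assumptions made for the example. It builds a PersistentVolumeClaim manifest as a plain dictionary and sets storageClassName only when the executor field is non-empty.

```python
import json
from typing import Any, Dict, Optional


def build_cache_pvc(namespace: str,
                    storage_class: Optional[str] = None,
                    size: str = "10Gi") -> Dict[str, Any]:
    """Build a PVC manifest for a per-job cache volume (hypothetical helper)."""
    spec: Dict[str, Any] = {
        "accessModes": ["ReadWriteOnce"],          # assumed access mode
        "resources": {"requests": {"storage": size}},
    }
    if storage_class:
        # Only set the field when the admin configured a class; leaving it
        # out lets the cluster apply its default StorageClass.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "job-cache", "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Empty executor field: the manifest carries no storageClassName.
    print(json.dumps(build_cache_pvc("job-namespace"), indent=2))
```

Note that the sketch omits storageClassName for an empty field instead of writing an empty string: in Kubernetes, an explicit empty storageClassName requests a volume with no class and does not fall back to the default StorageClass.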

Maciej Grabowski changed fields 2 years ago
Name: Priority
Previous Value: Normal
Current Value: Major
Robin Shen commented 2 years ago

Thanks for the info. I filed another issue addressing this:

issue #800 - Able to use PVC instead of host path to store cache in k8s deployment

For this issue, OneDev will be improved to avoid creating the subdirectory if the cache is not being used.
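
The planned behavior might look something like the sketch below, assuming the job's cache configuration is available as a list. The names prepare_cache_home and cache_keys are hypothetical, and the actual fix in OneDev may be structured differently.

```python
import os
import tempfile
from typing import List, Optional


def prepare_cache_home(cache_home: str,
                       executor_name: str,
                       cache_keys: List[str]) -> Optional[str]:
    """Create the executor's cache directory only when the job uses a cache."""
    if not cache_keys:
        # No cache configured: never touch the mounted host path.
        return None
    path = os.path.join(cache_home, executor_name)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(prepare_cache_home(scratch, "kubernetes", []))
        print(prepare_cache_home(scratch, "kubernetes", ["maven-cache"]))
```

With this guard, a pipeline without caches would no longer fail on a host path it cannot write to, while pipelines that do use caches would still need the mount to be writable.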

OneDev changed state to 'Closed' 2 years ago
Previous Value: Open
Current Value: Closed
OneDev commented 2 years ago

State changed as code fixing the issue is committed

OneDev changed state to 'Released' 2 years ago
Previous Value: Closed
Current Value: Released
OneDev commented 2 years ago

State changed as build #2812 is successful

Type: Bug
Priority: Major
Assignee: (none listed)
Affected Versions: Not Found
Issue Votes: 0
Watchers: 4
Reference: onedev/server#799