wojtek opened 7 months ago
|
After running the import from GitHub (on top of the previous import from YouTrack), 1dev crashed and right now its pod is in a restart loop with quite enigmatic logs:
Would it be possible to provide more detailed logs (output in stdout)? Ideally with timestamps? |
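A minimal sketch of pulling timestamped logs for a crash-looping pod straight from kubectl (namespace and pod name are hypothetical, adjust to your cluster):

```shell
# --timestamps prefixes each line with the kubelet's timestamp;
# --previous shows the log of the last crashed container in a restart loop
kubectl -n onedev logs --timestamps --previous onedev-0
```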
|||||
Please edit the helm chart to run a shell with entry command set to sleep for a while, and then exec into the container, switch to /opt/onedev/logs to check console and server log to see if there are more info. |
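A rough sketch of that debugging override, assuming the chart deploys a Deployment named onedev in a namespace named onedev (both names hypothetical):

```shell
# Keep the container alive without starting OneDev, so it can be inspected
kubectl -n onedev patch deployment onedev --type=json -p \
  '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","infinity"]}]'

# Exec in and inspect the logs directory mentioned above
kubectl -n onedev exec -it deploy/onedev -- sh
# then, inside the container:
cd /opt/onedev/logs
tail -n 200 *.log
```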
|||||
I managed to get more logs from k8s; I think it's the kernel killing the JVM (memory limits):
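One way to confirm a kernel-level kill (pod name hypothetical). Note that if only the java child process is killed inside the container, k8s may not record anything, so the node's kernel log is also worth checking:

```shell
# If the whole container was OOM-killed, k8s records it:
kubectl -n onedev describe pod onedev-0 | grep -B1 -A4 'Last State'
# Reason: OOMKilled with Exit Code: 137 would confirm it

# If only the java child process was killed, the node's kernel log shows it:
dmesg -T | grep -iE 'oom|killed process'
```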
|
|||||
OneDev may need to use more memory if you imported large repositories from GitHub. OneDev helm does not set any limit on resources. Do you have any k8s policy enforce that? |
|||||
Yes, we do set limits, though they seem generous:
On the other hand, I imported roughly 250 projects from YT (about 16k issues) and about 180 git repositories from GitHub. Would you reckon that ~2G of memory would be too little to handle that? |
|||||
One surprising thing is that the pod is not killed when the JVM tries to allocate more memory; rather, the wrapper seems to keep the pod running and tries to restart the JVM ("JVM was running for 71 seconds (less than the successful invocation time of 300 seconds). / Incrementing failed invocation count (currently 3). / Reloading Wrapper configuration...") - is that intended? |
|||||
Yes, this is the intended behavior of the wrapper: it handles JVM failures itself, so the pod is not aware of the failure. 16k issues will not consume too much memory. However, 180 git repositories may consume a lot depending on their size. Also, for each repo OneDev creates a Lucene index, which consumes additional memory. I'd suggest increasing it to 8G initially. If that turns out to be too much (check via menu Administration / System Maintenance / Server Information page), you can decrease it later. |
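For reference, the restart logic quoted above is driven by standard Java Service Wrapper settings along these lines (the values shown are JSW defaults, not necessarily what OneDev ships; check the actual wrapper.conf):

```
# wrapper.conf (illustrative)
wrapper.successful_invocation_time=300   # JVM must run this long to reset the failure counter
wrapper.max_failed_invocations=5         # give up after this many quick failures
wrapper.disable_restarts=FALSE           # wrapper restarts the JVM itself, invisibly to k8s
```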
|||||
BTW: I use 5G memory for this OneDev instance. |
|||||
Doubling the memory (to 4G) allowed me to start 1dev correctly.
After startup the panel said 1.6G used. Forcing GC brought it back to ~200M, and it then stabilised around 0.5G. I wonder, are there any more detailed statistics (for example, how much the Lucene index/caches use)? Is there a way to tweak the Lucene configuration? |
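For what it's worth, a GC can also be forced and the heap inspected from inside the container with stock JDK tooling (a sketch; assumes jcmd and pgrep are available in the image and a JDK 9+ runtime; the process match pattern is hypothetical):

```shell
JAVA_PID=$(pgrep -f onedev | head -n1)  # the java child, not the wrapper (PID 1)
jcmd "$JAVA_PID" GC.run                 # request a full GC
jcmd "$JAVA_PID" GC.heap_info           # heap usage after the collection
```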
|||||
The initial import needs to index all files and can take more memory. Once indexed, the memory usage drops. There are no statistics for the index/cache, and there is no configuration to tweak Lucene. I hope to keep it as simple as possible. |
|||||
Robin Shen changed state to 'Closed' 7 months ago
|
|||||
Closing this issue now. BTW: you can also configure performance settings in the administration menu to reduce the number of concurrent CPU-intensive tasks. This should use less memory for bulk repository imports. |
|||||
I wonder: it seems that the JVM was killed externally (by the kernel? SIGKILL), which would suggest that the assigned heap was larger than the available memory. Why may this be important? The JVM could consider that it still has memory available and therefore never trigger a GC to free memory up.
We already have task concurrency set to 1. |
|||||
During the initial import of git repositories, OneDev has to go through the entire git history to build up indexes and caches; this can hold considerable memory that cannot be GCed until the work is done. |
|||||
It happened again, but this time while still importing from YT (https://code.onedev.io/onedev/server/~issues/1592#IssueComment-5770). I was closely monitoring resource usage in
It looks like the kernel kills the process:
So, the low heap usage (~20%) would mean that the allocated heap is too big. In the docker container (we use:
which seems quite conservative, but still too much for some reason. |
|||||
wojtek changed state to 'Open' 7 months ago
|
|||||
Are the sources of the wrapper available? I think the problem boils down to how k8s/containers work (cgroups2 for setting limits). The pod has resources (limits) defined, and the JVM itself supports detecting that it runs inside a container and adjusts its limits properly, but the wrapper seems to be unaware of that. The k8s node has ~8G of memory, the pod has a ~2G limit:
Process limits:
It seems that the wrapper indeed ignores container limits and sets the limit to 50% of the whole node (and not the container!): Would you consider running the JVM directly? It would help avoid such headaches... what's more, if the OneDev pod crashes, it would then be handled by k8s as it should be... |
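A quick way to see the mismatch from inside the pod (which path exists depends on whether the node uses cgroup v1 or v2):

```shell
# The limit the JVM/wrapper should honour:
cat /sys/fs/cgroup/memory.max                     # cgroup v2
cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # cgroup v1
# ...versus the node-wide total that tools like `free` report:
free -m
```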
|||||
A regular JVM correctly detects the container environment and properly adjusts its heap limits:
(BTW, are there any plans to update the JVM somewhat? Maybe at least JVM 17?) |
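This is easy to verify with a container-aware JDK; the flags below are standard HotSpot options:

```shell
# Default heap derived from the cgroup limit (typically 1/4 of container memory):
java -XshowSettings:vm -version
# Sizing the heap as a fraction of the container limit instead:
java -XX:MaxRAMPercentage=50 -XshowSettings:vm -version
```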
|||||
This turns out to be a JSW bug. I will use -XX:MaxRAMPercentage directly in the next patch release. I filed it as a separate bug here: https://code.onedev.io/onedev/server/~issues/1600 For now, please raise the memory limit to get the import done. |
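If needed before the patch, the flag can also be passed straight to the JVM via standard JSW properties (a sketch; the index 10 is arbitrary, pick one unused in your wrapper.conf):

```
# wrapper.conf: stop JSW from computing -Xmx itself...
wrapper.java.maxmemory=0
# ...and size the heap from the container limit instead
wrapper.java.additional.10=-XX:MaxRAMPercentage=50
```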
|||||
I increased the heap (manually) to ~3G (needed to juggle a bit, will try with more), but even with ~3G the import of only the YT issues failed:
It seems like it happened at the very last step. I wonder if it's a problem with Hibernate itself? Or with how it's used (trying to store everything at once)? Another qualm I have: would the same issue be experienced during upgrade (where the database is dumped and re-imported)? |
|||||
Are you importing all projects at once? If so, how many issues do you have across all these projects? OneDev tries to import everything in a single transaction and may consume considerable memory if there are many issues. Upgrade does not experience this issue, as it is done batch by batch. |
|||||
Yes, I'm importing all projects and all issues at once. For the non-archived projects (~170) YT lists about ~16k issues. There are also archived projects (~70), which may add another couple of thousand. Would it be possible to batch the import as well, instead of doing a single transaction?
That sounds good; however, I'm worried now that the upgrade will take quite a long time... |
|||||
Issues also have comments and custom fields, and this can use lots of memory. I will investigate the approach of importing in batches, but that takes time. For now, please add more memory to see if it helps. |
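For illustration, batch importing with Hibernate usually follows this idiom: flush and clear the session periodically so already-persisted entities can be garbage collected, instead of keeping every issue in the persistence context until one final commit. A minimal sketch, not OneDev's actual code:

```java
import org.hibernate.Session;
import java.util.List;

class BatchImportSketch {
    static final int BATCH_SIZE = 50;

    static void importIssues(Session session, List<?> issues) {
        int count = 0;
        for (Object issue : issues) {
            session.persist(issue);
            if (++count % BATCH_SIZE == 0) {
                session.flush(); // push pending INSERTs to the database
                session.clear(); // detach entities so the GC can reclaim them
            }
        }
        session.flush(); // write out the final partial batch
        session.clear();
    }
}
```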
|||||
Upgrade is much faster than import. I tested with 100k issues, each with some comments, and the upgrade completed within 10 minutes. |
|||||
Also please attach the full stack trace when this issue happens. |
|||||
I'll try, but for certain reasons I have very little wiggle room...
That's good to hear!
This is the complete stack trace from the previous excerpt: |
|
|||||
Thanks for posting the stack trace. I found an issue where excessive threads get created when importing many issues. Because of this, the import may still fail even if you add more memory. Please hold off on this until the issue is fixed. |
|||||
Hmm... that explains a lot :-)
Understood. |
|||||
Please upgrade to 9.2.1, which calculates the max memory percentage based on container memory; the percentage can also be controlled via a helm setting. This version also commits imported issues per project to save memory. Please give it a try and let me know the result. |
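The setting's name isn't quoted in this thread; a hypothetical values.yaml override might look like the following (the key name is an assumption, check the chart's values.yaml for the real one):

```yaml
# values.yaml (key name hypothetical)
maxMemoryPercent: 50
```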
|||||
Upgraded to 9.2.1. Wrapper settings:
Yet -Xmx seems to be calculated incorrectly:
BTW, in this version the wrapper doesn't seem to handle the situation correctly, as the JVM is not responding but is still kept running:
Logs
|
|||||
Seems that you've changed |
|||||
Indeed there was such a line; I commented it out and manually set
but I don't see it applied. |
|||||
What is the java process command line being displayed now? |
|||||
Which is odd - I would assume that |
|||||
Are you using the official image and helm chart? I tested with GKE and it works fine, either with a new installation or an upgrade from 9.2.0. |
|||||
Please also use the official wrapper.conf to see if it helps. |
|||||
We use a custom image (based on the official one, with an additional jar file included):
and a custom helm chart (we don't set
With the default wrapper:
it still sets Xmx to
|
|||||
OK, I think that I figured it out... the issue is that there are two directories -
|
|||||
Going back to the original issue (importing): about 5 minutes into the import (which usually took ~45 minutes before crashing) the import threw an exception:
Checking the 1dev logs, there is again an OOM, which surprised me as I was watching
You said that you reworked the threading - how many threads do you try to create? Each thread stack usually needs ~1M (by default). I was pondering grabbing a heap dump during the crash, but since this isn't heap-related it wouldn't help... |
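When an OOM is thrown with plenty of heap free, "unable to create new native thread" is the usual suspect, and thread counts can be checked live from inside the container (a sketch; assumes standard procfs and JDK tools in the image, and the process match pattern is hypothetical):

```shell
JAVA_PID=$(pgrep -f onedev | head -n1)         # the java child, not the wrapper
ls /proc/"$JAVA_PID"/task | wc -l              # live thread count via procfs
jcmd "$JAVA_PID" Thread.print | grep -c '^"'   # named JVM threads
```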
|||||
I will set up a YouTrack instance with 100k issues and run the import to see what happens. Previously I was testing with a small dataset... |
|||||
Please upgrade to 9.2.2, which removes more threads during import. I tested importing 100k issues and heap memory never reached 1G. |
|||||
Upgraded, imported everything and was greeted with "Projects imported successfully" after ~15 minutes. Awesome! 👍 |
|||||
wojtek changed state to 'Closed' 6 months ago
|
Type: Question
Priority: Normal
Assignee: (none)
Labels: No labels