#1596  1dev crashed/broken after GitHub import
Closed
wojtek opened 7 months ago

After running the import from GitHub (on top of a previous import from YouTrack), 1dev crashed and right now its pod is in a restart loop with quite enigmatic logs:

--> Wrapper Started as Console
Java Service Wrapper Standard Edition 64-bit 3.5.51
  Copyright (C) 1999-2022 Tanuki Software, Ltd. All Rights Reserved.
    http://wrapper.tanukisoftware.com
  Licensed to OneDev for Service Wrapping

Launching a JVM...
WrapperManager: Initializing...
INFO  - Launching application from '/app'...
INFO  - Starting application...
INFO  - Successfully checked /opt/onedev
INFO  - Stopping application...
<-- Wrapper Stopped
JVM exited unexpectedly.
JVM exited unexpectedly.

Would it be possible to provide more detailed logs (output to stdout)? Ideally with timestamps?
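
In the meantime, kubectl itself can add timestamps and show the previous (crashed) container's output - a sketch using the workload name that appears later in this thread:

kubectl -n onedev logs deployments/onedev --timestamps --previous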

Robin Shen commented 7 months ago

Please edit the helm chart to set the entry command to a shell that sleeps for a while, then exec into the container and switch to /opt/onedev/logs to check the console and server logs for more info.
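
If patching the live deployment is easier than editing the chart, something along these lines should work (a hedged sketch; the deployment name and namespace are taken from output later in this thread):

kubectl -n onedev patch deployment onedev --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","infinity"]}]'
kubectl -n onedev exec -it deploy/onedev -- \
  tail -n 200 /opt/onedev/logs/console.log /opt/onedev/logs/server.log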

wojtek commented 7 months ago

I managed to get more logs from k8s. I think it's the kernel killing the JVM (memory limits): JVM received a signal SIGKILL (9). I'll have to verify it, though:

kubectl -n onedev logs deployments/onedev
Defaulted container "onedev" out of: onedev, init (init), database-create (init)
--> Wrapper Started as Console
Java Service Wrapper Standard Edition 64-bit 3.5.51
  Copyright (C) 1999-2022 Tanuki Software, Ltd. All Rights Reserved.
    http://wrapper.tanukisoftware.com
  Licensed to OneDev for Service Wrapping

Launching a JVM...
WrapperManager: Initializing...
INFO  - Launching application from '/app'...
INFO  - Starting application...
INFO  - Successfully checked /opt/onedev
INFO  - Stopping application...
<-- Wrapper Stopped
JVM exited unexpectedly.
JVM exited unexpectedly.
JVM exited unexpectedly.
--> Wrapper Started as Console
Java Service Wrapper Standard Edition 64-bit 3.5.51
  Copyright (C) 1999-2022 Tanuki Software, Ltd. All Rights Reserved.
    http://wrapper.tanukisoftware.com
  Licensed to OneDev for Service Wrapping

Launching a JVM...
WrapperManager: Initializing...
09:19:42 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
09:19:42 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
09:19:43 INFO  io.onedev.commons.loader.AppLoader - Starting application...
09:20:04 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
09:20:04 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
09:20:04 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
09:20:04 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
09:20:07 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
09:20:07 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
09:20:38 INFO  io.onedev.server.OneDev - Server is ready at http://10.43.173.214
JVM received a signal SIGKILL (9).
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM was running for 77 seconds (less than the successful invocation time of 300 seconds).
  Incrementing failed invocation count (currently 1).
Reloading Wrapper configuration...
Launching a JVM...
WrapperManager: Initializing...
09:21:04 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
09:21:04 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
09:21:06 INFO  io.onedev.commons.loader.AppLoader - Starting application...
09:21:27 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
09:21:28 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
09:21:29 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
09:21:36 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
09:21:37 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
09:21:37 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
09:21:55 INFO  io.onedev.server.OneDev - Server is ready at http://10.43.173.214
JVM received a signal SIGKILL (9).
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM was running for 75 seconds (less than the successful invocation time of 300 seconds).
  Incrementing failed invocation count (currently 2).
Reloading Wrapper configuration...
Launching a JVM...
WrapperManager: Initializing...
09:22:26 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
09:22:26 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
09:22:28 INFO  io.onedev.commons.loader.AppLoader - Starting application...
09:22:49 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
09:22:49 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
09:22:49 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
09:22:50 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
09:22:51 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
09:22:52 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
09:23:23 INFO  io.onedev.server.OneDev - Server is ready at http://10.43.173.214
JVM received a signal SIGKILL (9).
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM was running for 71 seconds (less than the successful invocation time of 300 seconds).
  Incrementing failed invocation count (currently 3).
Reloading Wrapper configuration...
Launching a JVM...
WrapperManager: Initializing...
09:23:43 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
09:23:43 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
09:23:44 INFO  io.onedev.commons.loader.AppLoader - Starting application...
JVM exited unexpectedly.
Robin Shen commented 7 months ago

OneDev may need to use more memory if you imported large repositories from GitHub.

The OneDev helm chart does not set any resource limits. Do you have a k8s policy enforcing them?

wojtek commented 7 months ago

Yes, we do set limits, though they seem generous:

    Limits:
      memory:  1920Mi
    Requests:
      memory:  1920Mi

On the other hand, I imported roughly 250 projects from YT (about 16k issues) and about 180 git repositories from GitHub.

Would you reckon that ~2G of memory is too little to handle that?

wojtek commented 7 months ago

One surprising thing is that the pod is not killed when the JVM tries to allocate more memory; rather, the wrapper seems to keep the pod running and restarts the JVM inside it ("JVM was running for 71 seconds (less than the successful invocation time of 300 seconds). / Incrementing failed invocation count (currently 3). / Reloading Wrapper configuration...") - is that intended?

Robin Shen commented 7 months ago

Yes, it is the intended behavior of the wrapper. It tries to handle JVM failures itself, so the pod is not aware of them.

16k issues will not consume too much memory. However, 180 git repositories may consume a lot, depending on their size. Also, for each repo OneDev creates a Lucene index, which consumes additional memory.

I'd suggest increasing it to 8G initially. If that turns out to be too much (check via the Administration / System Maintenance / Server Information page), you can decrease it later.
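
For instance, with the names used elsewhere in this thread (a sketch; in a helm-managed install, set the equivalent values in the chart instead, or this change will be reverted on the next helm upgrade):

kubectl -n onedev set resources deployment/onedev \
  --requests=memory=8Gi --limits=memory=8Gi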

Robin Shen commented 7 months ago

BTW: I use 5G memory for this OneDev instance.

wojtek commented 7 months ago

16k issues will not consume too much memory. However, 180 git repositories may consume a lot, depending on their size. Also, for each repo OneDev creates a Lucene index, which consumes additional memory.

Doubling the memory (to 4G) allowed me to start 1dev correctly.

I'd suggest increasing it to 8G initially. If that turns out to be too much (check via the Administration / System Maintenance / Server Information page), you can decrease it later.

After startup, the panel said 1.6G was used. Forcing GC brought it back to ~200M, and then it stabilised around 0.5G.

I wonder, are there any more detailed statistics (for example, how much the Lucene index/caches use)? Is there a way to tweak the Lucene configuration?

Robin Shen commented 7 months ago

After startup, the panel said 1.6G was used. Forcing GC brought it back to ~200M, and then it stabilised around 0.5G.

The initial import needs to index all files and can take more memory. Once indexed, the memory usage drops.

There are no statistics for the index/cache, and there is no configuration to tweak Lucene. I want to keep it as simple as possible.

Robin Shen changed state from 'Open' to 'Closed' 7 months ago
Robin Shen commented 7 months ago

Closing this issue now.

BTW: you can also configure performance settings in the administration menu to reduce concurrent CPU-intensive tasks. This should use less memory for bulk repository imports.

wojtek commented 7 months ago

After startup, the panel said 1.6G was used. Forcing GC brought it back to ~200M, and then it stabilised around 0.5G.

The initial import needs to index all files and can take more memory. Once indexed, the memory usage drops. There are no statistics for the index/cache, and there is no configuration to tweak Lucene. I want to keep it as simple as possible.

I wonder - it seems that the JVM was killed externally (kernel? SIGKILL), which would suggest that the assigned heap was larger than the available memory. Why might this be important? The JVM could assume it still has memory available and never trigger GC to free any up.

BTW: you can also configure performance settings in the administration menu to reduce concurrent CPU-intensive tasks. This should use less memory for bulk repository imports.

We already have task concurrency set to 1.

Robin Shen commented 7 months ago

During the initial import of git repositories, OneDev has to go through the whole git history to build up indexes and caches. This can hold considerable memory that cannot be GCed until things are done.

wojtek commented 7 months ago

It happened again, but this time while still importing from YT (https://code.onedev.io/onedev/server/~issues/1592#IssueComment-5770).

I was closely monitoring resource usage in /~administration/server-information - the total heap is 3888M and usage was hovering mostly around ~400-600M (so still plenty of room) - yet in the logs there is again:

console.log:

11:05:55 INFO  i.o.s.m.impl.DefaultProjectManager - Initializing git repository in '/opt/onedev/site/projects/231/git'...
11:06:03 INFO  i.o.s.m.impl.DefaultProjectManager - Initializing git repository in '/opt/onedev/site/projects/232/git'...
11:06:07 INFO  i.o.s.m.impl.DefaultProjectManager - Initializing git repository in '/opt/onedev/site/projects/233/git'...
11:06:54 INFO  i.o.s.m.impl.DefaultProjectManager - Initializing git repository in '/opt/onedev/site/projects/234/git'...
11:08:03 INFO  i.o.s.m.impl.DefaultProjectManager - Initializing git repository in '/opt/onedev/site/projects/235/git'...
JVM received a signal SIGKILL (9).
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM exited unexpectedly.
Reloading Wrapper configuration...
Launching a JVM...
WrapperManager: Initializing...
11:08:16 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
11:08:16 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
11:08:17 INFO  io.onedev.commons.loader.AppLoader - Starting application...
11:08:54 INFO  i.o.s.m.impl.DefaultProjectManager - Checking projects...

server.log:

2023-10-13 11:05:55,889 INFO  [hz.10.42.152.50:5710.cached.thread-3] i.o.s.m.impl.DefaultProjectManager Initializing git repository in '/opt/onedev/site/projects/231/git'...
2023-10-13 11:06:03,525 INFO  [hz.10.42.152.50:5710.cached.thread-4] i.o.s.m.impl.DefaultProjectManager Initializing git repository in '/opt/onedev/site/projects/232/git'...
2023-10-13 11:06:07,796 INFO  [hz.10.42.152.50:5710.cached.thread-4] i.o.s.m.impl.DefaultProjectManager Initializing git repository in '/opt/onedev/site/projects/233/git'...
2023-10-13 11:06:54,221 INFO  [hz.10.42.152.50:5710.cached.thread-5] i.o.s.m.impl.DefaultProjectManager Initializing git repository in '/opt/onedev/site/projects/234/git'...
2023-10-13 11:08:03,718 INFO  [hz.10.42.152.50:5710.cached.thread-3] i.o.s.m.impl.DefaultProjectManager Initializing git repository in '/opt/onedev/site/projects/235/git'...
2023-10-13 11:08:16,046 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Launching application from '/opt/onedev'...
2023-10-13 11:08:16,050 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Cleaning temp directory...
2023-10-13 11:08:17,687 INFO  [WrapperSimpleAppMain] io.onedev.commons.loader.AppLoader Starting application...
2023-10-13 11:08:54,987 INFO  [WrapperSimpleAppMain] i.o.s.m.impl.DefaultProjectManager Checking projects...
2023-10-13 11:08:58,540 INFO  [WrapperSimpleAppMain] i.o.s.m.i.DefaultAgentAttributeManager Caching agent attribute info...
2023-10-13 11:08:59,711 INFO  [WrapperSimpleAppMain] i.o.s.m.i.DefaultBuildMetricManager Caching build metric info...
2023-10-13 11:09:00,061 INFO  [WrapperSimpleAppMain] i.o.s.m.impl.DefaultBuildManager Caching build info...
2023-10-13 11:09:00,341 INFO  [WrapperSimpleAppMain] i.o.s.m.i.DefaultBuildParamManager Caching build param info...
2023-10-13 11:09:00,501 INFO  [WrapperSimpleAppMain] i.o.s.m.impl.DefaultIssueManager Caching issue info...
2023-10-13 11:09:10,479 INFO  [WrapperSimpleAppMain] io.onedev.server.OneDev Server is ready at http://10.43.173.214

It looks like the kernel kills the process:

[1336783.912831] Tasks state (memory values in pages):
[1336783.912831] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[1336783.912834] [2670595]     0 2670595     1969     1091    53248        0           754 entrypoint.sh
[1336783.912836] [2672757]     0 2672757     4023      873    49152        0           754 wrapper-linux-x
[1336783.912980] [2672775]     0 2672775  2053273   474118  6643712        0           754 java
[1336783.912981] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-ab21292d767e936867986f334654271c2c5d24fb982312727fecf687cc451d21.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9343f3c_c59d_478d_b95c_302777c68445.slice/cri-containerd-ab21292d767e936867986f334654271c2c5d24fb982312727fecf687cc451d21.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9343f3c_c59d_478d_b95c_302777c68445.slice/cri-containerd-ab21292d767e936867986f334654271c2c5d24fb982312727fecf687cc451d21.scope,task=java,pid=2672775,uid=0
[1336783.914561] Memory cgroup out of memory: Killed process 2672775 (java) total-vm:8213092kB, anon-rss:1866400kB, file-rss:30072kB, shmem-rss:0kB, UID:0 pgtables:6488kB oom_score_adj:754
[1336784.023261] oom_reaper: reaped process 2672775 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[1339894.676764] IPv6: ADDRCONF(NETDEV_CHANGE): califc773fecfb7: link becomes ready
[1343701.862372] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[1343701.862396] IPv6: ADDRCONF(NETDEV_CHANGE): calie959215e5a4: link becomes ready

So, considering the low heap usage (~20%), this would mean that the allocated heap is too big.

In the Docker container (we use 1dev/server:9.2.0), in /opt/onedev/conf/wrapper.conf I see:

# Maximum Java Heap Size (in MB)
#wrapper.java.maxmemory=1024
wrapper.java.maxmemory.percent=50

That seems quite conservative, but it is still too much for some reason.

wojtek changed state from 'Closed' to 'Open' 7 months ago
wojtek commented 7 months ago

Are the sources of the wrapper available?

I think the problem boils down to how k8s/containers work (cgroups v2 for setting limits). The pod has resource limits defined, but free or /proc/meminfo reports the underlying node's memory.

The JVM supports detecting that it runs inside a container and adjusts its limits accordingly. It seems that the wrapper is unaware of that.
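
One way to see the discrepancy from inside the container (a hedged sketch; the cgroup path depends on whether the node runs cgroup v1 or v2):

# node memory, which is apparently what the wrapper sees
head -n 1 /proc/meminfo
# container limit, which is what the kernel actually enforces (cgroup v2, falling back to v1)
cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes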

The k8s node has ~8G of memory; the pod has a ~2G limit:

Containers:
  onedev:
    Limits:
      memory:  1920Mi
    Requests:
      memory:  1920Mi

Process command line:

root        6356  1.3 10.0 6869884 803336 ?      Sl   11:08   4:01 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Djava.awt.headless=true --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.desktop/java.awt.font=ALL-UNNAMED --add-modules=java.se --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/sun.nio.fs=ALL-UNNAMED -Xmx3888m -Djava.library.path=. -classpath ch.qos.logback.logback-classic-1.2.11.jar:ch.qos.logback.logback-core-1.2.11.jar:io.onedev.commons-bootstrap-2.7.22.jar:org.slf4j.jul-to-slf4j-1.7.36.jar:org.slf4j.log4j-over-slf4j-1.7.36.jar:org.slf4j.slf4j-api-1.7.36.jar:wrapper.jar -Dwrapper.key=WDb8lfCZv2GXeJxoWg8EGhFplaGP-ZRW -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=205 -Dwrapper.version=3.5.51-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.cpu.timeout=3600 -Dwrapper.jvmid=2 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang org.tanukisoftware.wrapper.WrapperSimpleApp io.onedev.commons.bootstrap.Bootstrap

It seems that the wrapper indeed ignores container limits and sets the heap to 50% of the whole node (not the container!): -Xmx3888m. This then causes the kernel to kill the process, as the JVM obviously tries to allocate more memory than is available.

Would you consider running the JVM directly? It would help avoid such headaches... what's more, if the onedev pod crashed, it would be handled by k8s as it should be...

wojtek commented 7 months ago

A regular JVM correctly detects the container environment and properly adjusts its heap limits:

root@onedev-66d9cf94-kwgfl:/opt/onedev/logs# /usr/lib/jvm/java-11-openjdk-amd64/bin/java -XX:MaxRAMPercentage=60.0 -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 1.12G
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "11.0.20.1" 2023-08-24
OpenJDK Runtime Environment (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)


root@onedev-66d9cf94-kwgfl:/opt/onedev/logs# /usr/lib/jvm/java-11-openjdk-amd64/bin/java  -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 480.00M
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "11.0.20.1" 2023-08-24
OpenJDK Runtime Environment (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

(btw, are there any plans to update the JVM? Maybe at least JVM 17?)

Robin Shen commented 7 months ago

This turns out to be a JSW bug. The next patch release will use -XX:MaxRAMPercentage directly. I filed it as a separate bug here:

https://code.onedev.io/onedev/server/~issues/1600

For now, please raise the memory limit to get the import done.

wojtek commented 7 months ago

I increased the heap (manually) to ~3G (it needs some juggling; I will try with more), but even with ~3G the import of only the YT issues failed:

Importing issues...
Imported 0 issues
org.hibernate.resource.transaction.LocalSynchronizationException: Exception calling user Synchronization (afterCompletion): io.onedev.server.persistence.DefaultTransactionManager$1
    	at org.hibernate.resource.transaction.internal.SynchronizationRegistryStandardImpl.notifySynchronizationsAfterTransactionCompletion(SynchronizationRegistryStandardImpl.java:89)
…
    Caused by: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    	at java.base/java.lang.Thread.start0(Native Method)

It seems like it happened at the very last step. I wonder if it's a problem with Hibernate itself? Or with how it's used (trying to store everything at once)?

Another qualm I have: would the same issue be experienced during an upgrade (where the database is dumped and re-imported)?

Robin Shen commented 7 months ago

Are you importing all projects at once? If so, how many issues do you have across all these projects? OneDev tries to import everything in a single transaction, which may consume considerable memory if there are many issues.

Upgrading does not experience this issue, as it is done batch by batch.

wojtek commented 7 months ago

Yes, I'm importing all projects and all issues at once. For non-archived projects (~170), YT lists about ~16k issues. There are also archived projects (~70), which may add another couple of thousand.

Would it be possible to also batch the import instead of doing a single transaction?

Upgrading does not experience this issue, as it is done batch by batch.

That sounds good; however, I'm worried now that the upgrade will take quite a long time...

Robin Shen commented 7 months ago

Issues also have comments and custom fields, and these can use lots of memory. I will investigate importing in batches, but that takes time.

Right now, please add more memory to see if it helps.

Robin Shen commented 7 months ago

That sounds good; however, I'm worried now that the upgrade will take quite a long time...

Upgrading is much faster than importing. I tested with 100k issues, each with some comments, and the upgrade completed within 10 minutes.

Robin Shen commented 7 months ago

Caused by: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

Also please attach the full stack trace when this issue happens.

wojtek commented 7 months ago

Issues also have comments and custom fields, and these can use lots of memory. I will investigate importing in batches, but that takes time.

Right now, please add more memory to see if it helps.

I'll try but for certain reasons I have very little wiggle room...

That sounds good; however, I'm worried now that the upgrade will take quite a long time...

Upgrading is much faster than importing. I tested with 100k issues, each with some comments, and the upgrade completed within 10 minutes.

That's good to hear!

Caused by: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

Also please attach the full stack trace when this issue happens.

This is the complete stack trace from the previous excerpt:

Importing issues...
Imported 0 issues
org.hibernate.resource.transaction.LocalSynchronizationException: Exception calling user Synchronization (afterCompletion): io.onedev.server.persistence.DefaultTransactionManager$1
    	at org.hibernate.resource.transaction.internal.SynchronizationRegistryStandardImpl.notifySynchronizationsAfterTransactionCompletion(SynchronizationRegistryStandardImpl.java:89)
    	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.afterCompletionCallback(JdbcResourceLocalTransactionCoordinatorImpl.java:201)
    	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.access$400(JdbcResourceLocalTransactionCoordinatorImpl.java:40)
    	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:283)
    	at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:101)
    	at io.onedev.server.persistence.DefaultTransactionManager.lambda$call$0(DefaultTransactionManager.java:70)
    	at io.onedev.server.persistence.DefaultSessionManager.call(DefaultSessionManager.java:90)
    	at io.onedev.server.persistence.DefaultTransactionManager.call(DefaultTransactionManager.java:60)
    	at io.onedev.server.web.page.project.imports.ProjectImportPage$1$1.runTask(ProjectImportPage.java:73)
    	at io.onedev.server.web.component.taskbutton.TaskButton$2.call(TaskButton.java:160)
    	at io.onedev.server.web.component.taskbutton.TaskButton$2.call(TaskButton.java:133)
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    	at io.onedev.server.security.SecurityUtils.lambda$inheritSubject$1(SecurityUtils.java:350)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    	at java.base/java.lang.Thread.start0(Native Method)
    	at java.base/java.lang.Thread.start(Thread.java:798)
    	at com.hazelcast.internal.util.ConcurrencyUtil.lambda$static$0(ConcurrencyUtil.java:57)
    	at com.hazelcast.spi.impl.AbstractInvocationFuture.unblockWhenComplete(AbstractInvocationFuture.java:917)
    	at com.hazelcast.spi.impl.AbstractInvocationFuture.whenCompleteAsync(AbstractInvocationFuture.java:244)
    	at com.hazelcast.spi.impl.AbstractInvocationFuture.whenComplete(AbstractInvocationFuture.java:229)
    	at com.hazelcast.spi.impl.AbstractInvocationFuture.whenComplete(AbstractInvocationFuture.java:63)
    	at com.hazelcast.spi.impl.DelegatingCompletableFuture.<init>(DelegatingCompletableFuture.java:102)
    	at com.hazelcast.spi.impl.DelegatingCompletableFuture.<init>(DelegatingCompletableFuture.java:91)
    	at com.hazelcast.spi.impl.DelegatingCompletableFuture.<init>(DelegatingCompletableFuture.java:85)
    	at com.hazelcast.executor.impl.CancellableDelegatingFuture.<init>(CancellableDelegatingFuture.java:52)
    	at com.hazelcast.executor.impl.ExecutorServiceProxy.submitToMember(ExecutorServiceProxy.java:300)
    	at com.hazelcast.executor.impl.ExecutorServiceProxy.submitToMember(ExecutorServiceProxy.java:288)
    	at io.onedev.server.ee.clustering.DefaultClusterManager.submitToServer(DefaultClusterManager.java:309)
    	at io.onedev.server.ee.clustering.DefaultClusterManager.runOnServer(DefaultClusterManager.java:295)
    	at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
    	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50)
    	at com.sun.proxy.$Proxy31.runOnServer(Unknown Source)
    	at io.onedev.server.manager.impl.DefaultProjectManager.runOnActiveServer(DefaultProjectManager.java:1220)
    	at jdk.internal.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
    	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50)
    	at com.sun.proxy.$Proxy29.runOnActiveServer(Unknown Source)
    	at io.onedev.server.attachment.DefaultAttachmentManager$2.run(DefaultAttachmentManager.java:363)
    	at io.onedev.server.persistence.DefaultTransactionManager$1.afterCompletion(DefaultTransactionManager.java:129)
    	at org.hibernate.resource.transaction.internal.SynchronizationRegistryStandardImpl.notifySynchronizationsAfterTransactionCompletion(SynchronizationRegistryStandardImpl.java:84)
    	... 15 more
    
Robin Shen commented 7 months ago

Thanks for posting the stack trace. I found an issue where excessive threads get created when importing many issues.

Due to this, the import may still fail even if you add more memory. Please hold off until the issue is fixed.

wojtek commented 7 months ago

Thanks for posting the stack trace. I found an issue where excessive threads get created when importing many issues.

Hmm... that explains a lot :-)

Due to this, the import may still fail even if you add more memory. Please hold off until the issue is fixed.

Understood.

Robin Shen commented 7 months ago

Please upgrade to 9.2.1, which calculates the max memory percentage based on container memory; the percentage can also be controlled via the helm setting onedev.jvm.maxMemoryPercent.

This version also commits imported issues per project to save memory.

Please give it a try and let me know the result.
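
With a helm-managed install, that would look roughly like this (a hedged sketch; the release and chart names are assumptions - only the onedev.jvm.maxMemoryPercent key comes from the comment above):

helm upgrade onedev onedev/onedev -n onedev --reuse-values \
  --set onedev.jvm.maxMemoryPercent=50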

wojtek commented 7 months ago

Upgraded to 9.2.1.

Wrapper settings:

wrapper.java.additional.100=-XX:MaxRAMPercentage=%max_memory_percent%

set.default.max_memory_percent=50

Yet -Xmx still seems to be calculated incorrectly: -Xmx3888m:

root          27  0.1  3.3 6698764 265228 ?      Sl   06:15   0:22 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Djava.awt.headless=true --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.desktop/java.awt.font=ALL-UNNAMED --add-modules=java.se --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/sun.nio.fs=ALL-UNNAMED -Xmx3888m -Djava.library.path=. -classpath ch.qos.logback.logback-classic-1.2.11.jar:ch.qos.logback.logback-core-1.2.11.jar:io.onedev.commons-bootstrap-2.7.22.jar:org.slf4j.jul-to-slf4j-1.7.36.jar:org.slf4j.log4j-over-slf4j-1.7.36.jar:org.slf4j.slf4j-api-1.7.36.jar:wrapper.jar -Dwrapper.key=1pvwRYpkZG_oqPicFU9pzk8bjKkTQUpB -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=9 -Dwrapper.version=3.5.51-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.cpu.timeout=3600 -Dwrapper.jvmid=1 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang org.tanukisoftware.wrapper.WrapperSimpleApp io.onedev.commons.bootstrap.Bootstrap upgrade /opt/onedev

Btw, in this version the wrapper doesn't seem to handle the situation correctly: the JVM is not responding, but the wrapper is still running:

root@onedev-785cf7b97-ndstf:/# lsof -iTCP -sTCP:LISTEN -Pn
COMMAND PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java     27 root    4u  IPv4 10565134      0t0  TCP 127.0.0.1:32000 (LISTEN)

Logs:

WrapperManager: Initializing...
03:15:53 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
03:15:53 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
03:15:55 INFO  io.onedev.commons.loader.AppLoader - Starting application...
03:16:31 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
03:16:31 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
03:16:33 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
03:16:34 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
03:16:35 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
03:16:35 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
03:16:42 INFO  io.onedev.server.OneDev - Server is ready at http://10.43.173.214
JVM received a signal SIGKILL (9).
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM exited unexpectedly.
Reloading Wrapper configuration...
Launching a JVM...
WrapperManager: Initializing...
05:37:10 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
05:37:10 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
05:37:12 INFO  io.onedev.commons.loader.AppLoader - Starting application...
05:37:49 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
05:37:50 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
05:37:51 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
05:37:52 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
05:37:52 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
05:37:53 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
05:37:59 INFO  io.onedev.server.OneDev - Server is ready at http://10.43.173.214
JVM received a signal SIGKILL (9).
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM exited unexpectedly.
Reloading Wrapper configuration...
Launching a JVM...
WrapperManager: Initializing...
06:11:14 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
06:11:14 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
06:11:16 INFO  io.onedev.commons.loader.AppLoader - Starting application...
06:11:52 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
06:11:53 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
06:11:54 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
06:11:55 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
06:11:55 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
06:11:56 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
06:12:02 INFO  io.onedev.server.OneDev - Server is ready at http://10.43.173.214
TERM trapped.  Shutting down.
06:15:15 INFO  io.onedev.commons.loader.AppLoader - Stopping application...
<-- Wrapper Stopped
Robin Shen commented 7 months ago

It seems that you've changed conf/wrapper.conf manually, so OneDev is unable to delete the wrapper.java.maxmemory.percent entry. Please delete it manually and restart the container again.
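
For example (a sketch; adjust the path if your live config lives elsewhere - see the later comments about /app/conf vs /opt/onedev/conf):

sed -i '/^wrapper\.java\.maxmemory\.percent/d' /opt/onedev/conf/wrapper.conf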

wojtek commented 7 months ago

Indeed there was such a line. I commented it out and manually set -XX:MaxRAMPercentage:

wrapper.java.additional.100=-XX:MaxRAMPercentage=50

#set.default.max_memory_percent=50

However, I don't see it applied.

Robin Shen commented 7 months ago

What is the java process command line being displayed now?

wojtek commented 7 months ago

What is the java process command line being displayed now?

root         565  0.0  0.4 6649756 33016 ?       Sl   11:05   0:00 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Djava.awt.headless=true --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.desktop/java.awt.font=ALL-UNNAMED --add-modules=java.se --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/sun.nio.fs=ALL-UNNAMED -Xmx3888m -Djava.library.path=. -classpath ch.qos.logback.logback-classic-1.2.11.jar:ch.qos.logback.logback-core-1.2.11.jar:io.onedev.commons-bootstrap-2.7.22.jar:org.slf4j.jul-to-slf4j-1.7.36.jar:org.slf4j.log4j-over-slf4j-1.7.36.jar:org.slf4j.slf4j-api-1.7.36.jar:wrapper.jar -Dwrapper.key=FFxmPvu_TVu_-VFb-VfACaIAB0mp-rtD -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=9 -Dwrapper.version=3.5.51-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.cpu.timeout=3600 -Dwrapper.jvmid=2 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang org.tanukisoftware.wrapper.WrapperSimpleApp io.onedev.commons.bootstrap.Bootstrap upgrade /opt/onedev
root         581  0.0  0.0   6468  2380 pts/0    S+   11:05   0:00 grep --color=auto java

which is odd - I would assume that -XX:MaxRAMPercentage should be there...

Robin Shen commented 7 months ago

Are you using the official image and helm chart? I tested on GKE and it works fine both with a new installation and with an upgrade from 9.2.0.

Robin Shen commented 7 months ago

Please also use the official wrapper.conf to see if it helps.

wrapper.conf

wojtek commented 7 months ago

Are you using the official image and helm chart? I tested on GKE and it works fine both with a new installation and with an upgrade from 9.2.0.

We use a custom image (based on the official one, with an additional jar file included):

FROM 1dev/server:9.2.1
ADD plugin.jar /app/lib/plugin.jar

and a custom helm chart (we don't set the max_memory_percent env var yet). Though, I added the env var manually:

$ kubectl -n onedev describe pod/onedev-658dcf4589-2sj8d | grep max_memory_percent
      max_memory_percent:                 50

Please also use the official wrapper.conf to see if it helps.

wrapper.conf

With the default wrapper.conf:

root@onedev-658dcf4589-2sj8d:/# cat /opt/onedev/conf/wrapper.conf | grep -i -A5 -B20 mem
wrapper.java.library.path.append_system_path=TRUE

# Java Bits.  On applicable platforms, tells the JVM to run in 32 or 64-bit mode.
wrapper.java.additional.auto_bits=TRUE

# Java Additional Parameters
wrapper.java.additional.1=-Djava.awt.headless=true

wrapper.java.additional.10=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
wrapper.java.additional.11=--add-opens=java.base/java.lang=ALL-UNNAMED
wrapper.java.additional.12=--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
wrapper.java.additional.13=--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
wrapper.java.additional.14=--add-opens=java.base/java.util=ALL-UNNAMED
wrapper.java.additional.15=--add-opens=java.base/java.text=ALL-UNNAMED
wrapper.java.additional.16=--add-opens=java.desktop/java.awt.font=ALL-UNNAMED
wrapper.java.additional.30=--add-modules=java.se
wrapper.java.additional.31=--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED
wrapper.java.additional.32=--add-opens=java.management/sun.management=ALL-UNNAMED
wrapper.java.additional.33=--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
wrapper.java.additional.50=--add-opens=java.base/sun.nio.fs=ALL-UNNAMED
wrapper.java.additional.100=-XX:MaxRAMPercentage=%max_memory_percent%

set.default.max_memory_percent=50

# Application parameters.  Add parameters as needed starting from 1
wrapper.app.parameter.1=io.onedev.commons.bootstrap.Bootstrap

#********************************************************************

Still, it sets Xmx to -Xmx3888m:

root@onedev-658dcf4589-2sj8d:/# ps aux | grep java
root          26  8.9  3.6 6698628 286960 ?      Sl   17:07   0:08 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Djava.awt.headless=true --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.desktop/java.awt.font=ALL-UNNAMED --add-modules=java.se --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/sun.nio.fs=ALL-UNNAMED -Xmx3888m -Djava.library.path=. -classpath ch.qos.logback.logback-classic-1.2.11.jar:ch.qos.logback.logback-core-1.2.11.jar:io.onedev.commons-bootstrap-2.7.22.jar:org.slf4j.jul-to-slf4j-1.7.36.jar:org.slf4j.log4j-over-slf4j-1.7.36.jar:org.slf4j.slf4j-api-1.7.36.jar:wrapper.jar -Dwrapper.key=D6u-aaj5SspW7o6kRJd2URMvLAcURqh5 -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=8 -Dwrapper.version=3.5.51-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.cpu.timeout=3600 -Dwrapper.jvmid=1 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang org.tanukisoftware.wrapper.WrapperSimpleApp io.onedev.commons.bootstrap.Bootstrap upgrade /opt/onedev
wojtek commented 7 months ago

OK, I think I figured it out... the issue is that there are two directories, /app/conf and /opt/onedev/conf. I was editing wrapper.conf in the latter but, as ps aux revealed, the image uses /app/conf/wrapper.conf (not confusing at all!). After recreating the pod, it seems to have the correct configuration:

root@onedev-658dcf4589-p5v9b:/# ps aux | grep java
root          82 95.6  3.6 4793456 291652 ?      Sl   17:19   0:11 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Djava.awt.headless=true --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.desktop/java.awt.font=ALL-UNNAMED --add-modules=java.se --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/sun.nio.fs=ALL-UNNAMED -XX:MaxRAMPercentage=50 -Djava.library.path=. -classpath ch.qos.logback.logback-classic-1.2.11.jar:ch.qos.logback.logback-core-1.2.11.jar:io.onedev.commons-bootstrap-2.7.24.jar:org.slf4j.jul-to-slf4j-1.7.36.jar:org.slf4j.log4j-over-slf4j-1.7.36.jar:org.slf4j.slf4j-api-1.7.36.jar:wrapper.jar -Dwrapper.key=HjDtMgPEC7nNcptdebG_9rtXmBcw7Vlg -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=64 -Dwrapper.version=3.5.51-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.cpu.timeout=3600 -Dwrapper.jvmid=1 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang org.tanukisoftware.wrapper.WrapperSimpleApp io.onedev.commons.bootstrap.Bootstrap
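
A quick way to confirm which wrapper.conf is actually in effect (a sketch using the two paths from this thread):

grep -n 'MaxRAMPercentage\|maxmemory' /app/conf/wrapper.conf /opt/onedev/conf/wrapper.conf
ps aux | grep -E -o 'Xmx[0-9]+m|MaxRAMPercentage=[0-9]+'
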
wojtek commented 7 months ago

Going back to the original issue (importing): about 5 minutes into the import (it usually took ~45 minutes until the crash), the import spilled out an exception:

Importing issues...
Imported 50 issues
Imported 100 issues
Imported 150 issues
Imported 200 issues
Imported 250 issues
Imported 300 issues
Imported 350 issues
Imported 400 issues
Imported 450 issues
Imported 500 issues
Imported 550 issues
Imported 600 issues
Imported 650 issues
Imported 700 issues
Imported 727 issues
org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
    	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
    	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
    	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
    	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
    	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
    	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
    	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:273)
    	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:281)
    	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246)
    	at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83)
    	at org.hibernate.internal.AbstractSharedSessionContract.beginTransaction(AbstractSharedSessionContract.java:471)
    	at io.onedev.server.persistence.DefaultTransactionManager.lambda$call$0(DefaultTransactionManager.java:66)
    	at io.onedev.server.persistence.DefaultSessionManager.call(DefaultSessionManager.java:90)
    	at io.onedev.server.persistence.DefaultTransactionManager.call(DefaultTransactionManager.java:60)
    	at io.onedev.server.persistence.DefaultTransactionManager.run(DefaultTransactionManager.java:94)
    	at io.onedev.server.plugin.imports.youtrack.ImportServer.importProjects(ImportServer.java:1039)
    	at io.onedev.server.plugin.imports.youtrack.YouTrackProjectImporter.doImport(YouTrackProjectImporter.java:78)
    	at io.onedev.server.web.page.project.imports.ProjectImportPage$1$1.runTask(ProjectImportPage.java:72)
    	at io.onedev.server.web.component.taskbutton.TaskButton$2.call(TaskButton.java:160)
    	at io.onedev.server.web.component.taskbutton.TaskButton$2.call(TaskButton.java:133)
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    	at io.onedev.server.security.SecurityUtils.lambda$inheritSubject$1(SecurityUtils.java:350)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30129ms.
    	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:583)
    	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
    	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
    	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85)
    	at org.hibernate.hikaricp.internal.HikariCPConnectionProvider.getConnection(HikariCPConnectionProvider.java:77)
    	at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
    	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
    	... 20 more

Checking the 1dev logs, there is again an OOM - which surprised me, as I watched /~administration/server-information and the memory hovered around ~200M (heap max is ~2G):

$ kubectl -n onedev logs -f pod/onedev-658dcf4589-p5v9b
Defaulted container "onedev" out of: onedev, init (init), database-create (init)
berCallableTaskOperation{serviceName='hz:impl:executorService', identityHash=1433640335, partitionId=-1, replicaIndex=0, callId=-73317, invocationTime=1697738232758 (2023-10-19 17:57:12.758), waitTimeout=-1, callTimeout=60000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0, name=default}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1697738232758, firstInvocationTime='2023-10-19 17:57:12.758', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.42.220.186]:5710, pendingResponse={VOID}, backupsAcksExpected=-1, backupsAcksReceived=0, connection=null}, current-response: null
[2272.494s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.sendResponse(Invocation.java:223)
	at com.hazelcast.spi.impl.operationservice.Offload$OffloadedOperationResponseHandler.sendResponse(Offload.java:169)
	at com.hazelcast.spi.impl.operationservice.Operation.sendResponse(Operation.java:485)
	at com.hazelcast.executor.impl.DistributedExecutorService$Processor.sendResponse(DistributedExecutorService.java:305)
	at com.hazelcast.executor.impl.DistributedExecutorService$Processor.run(DistributedExecutorService.java:288)
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
[2272.495s][warning][os,thread] Failed to start the native thread for java.lang.Thread "hz.10.42.220.186:5710.cached.thread-5680"
17:59:07 ERROR c.h.s.i.e.ExecutionService - [10.42.220.186]:5710 [2daa83f6d5410138e0950aa45e228e0d89e6ce8fa2ee7d42adfde37e3bf45dfa] [5.1.7] Failed to execute java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@44ddf73[Completed exceptionally: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached]
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
	at java.base/java.lang.Thread.start0(Native Method)
	at java.base/java.lang.Thread.start(Thread.java:798)
	at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
	at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1354)
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate.addNewWorkerIfRequired(CachedExecutorServiceDelegate.java:140)
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate.execute(CachedExecutorServiceDelegate.java:110)
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate.submit(CachedExecutorServiceDelegate.java:123)
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate.submit(CachedExecutorServiceDelegate.java:129)
	at com.hazelcast.spi.impl.executionservice.impl.ExecutionServiceImpl.submit(ExecutionServiceImpl.java:255)
	at com.hazelcast.internal.server.tcp.TcpServerContext.submitAsync(TcpServerContext.java:277)
	at com.hazelcast.internal.server.tcp.TcpServerConnector.asyncConnect(TcpServerConnector.java:78)
	at com.hazelcast.internal.server.tcp.TcpServerConnectionManager.lambda$getOrConnect$1(TcpServerConnectionManager.java:160)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at com.hazelcast.internal.server.tcp.TcpServerConnectionManagerBase$Plane.addConnectionInProgressIfAbsent(TcpServerConnectionManagerBase.java:206)

You said that you reworked the threading - how many threads do you try to create? Usually a thread stack needs ~1M (by default).

I was pondering grabbing a heap dump during the crash, but this isn't heap-related, so it wouldn't help...
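
For what it's worth, the live native-thread count (each stack defaulting to ~1M of virtual memory, matching the "stacksize: 1024k" in the pthread_create warning above) can be checked from inside the container - a hedged sketch, assuming pidof exists in the image:

# one entry per native thread of the JVM
ls /proc/$(pidof java)/task | wc -l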

Robin Shen commented 7 months ago

I will set up a YouTrack instance with 100k issues and import it to see what happens. Previously I was testing with a small dataset...

Robin Shen commented 7 months ago

Please upgrade to 9.2.2, which removes more threads during import. I tested importing 100k issues and heap memory never reached 1G.

wojtek commented 6 months ago

Upgraded, imported everything and was greeted with "Projects imported successfully" after ~15 minutes. Awesome! 👍

wojtek changed state from 'Open' to 'Closed' 6 months ago
Type: Question
Priority: Normal
Reference: onedev/server#1596