Deployment inaccessible (`HazelcastInstanceNotActiveException: Hazelcast instance is not active!`) (OD-2505)
wojtek opened 5 months ago

We have been having trouble accessing 1dev for a while now; the logs are filled with `HazelcastInstanceNotActiveException: Hazelcast instance is not active!` errors:

2025-07-27 03:33:56,923 WARN  [qtp945353728-38007] org.eclipse.jetty.server.session
com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
        at com.hazelcast.spi.impl.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:119)
        at com.hazelcast.spi.impl.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:114)
        at com.hazelcast.spi.impl.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:108)
        at com.hazelcast.spi.impl.AbstractDistributedObject.toData(AbstractDistributedObject.java:82)
        at com.hazelcast.map.impl.proxy.MapProxyImpl.set(MapProxyImpl.java:251)
        at com.hazelcast.map.impl.proxy.MapProxyImpl.set(MapProxyImpl.java:242)
        at io.onedev.server.jetty.ClusterSessionDataStore.doStore(ClusterSessionDataStore.java:79)
        at org.eclipse.jetty.server.session.AbstractSessionDataStore$1.run(AbstractSessionDataStore.java:142)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1507)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1544)
        at org.eclipse.jetty.server.session.SessionContext.run(SessionContext.java:92)
        at org.eclipse.jetty.server.session.AbstractSessionDataStore.store(AbstractSessionDataStore.java:155)
        at org.eclipse.jetty.server.session.AbstractSessionCache.release(AbstractSessionCache.java:581)
        at org.eclipse.jetty.server.session.SessionHandler.complete(SessionHandler.java:369)
        at org.eclipse.jetty.server.Request.lambda$leaveSession$0(Request.java:408)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1525)
        at org.eclipse.jetty.server.Request.leaveSession(Request.java:408)
        at org.eclipse.jetty.server.Request.onCompleted(Request.java:1552)
        at org.eclipse.jetty.server.HttpChannel.onCompleted(HttpChannel.java:917)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:467)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:829)

Restarting the pod doesn't help, even though the wrapper (apparently) reports a correct startup:

🕙 [19:51:41] ❯ kubectl -n onedev logs --timestamps=true deployments/onedev -f --since=30m
Defaulted container "onedev" out of: onedev, init (init), database-create (init)
2025-07-27T17:46:25.788826710Z --> Wrapper Started as Console
2025-07-27T17:46:25.788873501Z Java Service Wrapper Standard Edition 64-bit 3.5.51
2025-07-27T17:46:25.788877541Z   Copyright (C) 1999-2022 Tanuki Software, Ltd. All Rights Reserved.
2025-07-27T17:46:25.788880461Z     http://wrapper.tanukisoftware.com
2025-07-27T17:46:25.788882501Z   Licensed to OneDev for Service Wrapping
2025-07-27T17:46:25.788884382Z
2025-07-27T17:46:25.788886902Z Launching a JVM...
2025-07-27T17:46:25.788888732Z WrapperManager: Initializing...
2025-07-27T17:46:25.788890772Z INFO  - Launching application from '/app'...
2025-07-27T17:46:25.788892792Z INFO  - Starting application...
2025-07-27T17:46:25.788894732Z INFO  - Successfully checked /opt/onedev
2025-07-27T17:46:25.788896892Z INFO  - Stopping application...
2025-07-27T17:46:25.788898812Z <-- Wrapper Stopped
2025-07-27T17:46:26.442887196Z STATUS | wrapper  | 2025/07/27 17:46:25.818 | --> Wrapper Started as Console
2025-07-27T17:46:26.442929458Z STATUS | wrapper  | 2025/07/27 17:46:25.820 | Java Service Wrapper Standard Edition 64-bit 3.5.51
2025-07-27T17:46:26.442935658Z STATUS | wrapper  | 2025/07/27 17:46:25.820 |   Copyright (C) 1999-2022 Tanuki Software, Ltd. All Rights Reserved.
2025-07-27T17:46:26.442939428Z STATUS | wrapper  | 2025/07/27 17:46:25.820 |     http://wrapper.tanukisoftware.com
2025-07-27T17:46:26.442945279Z STATUS | wrapper  | 2025/07/27 17:46:25.820 |   Licensed to OneDev for Service Wrapping
2025-07-27T17:46:26.442948279Z STATUS | wrapper  | 2025/07/27 17:46:25.821 |
2025-07-27T17:46:26.442951149Z STATUS | wrapper  | 2025/07/27 17:46:26.024 | Launching a JVM...
2025-07-27T17:46:26.442954129Z INFO   | wrapper  | 2025/07/27 17:46:26.024 | Java Command Line:
2025-07-27T17:46:26.442957219Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[0] : /usr/lib/jvm/java-11-openjdk-amd64/bin/java
2025-07-27T17:46:26.442960289Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[1] : -Djava.awt.headless=true
2025-07-27T17:46:26.442963569Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[2] : --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
2025-07-27T17:46:26.442966149Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[3] : --add-opens=java.base/java.lang=ALL-UNNAMED
2025-07-27T17:46:26.442969109Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[4] : --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
2025-07-27T17:46:26.442972149Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[5] : --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
2025-07-27T17:46:26.442974980Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[6] : --add-opens=java.base/java.util=ALL-UNNAMED
2025-07-27T17:46:26.442977520Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[7] : --add-opens=java.base/java.text=ALL-UNNAMED
2025-07-27T17:46:26.442980300Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[8] : --add-opens=java.desktop/java.awt.font=ALL-UNNAMED
2025-07-27T17:46:26.442982800Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[9] : --add-modules=java.se
2025-07-27T17:46:26.442985160Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[10] : --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED
2025-07-27T17:46:26.442987610Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[11] : --add-opens=java.management/sun.management=ALL-UNNAMED
2025-07-27T17:46:26.442990200Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[12] : --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
2025-07-27T17:46:26.442992730Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[13] : --add-opens=java.base/sun.nio.fs=ALL-UNNAMED
2025-07-27T17:46:26.442995200Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[14] : -XX:MaxRAMPercentage=50
2025-07-27T17:46:26.443018811Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[15] : -Djava.library.path=.
2025-07-27T17:46:26.443021711Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[16] : -classpath
2025-07-27T17:46:26.443027211Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[17] : ch.qos.logback.logback-classic-1.4.14.jar:ch.qos.logback.logback-core-1.4.14.jar:io.onedev.commons-bootstrap-3.0.11.jar:org.slf4j.jul-to-slf4j-2.0.9.jar:org.slf4j.log4j-over-slf4j-2.0.9.jar:org.slf4j.slf4j-api-2.0.9.jar:wrapper.jar
2025-07-27T17:46:26.443031032Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[18] : -Dwrapper.key=4W6H7z85K16MJ88lJQhbgvu5Gbn3tKvM
2025-07-27T17:46:26.443033742Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[19] : -Dwrapper.port=32000
2025-07-27T17:46:26.443036142Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[20] : -Dwrapper.jvm.port.min=31000
2025-07-27T17:46:26.443038702Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[21] : -Dwrapper.jvm.port.max=31999
2025-07-27T17:46:26.443041712Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[22] : -Dwrapper.disable_console_input=TRUE
2025-07-27T17:46:26.443045002Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[23] : -Dwrapper.pid=67
2025-07-27T17:46:26.443048482Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[24] : -Dwrapper.version=3.5.51-st
2025-07-27T17:46:26.443051282Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[25] : -Dwrapper.native_library=wrapper
2025-07-27T17:46:26.443054512Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[26] : -Dwrapper.arch=x86
2025-07-27T17:46:26.443057483Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[27] : -Dwrapper.cpu.timeout=3600
2025-07-27T17:46:26.443060013Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[28] : -Dwrapper.jvmid=1
2025-07-27T17:46:26.443062303Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[29] : -Dwrapper.lang.domain=wrapper
2025-07-27T17:46:26.443064613Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[30] : -Dwrapper.lang.folder=../lang
2025-07-27T17:46:26.443066913Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[31] : org.tanukisoftware.wrapper.WrapperSimpleApp
2025-07-27T17:46:26.443069173Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[32] : io.onedev.commons.bootstrap.Bootstrap

And the 1dev logs, after the restart, just show the same Hazelcast exception and then sit at bootstrapping:

2025-07-27 03:37:32,783 WARN  [qtp945353728-37526] org.eclipse.jetty.server.session
com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
        at com.hazelcast.spi.impl.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:119)
        at com.hazelcast.spi.impl.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:114)
        at com.hazelcast.spi.impl.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:108)
        at com.hazelcast.spi.impl.AbstractDistributedObject.toData(AbstractDistributedObject.java:82)
        at com.hazelcast.map.impl.proxy.MapProxyImpl.set(MapProxyImpl.java:251)
        at com.hazelcast.map.impl.proxy.MapProxyImpl.set(MapProxyImpl.java:242)
        at io.onedev.server.jetty.ClusterSessionDataStore.doStore(ClusterSessionDataStore.java:79)
        at org.eclipse.jetty.server.session.AbstractSessionDataStore$1.run(AbstractSessionDataStore.java:142)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1507)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1544)
        at org.eclipse.jetty.server.session.SessionContext.run(SessionContext.java:92)
        at org.eclipse.jetty.server.session.AbstractSessionDataStore.store(AbstractSessionDataStore.java:155)
        at org.eclipse.jetty.server.session.AbstractSessionCache.release(AbstractSessionCache.java:581)
        at org.eclipse.jetty.server.session.SessionHandler.complete(SessionHandler.java:369)
        at org.eclipse.jetty.server.Request.lambda$leaveSession$0(Request.java:408)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1525)
        at org.eclipse.jetty.server.Request.leaveSession(Request.java:408)
        at org.eclipse.jetty.server.Request.onCompleted(Request.java:1552)
        at org.eclipse.jetty.server.HttpChannel.onCompleted(HttpChannel.java:917)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:467)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:829)
2025-07-27 17:40:56,295 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Launching application from '/opt/onedev'...
2025-07-27 17:40:56,305 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Cleaning temp directory...
2025-07-27 17:46:27,382 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Launching application from '/opt/onedev'...
2025-07-27 17:46:27,394 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Cleaning temp directory...

We use version 11.9.7 (custom build); we don't have a cluster and don't use Hazelcast.

  • wojtek commented 5 months ago

    Hmmm, after about 20 minutes of looking dead, 1dev decided it was actually going to "start", but it's still inaccessible (connections time out):

    1. wrapper:
    2025-07-27T17:46:26.443069173Z INFO   | wrapper  | 2025/07/27 17:46:26.024 |   Command[32] : io.onedev.commons.bootstrap.Bootstrap
    2025-07-27T18:05:36.406651787Z INFO   | jvm 1    | 2025/07/27 17:46:26.442 | WrapperManager: Initializing...
    2025-07-27T18:05:36.406677668Z INFO   | jvm 1    | 2025/07/27 17:46:27.444 | 17:46:27 INFO  i.onedev.commons.bootstrap.Bootstrap - Launching application from '/opt/onedev'...
    2025-07-27T18:05:36.406681488Z INFO   | jvm 1    | 2025/07/27 17:46:27.444 | 17:46:27 INFO  i.onedev.commons.bootstrap.Bootstrap - Cleaning temp directory...
    2025-07-27T18:05:36.406683828Z INFO   | jvm 1    | 2025/07/27 18:03:47.398 | 18:03:47 INFO  io.onedev.commons.loader.AppLoader - Starting application...
    2025-07-27T18:05:36.406686228Z INFO   | jvm 1    | 2025/07/27 18:04:29.687 | 18:04:29 INFO  i.o.s.e.impl.DefaultProjectManager - Checking projects...
    2025-07-27T18:05:36.406689028Z INFO   | jvm 1    | 2025/07/27 18:04:42.205 | 18:04:42 INFO  i.o.s.e.impl.DefaultBuildManager - Caching build info...
    2025-07-27T18:05:36.406692089Z INFO   | jvm 1    | 2025/07/27 18:04:43.806 | 18:04:43 INFO  i.o.s.e.i.DefaultBuildMetricManager - Caching build metric info...
    2025-07-27T18:05:36.406695169Z INFO   | jvm 1    | 2025/07/27 18:04:45.550 | 18:04:45 INFO  i.o.s.e.impl.DefaultIssueManager - Caching issue info...
    2025-07-27T18:05:36.406697939Z INFO   | jvm 1    | 2025/07/27 18:04:46.765 | 18:04:46 INFO  i.o.s.e.i.DefaultAgentAttributeManager - Caching agent attribute info...
    2025-07-27T18:05:36.406700729Z INFO   | jvm 1    | 2025/07/27 18:04:47.768 | 18:04:47 INFO  i.o.s.e.i.DefaultBuildParamManager - Caching build param info...
    2025-07-27T18:05:36.406713049Z INFO   | jvm 1    | 2025/07/27 18:05:35.393 | 18:05:35 ERROR io.onedev.server.agent.ServerSocket - Websocket error (remote address: /10.42.140.222:44116)
    
    2. 1dev log:
    2025-07-27 17:40:56,295 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Launching application from '/opt/onedev'...
    2025-07-27 17:40:56,305 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Cleaning temp directory...
    2025-07-27 17:46:27,382 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Launching application from '/opt/onedev'...
    2025-07-27 17:46:27,394 INFO  [WrapperSimpleAppMain] i.onedev.commons.bootstrap.Bootstrap Cleaning temp directory...
    2025-07-27 18:03:47,354 INFO  [WrapperSimpleAppMain] io.onedev.commons.loader.AppLoader Starting application...
    2025-07-27 18:04:29,668 INFO  [WrapperSimpleAppMain] i.o.s.e.impl.DefaultProjectManager Checking projects...
    2025-07-27 18:04:42,186 INFO  [WrapperSimpleAppMain] i.o.s.e.impl.DefaultBuildManager Caching build info...
    2025-07-27 18:04:43,794 INFO  [WrapperSimpleAppMain] i.o.s.e.i.DefaultBuildMetricManager Caching build metric info...
    2025-07-27 18:04:45,538 INFO  [WrapperSimpleAppMain] i.o.s.e.impl.DefaultIssueManager Caching issue info...
    2025-07-27 18:04:46,701 INFO  [WrapperSimpleAppMain] i.o.s.e.i.DefaultAgentAttributeManager Caching agent attribute info...
    2025-07-27 18:04:47,763 INFO  [WrapperSimpleAppMain] i.o.s.e.i.DefaultBuildParamManager Caching build param info...
    2025-07-27 18:05:35,294 ERROR [Connector-Scheduler-40d94057-1] io.onedev.server.agent.ServerSocket Websocket error (remote address: /10.42.140.222:44116)
    
  • wojtek commented 5 months ago

    After some more time it's possible to open pages, albeit slowly.

    What's the best way to assess the performance issues/what could be impacting the performance?

    What could be added to the wrapper/console/server (logback) configuration to make it clearer that the startup is actually in progress and that the server will eventually come up?

  • Robin Shen commented 5 months ago

    Hazelcast instance is not active!

    This normally happens when memory is insufficient. Please check memory usage of your OneDev instance.

  • wojtek commented 5 months ago

    Hazelcast instance is not active!

    This normally happens when memory is insufficient. Please check memory usage of your OneDev instance.

    Does it mean it was unloaded? Or what happened to it?

    memory usage is kinda... odd (lots of cache):

    root@onedev-74cb977b49-tv28k:/# free -hw
                   total        used        free      shared     buffers       cache   available
    Mem:           7.6Gi       5.0Gi       305Mi        10Mi       317Mi       2.3Gi       2.6Gi
    Swap:             0B          0B          0B
    

    but that's just a quirk of running inside k8s (the current limit is 2300m, which should be enough). top seems to report 2G of actual memory usage (with the caveat of how the JVM actually allocates memory, plus whatever it uses on top of the heap):

    Captura de pantalla 2025-07-28 a las 10.21.31.png

    However, how does that play out in terms of memory usage? I haven't seen any OOMs, so what is the reason for the exception? Can we somehow disable Hazelcast if we don't use a cluster?

    Is it possible to switch 1dev to a slightly more modern JVM? AFAIR JVM 17 somewhat improved support for running inside k8s…

    Btw. what about this point:

    What's the best way to assess the performance issues/what could be impacting the performance?

  • Robin Shen commented 5 months ago

    Please check heap memory usage via Administration / System Maintenance / Server Information. Please force a GC and post the screenshot.

  • wojtek commented 5 months ago

    (caveat, now it's snappier)

    Captura de pantalla 2025-07-28 a las 11.01.06.png

    After forced GC:

    Captura de pantalla 2025-07-28 a las 11.01.17.png

    Though I was referring more to ways to investigate it when the instance is not accessible (slowed to a crawl).

    I checked the docs website for any indications (metrics, integration with external monitoring) but nothing showed up (checked "prometheus", "graphana", "metrics")

  • Robin Shen commented 5 months ago

    Your OneDev instance is obviously running low on memory. 1G is too small for your instance (I guess there are quite a few projects, users and issues). Please use a k8s node with at least 8G physical memory.

  • wojtek commented 5 months ago

    Your OneDev instance is obviously running low on memory. 1G is too small for your instance

    Hmm... we do have 2,3G assigned to the pod…

    (I guess there are quite a few projects, users and issues). Please use a k8s node with at least 8G physical memory.

    Erm... does 1dev keep absolutely everything in memory without any caching? We have:

    • 365 projects
    • 17970 issues

    It's not THAT much…

    I ask again:

    • are there any metrics that we could track and evaluate afterwards (e.g. the number of requests spiked, which caused a CPU/memory usage spike)?
    • do you expose those?
    • is it possible to disable Hazelcast if it's not needed?
    • is there any way to decrease memory usage (or figure out what's taking it without actually taking a heap dump and analysing it)?
  • Robin Shen commented 5 months ago

    Hmm... we do have 2,3G assigned to the pod…

    The OneDev JVM uses 50% of it, and the other 50% is reserved for the OS and git.
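
The 50% split can be sanity-checked with a bit of arithmetic. The sketch below is only an illustration: it assumes the pod limit mentioned in this thread means roughly 2300 MiB, and the names `HeapBudget` and `maxHeapMiB` are made up for the example. The real JVM rounds and aligns the value it derives from `-XX:MaxRAMPercentage`, so the figures are approximate.

```java
// Illustrative sketch: how a percentage-based heap cap translates into MiB.
// The 2300 MiB limit is an assumed reading of the pod limit quoted in this thread.
class HeapBudget {
    /** Approximate max heap in MiB when the JVM is capped at `percent` of the container limit. */
    static long maxHeapMiB(long containerLimitMiB, double percent) {
        return Math.round(containerLimitMiB * percent / 100.0);
    }

    public static void main(String[] args) {
        long limit = 2300; // assumed pod memory limit in MiB
        long heap = maxHeapMiB(limit, 50);
        System.out.println("heap at 50%: " + heap + " MiB");
        System.out.println("left for OS, git and JVM non-heap memory: " + (limit - heap) + " MiB");
        // What the JVM running this sketch actually resolved as its heap ceiling:
        System.out.println("Runtime.maxMemory: "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MiB");
    }
}
```

Under that assumption a 50% cap gives a heap ceiling of about 1150 MiB, which is in line with the roughly 1G heap mentioned earlier in the thread. `Runtime.getRuntime().maxMemory()` reports what a running JVM actually settled on.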

    It's not THAT much…

    OneDev caches quite a lot of data, including git code analysis results and issue indexes, and this may consume quite a lot of memory. Nowadays, a 2G box is definitely too small for a deployment of your size (check the memory requirements of GitLab or GitHub Enterprise).

    Are there any metrics that we could track and evaluate afterwards (i.e. number of requests spiked, which caused CPU/Mem usage spike)

    No. Heap usage, memory dumps and thread dumps are currently enough to analyze performance/memory issues. I do not want to complicate things too much if the current approach works.

    Do you expose those?

    See above

    Is it possible to disable Hazelcast if it's not needed?

    No

    Is there any way to decrease memory usage (or figure out what's taking it without actually taking a heap dump and analysing it)?

    OneDev already tries its best to use as little memory as possible while keeping decent performance.

  • wojtek commented 5 months ago

    Hmm... we do have 2,3G assigned to the pod…

    The OneDev JVM uses 50% of it, and the other 50% is reserved for the OS and git.

    Does git use that much? (apart from the heap vs JVM sizing itself…)

    It's not THAT much…

    OneDev caches quite a lot of data, including git code analysis results and issue indexes, and this may consume quite a lot of memory. Nowadays, a 2G box is definitely too small for a deployment of your size (check the memory requirements of GitLab or GitHub Enterprise).

    Hmm... isn't the idea behind caching to keep the relevant, recently accessed data in memory, leave all the rest in the database, and load it as requested (on a cache miss)?

    Are there any metrics that we could track and evaluate afterwards (i.e. number of requests spiked, which caused CPU/Mem usage spike)

    No. Heap usage, memory dumps and thread dumps are currently enough to analyze performance/memory issues. I do not want to complicate things too much if the current approach works.

    Well, until it doesn't. OneDev is inaccessible today, so it's not possible to check the Server Information:

    Captura de pantalla 2025-07-29 a las 15.43.16.png

    I'm pondering how to tackle it, but a thread dump seems very rudimentary and not very helpful for assessing a performance problem in this case. There is no JMX access to see more details of the JVM (memory usage, possibly GC going bonkers, etc.).

    Any suggestions?

  • wojtek commented 5 months ago

    Would it be possible to use a somewhat newer JVM (17, 21)?

  • wojtek commented 5 months ago

    I still don't know what is causing these CPU spikes, but a heap dump taken while things were working fine is quite interesting: it seems that the bulk of memory consumption is due to Hibernate and a caching library from JetBrains (?!): image.png

  • Robin Shen commented 5 months ago

    Please check incoming references to these hibernate ast nodes to see which code holds these objects.

    The JetBrains Exodus memory usage is normal, as OneDev indexes git repositories and stores the results in an Exodus key-value database to speed up many operations. Exodus holds part of the database in memory for quick access.

    Upgrading the JVM of the docker container to 17/21 is not an option currently, as this is a big change. Also, I do not think it would make a big difference.

    For your deployment size, I want to emphasize again that the memory allocated to the OneDev pod is too small. As you mentioned, you have 2~3G allocated to the pod, and the JVM will use 50% of it. You may adjust this share via the helm value onedev.jvm.maxMemoryPercent. But remember that OneDev calls native git for a lot of operations (pull/push/sync etc.), and a git process can consume quite a lot of memory for large repositories.

    I cannot guarantee that allocating more memory will solve your current problem, but the current memory assignment to the OneDev JVM is definitely too small for your instance, and increasing it should be the first step before trying anything else.

  • Robin Shen commented 5 months ago

    For your reference, the JVM of this instance is allocated 4G of memory (8G of physical memory, of which OneDev uses 50%):

    2025-07-30_11-04-01.png

  • wojtek commented 5 months ago

    Please check incoming references to these hibernate ast nodes to see which code holds these objects.

    Hmm...

    Captura de pantalla 2025-07-30 a las 13.27.10.png

    Does this help?

    Btw. I just noticed that you are using a rather old Hibernate (5.4.24):

    ~/dev/tmps/onedev/onedev-latest
    🕙 [13:27:30] ❯ ls -lah lib/ | grep hibernate
    -rw-r--r--@   1 wojtek  staff   955K Jul  4  2024 com.hazelcast.hazelcast-hibernate53-2.2.1.jar
    -rw-r--r--@   1 wojtek  staff    77K Jul  4  2024 org.hibernate.common.hibernate-commons-annotations-5.1.2.Final.jar
    -rw-r--r--@   1 wojtek  staff   7.0M Jul  4  2024 org.hibernate.hibernate-core-5.4.24.Final.jar
    -rw-r--r--@   1 wojtek  staff   591B Jul  4  2024 org.hibernate.hibernate-entitymanager-5.4.24.Final.jar
    -rw-r--r--@   1 wojtek  staff   6.7K Jul  4  2024 org.hibernate.hibernate-hikaricp-5.4.24.Final.jar
    -rw-r--r--@   1 wojtek  staff    13K Jul  4  2024 org.hibernate.hibernate-jcache-5.4.24.Final.jar
    -rw-r--r--@   1 wojtek  staff   1.3M Jul  4  2024 org.hibernate.validator.hibernate-validator-6.2.5.Final.jar
    

    That version was released ~5 years ago (https://mvnrepository.com/artifact/org.hibernate/hibernate-core/5.4.24.Final), and the first release of that line, 5.4.0, 7 years ago (https://mvnrepository.com/artifact/org.hibernate/hibernate-core/5.4.0.Final).

    From what I gathered and researched, Hibernate 6.0 (already 3 years old; https://mvnrepository.com/artifact/org.hibernate/hibernate-core/6.0.0.Final) significantly improved memory usage, as they reworked the whole AST model and switched from ANTLR2 to ANTLR4. Maybe an upgrade is in order and could improve the situation?

    The JetBrains Exodus memory usage is normal, as OneDev indexes git repositories and stores the results in an Exodus key-value database to speed up many operations. Exodus holds part of the database in memory for quick access.

    Yeah, it looks OK. Even with a second, bigger heap dump (~1,6G) the Exodus cache objects stay at a rather sensible size. It looks like Hibernate is the misbehaving cow here… :)

    Upgrading the JVM of the docker container to 17/21 is not an option currently, as this is a big change.

    Why, though? I started to experiment a bit on my local machine, and 1dev seems to work and run perfectly fine even with Java 24. Why would the change be big? If you provide a docker image, then for the end user it doesn't matter which Java version it runs, as long as it runs fine.

    Looking at the Dockerfile (I assume it's this one: https://code.onedev.io/onedev/server/~files/main/server-product/docker/Dockerfile.server), you explicitly install openjdk-11, but newer versions are also available on that Ubuntu LTS release:

    root@onedev-6c7dbf9bc5-d5bmv:/# apt search openjdk | grep -E "openjdk.*headless"
    
    openjdk-11-jdk-headless/noble-security 11.0.28+6-1ubuntu1~24.04.1 amd64
    openjdk-11-jre-headless/noble-security 11.0.28+6-1ubuntu1~24.04.1 amd64 [upgradable from: 11.0.27+6~us1-0ubuntu1~24.04]
    openjdk-17-jdk-headless/noble-updates,noble-security 17.0.15+6~us1-0ubuntu1~24.04 amd64
    openjdk-17-jre-headless/noble-updates,noble-security 17.0.15+6~us1-0ubuntu1~24.04 amd64
    openjdk-21-jdk-headless/noble-updates,noble-security 21.0.8+9~us1-0ubuntu1~24.04.1 amd64
    openjdk-21-jre-headless/noble-updates,noble-security 21.0.8+9~us1-0ubuntu1~24.04.1 amd64
    openjdk-8-jdk-headless/noble-updates,noble-security 8u462-ga~us1-0ubuntu2~24.04.2 amd64
    openjdk-8-jre-headless/noble-updates,noble-security 8u462-ga~us1-0ubuntu2~24.04.2 amd64
    

    So the change should be relatively simple (unless I'm missing something, like library compatibility ;) but again - from local tests 1dev started and ran fine… )

    Also, I do not think it would make a big difference.

    There have been A TON of improvements to the JVM itself, especially in the GC area (which I will get back to in a second). I do recommend checking out the blog of Thomas Schatzl (https://tschatzl.github.io/), who works on HotSpot/GC.

    With Java 24 there is also a new JEP for compact object headers (JEP 450, still experimental), which could shave ~20-25% off heap usage.

    To reiterate - newer JVM could help here.

    For your deployment size, I want to emphasize again that the memory allocated to the OneDev pod is too small. As you mentioned, you have 2~3G allocated to the pod, and the JVM will use 50% of it. You may adjust this share via the helm value onedev.jvm.maxMemoryPercent.

    To paraphrase a bit: "I would like to emphasize that the JVM can be quite efficient when done right, and throwing more memory at it is not a solution for everything" :) I would also like to quote the documentation (https://docs.onedev.io/installation-guide/run-on-bare-metal):

    "OneDev can run happily on a 2 core 2GB box. For personal use, 1 core 1GB box also works"

    :)

    Our instance is relatively small when it comes to traffic (we have only a handful of active users, though assessing exact usage is a bit difficult due to the lack of any metrics). We may have a relatively large number of projects/issues, but those should just be stored in the repository and loaded only when needed, not kept in memory at all times.

    I did bump the heap share to 60%, but then one has to account for the rest of the JVM (thread stacks, for example, plus metaspace, direct memory and whatnot) and for everything else (the mentioned git), so it's a bit trickier to adjust. (Again, newer versions are going to improve automatic heap sizing and memory allocation to maximize it automatically; it's not there yet, but using a newer JVM could help take advantage of it sooner :) )

    But remember that OneDev calls native git for a lot of operations (pull/push/sync etc.), and a git process can consume quite a lot of memory for large repositories.

    Was jgit less efficient for that?

    I cannot guarantee that allocating more memory will solve your current problem, but the current memory assignment to the OneDev JVM is definitely too small for your instance, and increasing it should be the first step before trying anything else.

    I played with it a bit (accessed it via JMX to get some insights) and yes: high memory pressure led to heavy GC activity, which resulted in high CPU usage and problems with accessing the instance: Captura de pantalla 2025-07-30 a las 11.09.32.png
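
The link between memory pressure, GC activity and CPU usage can be put into a single number: the share of JVM uptime spent in garbage collection. The following is a minimal sketch using the standard `java.lang.management` beans (the same data a JMX client sees). The class and method names are hypothetical, it reports on the JVM it runs in, and it is not part of OneDev.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative sketch: estimate how much of the JVM's uptime went into GC.
class GcPressure {
    /** Share of elapsed time spent in GC, in percent; 0 if no time has elapsed. */
    static double gcOverheadPercent(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / elapsedMillis;
    }

    /** Accumulated collection time in milliseconds, summed over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("heap used/max: %d/%d MiB%n", heap.getUsed() >> 20, heap.getMax() >> 20);
        System.out.printf("GC overhead since start: %.1f%%%n",
                gcOverheadPercent(totalGcMillis(), uptimeMillis));
    }
}
```

A persistently high percentage combined with a heap that stays close to its maximum after collections would be consistent with the behaviour described above, though it does not by itself say which objects are responsible.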

    Yes, adding more memory could probably work but then where's the limit? IMHO optimizing memory usage should be doable.

    One additional thing I noticed is a relatively high thread count combined with high variability:

    • by default the JVM uses 1M per thread stack, so that easily adds up to ~400M of memory here (400 threads); from my tests, 1dev runs fine with a quarter of that (-Xss256k), so right there we gain 300M of memory (though it's non-heap, so it's not visible in VisualVM, and I didn't get to run 1dev with Native Memory Tracking enabled, as that's kinda cumbersome with the wrapper configuration)
    • the high variability suggests that threads are started and stopped quite often; using a thread pool/executor would be recommended here to lower the pressure on the OS
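
The thread-stack estimate from the first bullet can be reproduced with a few lines of arithmetic. This is a sketch under the figures quoted in the comment (400 threads, 1M default stack, `-Xss256k`); the class name `ThreadStackBudget` is made up. As far as I know, stack memory is reserved up front but only committed as it is touched, so the results are upper bounds rather than measured usage.

```java
import java.lang.management.ManagementFactory;

// Illustrative sketch: upper bound on memory reserved for thread stacks.
class ThreadStackBudget {
    /** Stack reservation in MiB for `threads` threads with `stackKiB` KiB of stack each. */
    static long stackMiB(int threads, int stackKiB) {
        return (long) threads * stackKiB / 1024;
    }

    public static void main(String[] args) {
        int threads = 400;                          // thread count quoted in the comment
        long defaultMiB = stackMiB(threads, 1024);  // 1M stack per thread
        long reducedMiB = stackMiB(threads, 256);   // with -Xss256k
        System.out.println("default stacks: " + defaultMiB + " MiB");
        System.out.println("with -Xss256k: " + reducedMiB + " MiB");
        System.out.println("difference: " + (defaultMiB - reducedMiB) + " MiB");
        // Live thread count of the JVM running this sketch, for comparison:
        System.out.println("live threads here: "
                + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```

With those inputs the estimate is 400 MiB versus 100 MiB, which matches the 300M difference mentioned in the bullet.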
  • wojtek commented 5 months ago

    Do you see any harm in enabling hibernate statistics and exposing them over JMX? I.e. adding:

    hibernate.generate_statistics=true
    hibernate.jmx.enabled=true
    hibernate.jmx.usePlatformServer=true
    

    to the hibernate.properties file?

    Btw. while looking at it, and also at the Hibernate documentation (https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#caching), I noticed that it's possible to configure a different caching provider, which somewhat touches on my previous point about not using Hazelcast (though originally that was in a Jetty thread, and this is in the context of ORM/Hibernate). What would you say to using Ehcache (2.x), which should give better performance for a non-cluster setup?

  • wojtek commented 5 months ago

    Hmm... I just observed a really odd issue - having two JVM/1dev processes on the pod (?!), and the second one was gone soon after:

    image_2.png

  • Robin Shen commented 5 months ago

    Thanks for the investigation.

    12.0.3 added helm settings to control Hibernate query plan cache size and thread stack size.

    For Hibernate upgrade

    Upgrading to a new Hibernate version is not scheduled currently as OneDev makes some customizations to Hibernate 5.x.

    Use JVM 17 or higher

    Upgrading the default JVM is REALLY a big change requiring intensive testing, not just some ad hoc testing. For such an upgrade, I normally develop with that JVM daily for some time to make sure everything works. Since Java 11 is still mainstream and some platforms do not support Java 17 or higher, OneDev will stick with it for some time.

    Thread pool

    OneDev does use a thread pool, but idle threads are removed after being idle for 1 minute.

    JMX monitoring

    Watching metrics via JMX should be fine, although I never tested it

    Mixed use of native git/jgit

    Native git is much faster when handling pull/push of large repositories, while jgit is more efficient at reading metadata of git repositories (branch/tag/commit parsing etc.), as this metadata is cached in memory (another source of memory consumption if you have many repositories).

    Avoid using Hazelcast if not using clustering

    As mentioned earlier, this is not possible as the logic will be very very complex.

    Duplication of process display

    I think this is just a display issue, as the process id is the same. It is nothing to worry about.

    Finally, OneDev does work with minimal resources as advertised, for personal use with a few repositories. But as repositories and issues (some issue data is also cached) increase, it is normal to throw more resources at it, as with any software, especially in a company environment.

  • Robin Shen changed state to 'Closed' 5 months ago
    Previous Value: Open; Current Value: Closed
  • wojtek commented 5 months ago

    Thanks for the investigation.

    12.0.3 added helm settings to control Hibernate query plan cache size and thread stack size.

    Thank you.

    For Hibernate upgrade

    Upgrading to a new Hibernate version is not scheduled currently as OneDev makes some customizations to Hibernate 5.x.

    How come/why? (more out of curiosity) Do you maintain your own fork of Hibernate?

    Use JVM 17 or higher

    Upgrading the default JVM is REALLY a big change requiring intensive testing, not just some ad hoc testing. For such an upgrade, I normally develop with that JVM daily for some time to make sure everything works. Since Java 11 is still mainstream and some platforms do not support Java 17 or higher, OneDev will stick with it for some time.

    Hmm... official Oracle support for Java 11 (released in 2018, 7 years ago) ended in 2019. Red Hat's ended in 2024, and the Temurin/Azul/Microsoft ones end in 2027, which is sooner rather than later (see https://en.wikipedia.org/wiki/Java_version_history#Release_table)

    In the JetBrains survey from 2 years ago (2023; I don't know why they don't provide this data for 2024) Java 17 was already leading over Java 11 (45% to 38%; lots of people are still stuck on ancient Java 8): https://www.jetbrains.com/lp/devecosystem-2023/java/

    If we are going by "mainstream" logic then Java 8 should be used…

    As for support for platforms - I don't understand this point - newer versions are widely provided.

    And while I agree that upgrading from 8 to 9/11 was a huge inconvenience, upgrading to newer versions past 11 is relatively straightforward.

    Thread pool

    OneDev does use a thread pool, but idle threads are removed after being idle for 1 minute.

    Why? AFAIR creating (native) threads is relatively expensive hence the popular suggestion to use fixed pools?

  • Robin Shen commented 5 months ago

    How come/why? (more out of curiosity) Do you maintain your own fork of Hibernate?

    OneDev makes some changes to Hibernate validation logic:

    https://code.onedev.io/onedev/server/~files/main/server-core/src/main/java/org/hibernate

    As for support for platforms - I don't understand this point - newer versions are widely provided.

    For instance, Windows 7 and Ubuntu 18 are no longer supported. And OneDev has a customer running an agent on Windows 7.

    Why? AFAIR creating (native) threads is relatively expensive hence the popular suggestion to use fixed pools?

    Because threads consume memory.
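
    To illustrate the trade-off being discussed, a sketch with the standard `ThreadPoolExecutor`: one pool lets idle threads time out (trading thread re-creation cost for memory), the other keeps a fixed set of threads alive. This is not OneDev's actual pool configuration; the class name, pool sizes, and the short keep-alive used in `main` are assumptions chosen so the demo finishes quickly.

    ```java
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Hypothetical comparison of two pool policies; not OneDev code.
    final class PoolComparison {

        /** Pool whose threads (core threads included) exit after being idle for keepAliveMillis. */
        static ThreadPoolExecutor shrinkingPool(int maxThreads, long keepAliveMillis) {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    maxThreads, maxThreads, keepAliveMillis, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>());
            pool.allowCoreThreadTimeOut(true); // requires a keep-alive greater than zero
            return pool;
        }

        /** Pool that keeps its threads alive once they have been started. */
        static ThreadPoolExecutor fixedPool(int threads) {
            return new ThreadPoolExecutor(
                    threads, threads, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>());
        }

        /** Runs no-op tasks, waits idleMillis, reports the live thread count, then shuts the pool down. */
        static int poolSizeAfterIdle(ThreadPoolExecutor pool, int tasks, long idleMillis) {
            try {
                for (int i = 0; i < tasks; i++) {
                    pool.execute(() -> { });
                }
                Thread.sleep(idleMillis);
                return pool.getPoolSize();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } finally {
                pool.shutdown();
            }
        }

        public static void main(String[] args) {
            // 100 ms keep-alive stands in for the 1 minute mentioned in the thread.
            System.out.println("shrinking pool threads after idle: "
                    + poolSizeAfterIdle(shrinkingPool(4, 100), 4, 1000));
            System.out.println("fixed pool threads after idle: "
                    + poolSizeAfterIdle(fixedPool(4), 4, 1000));
        }
    }
    ```

    After the idle period the shrinking pool should report no live threads while the fixed pool still holds all four, which is the memory-versus-thread-churn trade-off in miniature.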

  • wojtek commented 4 months ago

    How come/why? (more out of curiosity) Do you maintain your own fork of Hibernate?

    OneDev makes some changes to Hibernate validation logic:

    https://code.onedev.io/onedev/server/~files/main/server-core/src/main/java/org/hibernate

    Have you considered upstreaming the changes (lower maintenance on your part, easier upgrades)?

    As for support for platforms - I don't understand this point - newer versions are widely provided.

    For instance, Windows 7 and Ubuntu 18 are no longer supported. And OneDev has a customer running an agent on Windows 7.

    Hmm... that sheds some light on the stance :) though running an outdated OS is not really recommended? Wouldn't it be possible to run an older agent with a newer 1dev, or does the communication protocol make it impossible?

  • Robin Shen commented 4 months ago

    Have you considered upstreaming the changes (lower maintenance on your part, easier upgrades)?

    This is very specific to OneDev

    Hmm... that sheds some light on the stance :) though running an outdated OS is not really recommended? Wouldn't it be possible to run an older agent with a newer 1dev, or does the communication protocol make it impossible?

    That makes the maintenance quite complicated. The OneDev agent and server share a common code base, and I load them in the same workspace to make tasks such as refactoring easier.

    I hope the customer can upgrade their OS though, but that will not happen this year...

Type: Question
Priority: Normal
Assignee: (none)
Labels: No labels
Issue Votes (0)
Watchers (2)
Reference: OD-2505