stuck in updating (OD-584)
bauk opened 4 years ago

When I upgraded my OneDev from 6.3.7 to 6.3.8 with helm, it got stuck in the upgrade program. I ran ps -ef in the container and saw this: 'java -cp ../boot/* -XX:MaxRAMPercentage=50.0 io.onedev.commons.bootstrap.Bootstrap upgrade /opt/onedev'.

  • Robin Shen commented 4 years ago

    Please show me the console log of the main container of the OneDev pod.

  • bauk commented 4 years ago

    image.png

  • Robin Shen commented 4 years ago

    Please follow this guide to get a thread dump of the Java process to see where it is stuck:

    https://www.baeldung.com/java-thread-dump
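
A minimal sketch of the thread dump step, assuming a procps-style `ps -ef` (PID in column 2, command starting in column 8) and that either `jstack` or plain `kill` is available in the container. The function names `java_pids_from_ps` and `dump_threads` are illustrative, not part of OneDev.

```shell
#!/bin/sh
# Print the PIDs of java processes found in `ps -ef` style output read from stdin.
# Assumes procps layout: column 2 is the PID, the command starts at column 8.
java_pids_from_ps() {
    awk 'NR > 1 && $8 ~ /(^|\/)java$/ { print $2 }'
}

# Ask a JVM for a thread dump: prefer jstack (JDK only), otherwise send SIGQUIT,
# which makes the JVM print the dump to its own stdout (the pod console log).
dump_threads() {
    pid="$1"
    if command -v jstack >/dev/null 2>&1; then
        jstack "$pid"
    else
        kill -3 "$pid"
    fi
}

# Usage inside the container (not run here):
#   ps -ef | java_pids_from_ps | while read -r pid; do dump_threads "$pid"; done
```

With the SIGQUIT fallback the dump does not appear in the terminal; it shows up in the output of the Java process itself.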

  • bauk commented 4 years ago

    image_2.png

  • Robin Shen commented 4 years ago

    Are there any other processes running in the container besides this one?

  • bauk commented 4 years ago

    No, only this one. I tried to roll back with helm. I'm sorry, I have a new problem; here is the log of the OneDev pod. image_3.png

  • Robin Shen commented 4 years ago

    Please follow this guide to enter maintenance mode: https://code.onedev.io/projects/162/blob/main/pages/maintenance-mode.md

    Then run /opt/onedev/bin/restore-db.sh /path/to/db-backup.zip to see if you can restore the database.

  • bauk commented 4 years ago

    image_4.png

  • Robin Shen commented 4 years ago

    Please copy all files from /app/boot to /opt/onedev/boot and try again
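
One way to make the copy step checkable is to compare the two directories afterwards. This is a sketch only; `copy_and_verify` is a hypothetical helper name, and the paths in the usage comment are the ones from the comment above.

```shell
#!/bin/sh
# Copy everything under src into dst, then verify that the two trees match.
# Returns non-zero if the copy fails or the trees differ afterwards.
copy_and_verify() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    # "src/." copies the directory contents, including dotfiles.
    cp -R "$src/." "$dst/" || return 1
    # diff -r exits 0 only when both trees are identical; stale files
    # that exist only in dst also count as a difference.
    diff -r "$src" "$dst" >/dev/null
}

# Usage in maintenance mode (not run here):
#   copy_and_verify /app/boot /opt/onedev/boot && echo "boot files match"
```

A non-zero result after a successful `cp` usually means the destination still holds files that are not in the source, which is worth inspecting by running `diff -r` without the redirect.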

  • bauk commented 4 years ago

    image_5.png When I do this copying, I see this. But I checked all the files in /opt/onedev/boot, and they were copied successfully. Then I left maintenance mode. However, it is stuck again, like this: image_6.png image_7.png

  • Robin Shen commented 4 years ago

    When I do this copying.I see this

    Anything odd here? From the screenshot I see nothing but the copy command you entered. Does this manual copy command also hang?

    Please switch to maintenance mode again, copy the boot files, and then run the restore command against the first backup file as suggested in the previous comment.

    If the restore succeeds, exit maintenance mode and specify your original version to avoid the upgrade.

  • bauk commented 4 years ago

    The copy command doesn't hang, but the restore failed. image_8.png

  • Robin Shen commented 4 years ago

    Please take a backup of /opt/onedev/site and then copy all files from /app to /opt/onedev manually. Then run the same restore again.
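
The backup step could look like the sketch below, assuming `tar` with gzip support is available in the container. `backup_dir` is a hypothetical helper, and the target directory in the usage comment is a placeholder; writing the archive somewhere outside the affected volume is a suggestion, not something the thread specifies.

```shell
#!/bin/sh
# Archive a directory into a timestamped tarball and print the archive path.
backup_dir() {
    dir="$1"        # directory to back up, e.g. /opt/onedev/site
    out_dir="$2"    # where to write the archive
    archive="$out_dir/$(basename "$dir")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Listing the archive is a cheap check that it is readable.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# Usage (not run here; /path/to/backups is a placeholder):
#   backup_dir /opt/onedev/site /path/to/backups
```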

  • bauk commented 4 years ago

    I backed up /opt/onedev/site, uninstalled OneDev with helm, and deleted the PVC I had created. Then I installed the new version with helm again and restored the backup data. Everything works now. Thank you very much.

  • Robin Shen changed state to 'Closed' 4 years ago
    Previous Value: Open
    Current Value: Closed
  • Robin Shen commented 4 years ago

    No problem. It seems like a PVC issue was preventing files from being deleted for some reason.

  • bauk commented 4 years ago

    image_9.png

  • Robin Shen commented 4 years ago

    Please check all files under /opt/onedev, including site, to make sure they are all owned by root and the read-only flag is not set.
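
A sketch of that check with `find`, under the assumption that "read-only flag" means the owner-write permission bit is cleared (it would not detect filesystem-level immutable attributes). The helper names are illustrative.

```shell
#!/bin/sh
# List entries under a directory that are NOT owned by the expected user.
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# List regular files under a directory whose owner-write bit (octal 200) is cleared.
read_only_files() {
    find "$1" -type f ! -perm -200 -print
}

# Usage (not run here); empty output from both means nothing suspicious was found:
#   not_owned_by /opt/onedev root
#   read_only_files /opt/onedev
```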

  • Robin Shen commented 4 years ago

    Also make sure your PV is not shared with other nodes.

  • bauk commented 4 years ago

    I use nfs-client-provisioner,here is my pvc status image_10.png

  • Robin Shen commented 4 years ago

    Does this error start with the message below?

    Can't acquire environment lock after ...
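
To answer this without reading screenshots, the log can be searched for the message prefix. A small sketch; `lock_errors` is a hypothetical name, and the pod name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Read log text from stdin and print matching lines with their line numbers.
# -F treats the pattern as a fixed string, so the apostrophe needs no escaping
# beyond the shell quoting.
lock_errors() {
    grep -n -F "Can't acquire environment lock after"
}

# Usage (not run here; <onedev-pod> is a placeholder):
#   kubectl logs <onedev-pod> | lock_errors
```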

  • bauk commented 4 years ago

    yes

  • Robin Shen commented 4 years ago

    Please check if process with id 29 is running. If yes, is it the same process as OneDev?
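
A sketch of that check using `/proc`, which assumes a Linux container. `pid_cmdline` is an illustrative helper; the PID 29 in the usage comment is the one mentioned in the comment above.

```shell
#!/bin/sh
# Print the command line of a PID via /proc (Linux only).
# Returns non-zero when no such process exists.
pid_cmdline() {
    pid="$1"
    [ -r "/proc/$pid/cmdline" ] || return 1
    # Arguments in /proc/<pid>/cmdline are NUL-separated; make them readable.
    tr '\0' ' ' < "/proc/$pid/cmdline"
    echo
}

# Usage inside the container (not run here):
#   pid_cmdline 29 || echo "no process with id 29"
```

If the printed command line is the OneDev `java` process, the lock holder is OneDev itself; if the PID does not exist, the lock is presumably stale.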

  • Robin Shen commented 4 years ago

    Also, please enter maintenance mode and delete all xd.lck files recursively under the site directory to see if that works.
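
The recursive delete can be sketched with `find`. As the comment above says, this is meant to be run in maintenance mode, when no OneDev process holds the locks. `remove_xd_locks` is a hypothetical helper name, and the site path in the usage comment is assumed from earlier comments.

```shell
#!/bin/sh
# Delete every regular file named xd.lck under the given directory,
# printing each path as it is removed.
remove_xd_locks() {
    find "$1" -type f -name xd.lck -print -exec rm -f {} +
}

# Usage in maintenance mode (not run here):
#   remove_xd_locks /opt/onedev/site
```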

  • bauk commented 4 years ago

    Thank you again. I used a new nfs-client to reinstall my OneDev, and nothing seems wrong now. If I hit the same problem again, I will try deleting all xd.lck files.

  • bauk commented 4 years ago

    I found the reason: I ran too many builds on NFS, and it hung. Running systemctl restart rpcbind.socket rpcbind.service may resolve this.
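
A hung NFS mount often shows up as a `stat` call that never returns, so a timed probe is one way to detect it before restarting anything. This is a sketch assuming the `timeout` utility is installed; `mount_responds` is a hypothetical helper, and on some setups a blocked `stat` may not be killable, in which case the probe itself can hang.

```shell
#!/bin/sh
# Return 0 if stat on the path completes within the time limit.
# Returns non-zero if it times out or if the path does not exist.
mount_responds() {
    path="$1"
    limit="${2:-5}"   # seconds
    # SIGKILL is used because a process blocked on NFS may ignore gentler signals.
    timeout -s KILL "$limit" stat "$path" >/dev/null 2>&1
}

# Usage (not run here):
#   mount_responds /opt/onedev 5 || echo "mount is not responding"
```

The thread does not say on which host the `systemctl restart` command was run, so the probe only indicates whether the mount is responding, not where to apply the fix.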

  • Robin Shen commented 4 years ago

    Thanks for the update. It definitely helps in case others have the same issue.

Type
Question
Priority
Normal
Reference
OD-584