#584  stuck in updating
Closed
bauk opened 2 years ago

When I upgraded OneDev from 6.3.7 to 6.3.8 with Helm, it got stuck in the upgrade program. I ran ps -ef in the container and saw this: 'java -cp ../boot/* -XX:MaxRAMPercentage=50.0 io.onedev.commons.bootstrap.Bootstrap upgrade /opt/onedev'.

Robin Shen commented 2 years ago

Please show me the console log of the main container of the OneDev pod.

bauk commented 2 years ago

image.png

Robin Shen commented 2 years ago

Please follow this guide to get a thread dump of the Java process to see where it is stuck:

https://www.baeldung.com/java-thread-dump
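
For reference, a minimal sketch of taking the thread dump from a shell inside the container. It assumes either jstack (shipped with the JDK) is on the PATH, or that the target really is a HotSpot JVM, which handles SIGQUIT by printing the dump to its own console output. The function name dump_threads is hypothetical.

```shell
# Print a thread dump for the JVM with the given PID.
dump_threads() {
  pid="${1:-}"
  if [ -z "$pid" ]; then
    echo "usage: dump_threads <pid>" >&2
    return 2
  fi
  if command -v jstack >/dev/null 2>&1; then
    # jstack writes the dump to this shell's stdout.
    jstack "$pid"
  else
    # Fallback: SIGQUIT makes the JVM write the dump to ITS stdout,
    # so look for it in the container console log afterwards.
    # Caution: on a non-JVM process SIGQUIT normally terminates it.
    kill -3 "$pid"
  fi
}
```

Take the PID from the ps -ef output, then run dump_threads with it.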

bauk commented 2 years ago

image_2.png

Robin Shen commented 2 years ago

Are there any other processes running in the container besides this one?

bauk commented 2 years ago

No, only this one. I tried to roll back with Helm, and I'm sorry, but now I have a new problem. Here is the log of the OneDev pod: image_3.png

Robin Shen commented 2 years ago

Please follow this guide to enter maintenance mode: https://code.onedev.io/projects/162/blob/main/pages/maintenance-mode.md

Then run /opt/onedev/bin/restore-db.sh /path/to/db-backup.zip to see if you can restore the database.

bauk commented 2 years ago

image_4.png

Robin Shen commented 2 years ago

Please copy all files from /app/boot to /opt/onedev/boot and try again

bauk commented 2 years ago

image_5.png When I did the copy, I saw this. But I checked all the files in /opt/onedev/boot, and they were copied successfully. Then I exited maintenance mode. However, it got stuck again, like this: image_6.png image_7.png

Robin Shen commented 2 years ago

"When I do this copying. I see this"

Is anything odd here? From the screenshot I see nothing but the copy command you entered. Does this manual copy command also hang?

Please switch to maintenance mode again, copy the boot files, and then run the restore command against the first backup file, as suggested in the previous comment.

If the restore succeeds, exit maintenance mode and specify your original version as the version, to avoid the upgrade.

bauk commented 2 years ago

The copy command doesn't hang, but the restore failed. image_8.png

Robin Shen commented 2 years ago

Please take a backup of /opt/onedev/site and then copy all files from /app to /opt/onedev manually. Then run the same restore again.
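
A minimal sketch of those two manual steps, assuming tar and cp are available in the container. The function name backup_and_copy and the backup archive path are hypothetical; the directories are passed as parameters so the steps can be tried on a scratch copy first.

```shell
# Back up <install_dir>/site, then copy the pristine files over the install.
backup_and_copy() {
  app_dir="$1"      # pristine files from the image (/app in this thread)
  install_dir="$2"  # persistent install directory (/opt/onedev in this thread)
  backup_file="$3"  # where to write the archive; keep it outside install_dir

  # Archive the site directory and stop if that fails.
  tar -czf "$backup_file" -C "$install_dir" site || return 1

  # Copy everything, including dotfiles, from app_dir into install_dir.
  # -a preserves modes and timestamps; ownership is preserved only when
  # the filesystem and the current user allow it.
  cp -a "$app_dir"/. "$install_dir"/
}
```

In the setup discussed here the call would look like backup_and_copy /app /opt/onedev /tmp/site-backup.tgz, where the /tmp path is only an example.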

bauk commented 2 years ago

I backed up /opt/onedev/site, uninstalled OneDev with Helm, and deleted the PVC I had created. Then I installed the new version with Helm and restored the backup data. Everything works now. Thank you very much.

Robin Shen changed state from 'Open' to 'Closed' 2 years ago

Robin Shen commented 2 years ago

No problem. It seems like a PVC issue was preventing files from being deleted for some reason.

bauk commented 2 years ago

image_9.png

Robin Shen commented 2 years ago

Please check all files under /opt/onedev, including site, to make sure they are all owned by root and that the read-only flag is not set.
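
A minimal sketch of that check using find. It assumes "read-only" means the owner's write permission bit is missing; the two function names are hypothetical. Empty output from both means nothing suspicious was found.

```shell
# Print entries under directory $2 that are NOT owned by user $1.
list_not_owned_by() {
  find "$2" ! -user "$1" -print
}

# Print entries under directory $1 whose owner lacks write permission
# (mode bit 0200 not set).
list_owner_readonly() {
  find "$1" ! -perm -200 -print
}
```

For the case above: list_not_owned_by root /opt/onedev, followed by list_owner_readonly /opt/onedev.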

Robin Shen commented 2 years ago

Also make sure your PV is not shared with other nodes.

bauk commented 2 years ago

I use nfs-client-provisioner. Here is my PVC status: image_10.png

Robin Shen commented 2 years ago

Does this error start with the message below?

Can't acquire environment lock after ...

bauk commented 2 years ago

yes

Robin Shen commented 2 years ago

Please check whether the process with ID 29 is running. If yes, is it the same process as OneDev?
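
One way to check this from a shell inside the container, sketched under the assumption of a Linux /proc filesystem. The function name describe_pid is hypothetical.

```shell
# Print the command line of the given PID, or fail if no such process exists.
describe_pid() {
  pid="$1"
  if [ -r "/proc/$pid/cmdline" ]; then
    # cmdline is NUL-separated; turn the NULs into spaces for display.
    tr '\0' ' ' < "/proc/$pid/cmdline"
    echo
  else
    echo "no process with id $pid" >&2
    return 1
  fi
}
```

Running describe_pid 29 and comparing the output with the java command line seen in ps -ef shows whether the lock holder is the OneDev process itself.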

Robin Shen commented 2 years ago

Also, please enter maintenance mode and delete all xd.lck files recursively under the site directory to see if that works.
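
A minimal sketch of that cleanup, assuming it is run in maintenance mode so that no OneDev process still holds the locks. The function name remove_xd_locks is hypothetical, and the site directory is passed as a parameter.

```shell
# Delete every regular file named xd.lck under the given directory,
# printing each path as it is removed.
remove_xd_locks() {
  site_dir="$1"
  find "$site_dir" -type f -name 'xd.lck' -print -exec rm -f {} +
}
```

With the layout used in this thread the call would be remove_xd_locks /opt/onedev/site.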

bauk commented 2 years ago

Thank you again. I reinstalled OneDev using a new nfs-client. Nothing seems wrong now; if I hit the same problem again, I will try deleting all xd.lck files.

bauk commented 2 years ago

I found the cause: I ran too much build work over NFS, and it hung. Running systemctl restart rpcbind.socket rpcbind.service may resolve this.

Robin Shen commented 2 years ago

Thanks for the update. It definitely helps in case others have the same issue.

Type: Question
Priority: Normal
Assignee: (none)
Issue Votes: 0
Watchers: 4
Reference: onedev/server#584