Upgrade failure (OD-998)
Released
Jerome St-Louis opened 1 year ago

Upgrading OneDev failed with errors:

INFO - >>> com.google.inject.CreationException: Unable to create injector, see the following errors:
INFO - >>>
INFO - >>> 1) [Guice/ErrorInjectingConstructor]: ExplicitException: http_port should be specified

INFO - >>> 2) [Guice/ErrorInjectingConstructor]: ExplicitException: http_port should be specified
INFO - >>> at DefaultServerConfig.<init>(DefaultServerConfig.java:64)
INFO - >>> at DefaultServerConfig.class(DefaultServerConfig.java:29)

My server doesn't enable http, only https.

The upgrade attempt left the server in an unusable state.

Jerome St-Louis commented 1 year ago

After running /onedev/bin/restore-db.sh /onedev/site/db-backup/2022-11-26_00-52-38.zip, the database was working again. After uncommenting http_port to get past that upgrade error, I got new errors:

INFO - >>> ERROR - ERROR: relation "o_role" does not exist
INFO - >>> Position: 13
INFO - >>> ERROR - Error booting application
INFO - >>> javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not execute statement

Upgrade failures, especially ones that leave the system in an unusable state, are a very frustrating experience for users.

Robin Shen commented 1 year ago

Please add http_port to conf/server.properties. This is now mandatory. Also, https support has been removed from OneDev itself for simplicity. To add https support, please put a reverse proxy (Apache, Nginx or Caddy server) in front of it, which handles certificates better.

Robin Shen commented 1 year ago

The database is now in an inconsistent state, please clean it up, and then restore from the backup.

Jerome St-Louis commented 1 year ago

@robin What exactly is meant by "clean it up"? I did try to restore it from both backups, but the server refuses to start.

Robin Shen commented 1 year ago

Delete the database and create a new one, and then restore from backup

Robin Shen commented 1 year ago

Before trying another upgrade, make sure the old version can start up.

Jerome St-Louis commented 1 year ago

@robin Despite completely deleting the database, creating a brand new one, running restore DB with the original backup it made, and it saying "INFO - Database is successfully restored from /onedev/site/db-backup/2022-11-26_00-52-38.zip", it still fails to start: "OneDev may have failed to start. The syslog may contain further information.".

I don't see anything at all in /onedev/logs/server.log or /var/log/syslog.log.

Robin Shen commented 1 year ago

Please run bin/server.sh console to see what it is complaining about.

Jerome St-Louis commented 1 year ago

OK, maybe the problem is which user I should be running those upgrade/restore scripts as? I normally run onedev as the onedev user, but I ran those scripts as root.

Jerome St-Louis commented 1 year ago

root]:/onedev# /onedev/bin/server.sh console
Running OneDev...
wrapper | Configuration file not found: /onedev/conf/wrapper.conf
wrapper | Current working directory: /onedev/boot
wrapper | Failed to load configuration.
wrapper | The Wrapper will stop.

-rw------- 1 root root 6925 Oct 30 01:41 /onedev/conf/wrapper.conf

Robin Shen commented 1 year ago

You should run upgrade/restore as user onedev. Otherwise, you will run into file permission issues.

Robin Shen commented 1 year ago

Please change all your files under /onedev to be owned by user onedev. Alternatively, simply run everything as root.
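To see which files still need their ownership changed, one option is to walk the installation directory and report anything not owned by the expected user. The sketch below is only an illustration, not part of OneDev: the function name find_foreign_owned is hypothetical, and it assumes a POSIX system.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return sorted paths under root (root included) not owned by expected_uid.

    Checks every directory that os.walk visits and every file in it.
    """
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat judges a symlink by its own owner, not its target's owner.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)


# Possible usage (assumes a local user named "onedev" exists):
#   import pwd
#   uid = pwd.getpwnam("onedev").pw_uid
#   for path in find_foreign_owned("/onedev", uid):
#       print(path)
```

In the situation described in this thread, such a check would have listed /onedev/conf/wrapper.conf, which was owned by root and readable only by root.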

Jerome St-Louis commented 1 year ago

After changing everything to onedev it seems to start but then says:

01:40:12 ERROR i.onedev.commons.bootstrap.Bootstrap - Error booting application
java.lang.IllegalStateException: basedir /onedev/site/assets/root does not exist.

There is no such directory.

Robin Shen commented 1 year ago

Yes, please create that directory as user onedev.

Jerome St-Louis commented 1 year ago

The server is running again, thank you. I wish the upgrade process was more robust so users trying to upgrade do not have to go through any of this.

I will attempt to upgrade again...

Jerome St-Louis commented 1 year ago

I still get an upgrade error on o_role:

INFO - >>> INFO - Creating tables...
INFO - >>> INFO - Importing data into database...
INFO - >>> INFO - Importing from data file 'Roles.xml'...
INFO - >>> ERROR - ERROR: relation "o_role" does not exist
INFO - >>> Position: 13
INFO - >>> ERROR - Error booting application
INFO - >>> javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not execute statement

Robin Shen commented 1 year ago

Which OneDev version are you using, and which database are you using?

Jerome St-Louis commented 1 year ago

Trying to upgrade from 7.7.8 => 7.7.14 and I'm using PostgreSQL 14.5

Robin Shen commented 1 year ago

Will check what is wrong.

Jerome St-Louis commented 1 year ago

Thanks a lot for the help. I am trying to set up the Apache https proxy for the new version in the meantime... That is not exactly easy. I liked the built-in https feature!

Robin Shen commented 1 year ago

You may run a Caddy server with one line to get letsencrypt set up:

https://code.onedev.io/projects/162/files/main/pages/reverse-proxy-setup.md#caddy-server

Jerome St-Louis commented 1 year ago

Thanks, that page is useful. I managed to get my Apache 2 SSL reverse proxy working. I have some experience with Apache, though it is overly complicated, and here the config files were organized very differently... possibly because of a different distro.

Any luck figuring out the o_role issue? Would it be worthwhile trying to upgrade one point release at a time from .8 => .14 ?

UPDATE: Going through them. Upgrading .8 => .9, .9 => .10, .10 => .11 (Yay!! \o/ eC syntax highlighting is working :)) so far worked fine.

The only o_role matches I see in the codebase are o_role_id occurrences:

server-core/src/main/java/io/onedev/server/model/LinkAuthorization.java:16:63 > indexes={@Index(columnList="o_link_id"), @Index(columnList="o_role_id")},
server-core/src/main/java/io/onedev/server/model/LinkAuthorization.java:17:67 > uniqueConstraints={@UniqueConstraint(columnNames={"o_link_id", "o_role_id"})})

Jerome St-Louis commented 1 year ago

OK, it is the .11 => .13 upgrade that is broken. (There are no .12 builds available to try as an intermediate step.)

That upgrade also seems to do a lot more stuff than the previous ones.

I guess I will be sticking to 7.7.11 for now... At least I have the working eC syntax highlighting :)

In addition to solving this o_role issue, I would strongly suggest addressing the following upgrade issues:

  • Fix this issue of {onedev}/site/assets/root ending up non-existent or automatically create it if missing
  • Fix the message saying "The syslog may contain further information." to "Try running server.sh console for further information"
  • Give an error and abort if the user runs upgrade.sh or restore-db.sh with a different user than the RUN_AS_USER configured in server.sh and/or the owner of those scripts/directories, or include this RUN_AS_USER from all shell files from a centralized configuration file.
  • Either check the config for http_port present and other settings to be valid before attempting the upgrade, or otherwise fix the upgrade not to fail because of this
  • If an upgrade fails, automatically clean & restore the database, or completely avoid scenarios where this can happen.
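The pre-upgrade configuration check suggested in the list above could look roughly like the following. This is a minimal sketch, not OneDev's actual validation: the function names are hypothetical, the parser only handles simple key=value lines, and http_port is the only key taken from this thread.

```python
def load_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_http_port(text):
    """Return a list of problems; an empty list means the check passed."""
    value = load_properties(text).get("http_port")
    if value is None:
        # Same wording as the error reported at the top of this issue.
        return ["http_port should be specified"]
    is_number = value.isascii() and value.isdigit()
    if not is_number or not 1 <= int(value) <= 65535:
        return [f"http_port is not a valid port: {value!r}"]
    return []


# Possible usage before starting an upgrade:
#   with open("/onedev/conf/server.properties") as f:
#       problems = check_http_port(f.read())
#   if problems:
#       raise SystemExit("; ".join(problems))
```

Running a check like this before touching the database means a commented-out http_port aborts the upgrade while the old installation is still intact.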

Failed upgrades that leave things broken are really the worst user experience, and unfortunately all too common. OneDev has a lot of great things going for it, so I hope that it can excel at upgrades :)

Thank you very much again for your great ultra-rapid support. That is one of the best OneDev features! :)

Robin Shen commented 1 year ago

> Fix this issue of {onedev}/site/assets/root ending up non-existent or automatically create it if missing

This will be a one-time error and should not happen in future upgrades.

> Fix the message saying "The syslog may contain further information." to "Try running server.sh console for further information"

This message is reported by JSW, a third-party service wrapper used by OneDev, so OneDev cannot do much here.

> Give an error and abort if the user runs upgrade.sh or restore-db.sh with a different user than the RUN_AS_USER configured in server.sh and/or the owner of those scripts/directories, or include this RUN_AS_USER from all shell files from a centralized configuration file.

Various shell files come from JSW, the third-party service wrapper tool, so OneDev cannot do much here either.

> Either check the config for http_port present and other settings to be valid before attempting the upgrade, or otherwise fix the upgrade not to fail because of this

This is a bug: http_port is not validated.

> If an upgrade fails, automatically clean & restore the database, or completely avoid scenarios where this can happen.

In such a situation, OneDev cannot clean up the database by itself, as the data might be in an unexpected form.

All in all, this is a bug. Long-time OneDev users will know that the upgrade procedure is normally painless.

Jerome St-Louis commented 1 year ago

Thanks for the feedback.

> In such situation, OneDev can not clean up the database by itself, as data might in an unexpected form.

I'm not sure I understand. If the user is supposed to just delete it and re-create it, can't OneDev do that itself, since it can create tables etc.? Is it because there are several different DB driver back ends, which would make it too complicated?

For PostgreSQL, couldn't it use BEGIN / ROLLBACK if things go wrong?
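To make the question concrete, this is roughly what wrapping a schema change in BEGIN / ROLLBACK looks like. The sketch uses Python's built-in sqlite3 module purely so that it runs without a server. It is a stand-in, not PostgreSQL and not OneDev's upgrade code, and the function names are hypothetical.

```python
import sqlite3


def open_connection(path=":memory:"):
    # isolation_level=None disables implicit transactions, so the explicit
    # BEGIN / COMMIT / ROLLBACK statements below are the only ones issued.
    return sqlite3.connect(path, isolation_level=None)


def upgrade_with_rollback(conn, statements):
    """Run all statements in one transaction; undo every one if any fails."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def table_names(conn):
    """List the tables that currently exist, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

If a later statement fails, the tables created earlier in the same transaction disappear again, so the database is left as it was before the attempt.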

Robin Shen commented 1 year ago

OneDev does not put the whole restore/upgrade procedure in a single transaction, as that can consume a lot of memory. It also has to deal with different types of databases, and the only cross-database way to clean up is to delete the tables known to OneDev one by one, following foreign key constraint order. If the database is in a normal state, OneDev knows that its tables are consistent with the application model and can handle that. But after an upgrade failure, the database tables may be out of sync with the model (for instance, new tables have been added, new foreign keys applied, etc.), and OneDev cannot delete all of them reliably using its application model.
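Deleting tables one by one in foreign key constraint order amounts to a topological sort. Here is a minimal sketch, assuming the foreign keys are known up front as a mapping from each table to the tables it references. The function name and the table names in the usage comment are illustrative, not OneDev's real code or schema.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def drop_order(references):
    """Order tables so each one is dropped before the tables it references.

    references maps a table name to the set of tables its foreign keys
    point at. Raises graphlib.CycleError if the foreign keys form a cycle.
    """
    sorter = TopologicalSorter()
    for table, targets in references.items():
        sorter.add(table)
        for target in targets:
            # add(node, *predecessors): predecessors are emitted first, so
            # the referencing table comes out before the referenced one.
            sorter.add(target, table)
    return list(sorter.static_order())


# Possible usage (illustrative table names):
#   drop_order({
#       "o_link_authorization": {"o_link", "o_role"},
#       "o_link": set(),
#       "o_role": set(),
#   })
#   # "o_link_authorization" comes before "o_link" and "o_role"
```

The ordering is only as good as the mapping. A table or foreign key created by a half-finished upgrade but absent from the application model would be missing from it, which is the failure mode described above.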

Jerome St-Louis commented 1 year ago

Thanks for the detailed explanation.

If OneDev could realize during the failed upgrade exactly at what point things went wrong, it could revert the one transaction that failed, then the rest of it would all be consistent and could reliably be deleted and/or reverted.

Sorry to be persistent, but a failed upgrade that leaves a system broken is just not something that any user should have to deal with, ever, for any reason (especially minor bugs like the ones that came into play here, which are likely to happen again in the future, because we programmers introduce bugs all the time ;)...).

(And I'm not just talking about OneDev, but any software, operating systems in particular! The only thing worse than a system breaking from a failed upgrade is devices bricking for no reason at all other than running low on battery... Software quality is going downhill. But OneDev is a gem in terms of performance and memory footprint compared to other solutions, so I still have hope it can upgrade flawlessly for everyone for all future releases! ;))

Robin Shen commented 1 year ago

I filed an improvement request to investigate this again in future versions:

Issue #1000 - Restore database with backup data in case of a failed upgrade

OneDev changed state to 'Closed' 1 year ago
Previous Value: Open
Current Value: Closed

OneDev commented 1 year ago

State changed as code fixing the issue is committed

OneDev changed state to 'Released' 1 year ago
Previous Value: Closed
Current Value: Released

OneDev commented 1 year ago

State changed as build #3138 is successful

Jerome St-Louis commented 1 year ago

Thank you very much again for the fix and for considering improving upgrade reliability in future versions!

Robin Shen commented 1 year ago

Glad to be helpful. Thanks for all the feedback, which helps OneDev get better.

Type: Bug
Priority: Critical