Upgrade failure (OD-998)
Jerome St-Louis opened 3 years ago

Upgrading OneDev failed with errors:

INFO - >>> com.google.inject.CreationException: Unable to create injector, see the following errors:
INFO - >>>
INFO - >>> 1) [Guice/ErrorInjectingConstructor]: ExplicitException: http_port should be specified
INFO - >>>
INFO - >>> 2) [Guice/ErrorInjectingConstructor]: ExplicitException: http_port should be specified
INFO - >>>   at DefaultServerConfig.<init>(DefaultServerConfig.java:64)
INFO - >>>   at DefaultServerConfig.class(DefaultServerConfig.java:29)

My server doesn't enable http, only https.

The upgrade attempt left the server in an unusable state.

  • Jerome St-Louis commented 3 years ago

    After running /onedev/bin/restore-db.sh /onedev/site/db-backup/2022-11-26_00-52-38.zip, the database was working again. When I uncommented http_port to get past that upgrade error, I got new errors:

    INFO - >>> ERROR - ERROR: relation "o_role" does not exist
    INFO - >>> Position: 13
    INFO - >>> ERROR - Error booting application
    INFO - >>> javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not execute statement

    Upgrade failures, especially ones that leave the system in an unusable state, are a very frustrating experience for users.

  • Robin Shen commented 3 years ago

    Please add http_port to conf/server.properties; it is now mandatory. Also, HTTPS support has been removed from OneDev itself for simplicity. To add HTTPS, please put a reverse proxy (Apache, Nginx or Caddy) in front of OneDev, as these handle certificates better.
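
A quick way to confirm the setting before retrying the upgrade might look like the sketch below. The helper name has_http_port is hypothetical and not part of OneDev; only the http_port key and the conf/server.properties path come from this thread.

```shell
#!/bin/sh
# Succeed only if the given properties file contains an uncommented
# http_port entry with a non-empty value.
# has_http_port is a hypothetical helper name, not a OneDev command.
has_http_port() {
    grep -Eq '^[[:space:]]*http_port[[:space:]]*[=:][[:space:]]*[^[:space:]]' "$1"
}

# Usage:
#   has_http_port /onedev/conf/server.properties || echo "http_port is missing" >&2
```

A commented-out line such as "#http_port=..." does not match, which is the situation that triggered the error above.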

  • Robin Shen commented 3 years ago

    The database is now in an inconsistent state. Please clean it up and then restore from the backup.

  • Jerome St-Louis commented 3 years ago

    @robin What exactly is meant by "clean it up"? I did try to restore it from both backups, but the server refuses to start.

  • Robin Shen commented 3 years ago

    Delete the database, create a new one, and then restore from the backup.
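
For a PostgreSQL setup, those three steps could be sketched roughly as follows. The database name and owner role (both onedev here) are assumptions, so substitute your own values; connection options and privileges are left out. The sketch prints the commands by default and only executes them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumed values: adjust DB_NAME and DB_OWNER to match your installation.
DB_NAME="${DB_NAME:-onedev}"
DB_OWNER="${DB_OWNER:-onedev}"
BACKUP="${BACKUP:-/onedev/site/db-backup/2022-11-26_00-52-38.zip}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drop the damaged database, create an empty one, then restore the backup.
reset_and_restore() {
    run dropdb --if-exists "$DB_NAME" &&
    run createdb -O "$DB_OWNER" "$DB_NAME" &&
    run /onedev/bin/restore-db.sh "$BACKUP"
}

# Usage: review the printed commands first, then rerun with DRY_RUN=0.
#   reset_and_restore
```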

  • Robin Shen commented 3 years ago

    Before trying another upgrade, make sure the old version can start up.

  • Jerome St-Louis commented 3 years ago

    @robin Despite completely deleting the database, creating a brand new one, and running the DB restore with the original backup it made, which reported "INFO - Database is successfully restored from /onedev/site/db-backup/2022-11-26_00-52-38.zip", it still fails to start: "OneDev may have failed to start. The syslog may contain further information.".

    I don't see anything at all in /onedev/logs/server.log or /var/log/syslog.log.

  • Robin Shen commented 3 years ago

    Please run bin/server.sh console to see what it is complaining about.

  • Jerome St-Louis commented 3 years ago

    OK, maybe the problem is which user I should be running those upgrade/restore scripts as? I normally run OneDev as the onedev user, but I ran those scripts as root.

  • Jerome St-Louis commented 3 years ago

    root]:/onedev# /onedev/bin/server.sh console
    Running OneDev...
    wrapper | Configuration file not found: /onedev/conf/wrapper.conf
    wrapper | Current working directory: /onedev/boot
    wrapper | Failed to load configuration.
    wrapper | The Wrapper will stop.

    -rw------- 1 root root 6925 Oct 30 01:41 /onedev/conf/wrapper.conf

  • Robin Shen commented 3 years ago

    You should run upgrade/restore as user onedev. Otherwise, you will run into file permission issues.

  • Robin Shen commented 3 years ago

    Please change all your files under /onedev to be owned by user onedev. Alternatively, simply run everything as root.
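
One way to apply and then verify the ownership change is sketched below. The helper name not_owned_by is hypothetical; the chown line in the usage comment has to be run as root and changes only the owner, not the group.

```shell
#!/bin/sh
# List everything under a directory that is NOT owned by the given user.
# $1 is a user name or numeric uid, $2 is the directory to scan.
# not_owned_by is a hypothetical helper name.
not_owned_by() {
    find "$2" ! -user "$1"
}

# Usage (as root):
#   chown -R onedev /onedev
#   not_owned_by onedev /onedev    # prints nothing once ownership is consistent
```

Any path it still prints, such as the root-owned /onedev/conf/wrapper.conf shown earlier, is a candidate for the permission errors described here.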

  • Jerome St-Louis commented 3 years ago

    After changing everything to onedev it seems to start but then says:

    01:40:12 ERROR i.onedev.commons.bootstrap.Bootstrap - Error booting application
    java.lang.IllegalStateException: basedir /onedev/site/assets/root does not exist.

    There is no such directory.

  • Robin Shen commented 3 years ago

    Yes, please create that directory as user onedev.
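
A minimal sketch of that step, assuming it is run from a shell already switched to the onedev account so the new directory gets the right owner. The helper name ensure_dir is hypothetical.

```shell
#!/bin/sh
# Create the directory (and any missing parents) only if it is absent.
# ensure_dir is a hypothetical helper name.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1"
}

# Usage, as user onedev:
#   ensure_dir /onedev/site/assets/root
```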

  • Jerome St-Louis commented 3 years ago

    The server is running again, thank you. I wish the upgrade process was more robust so users trying to upgrade do not have to go through any of this.

    I will attempt to upgrade again...

  • Jerome St-Louis commented 3 years ago

    I still get an upgrade error on o_role:

    INFO - >>> INFO - Creating tables...
    INFO - >>> INFO - Importing data into database...
    INFO - >>> INFO - Importing from data file 'Roles.xml'...
    INFO - >>> ERROR - ERROR: relation "o_role" does not exist
    INFO - >>> Position: 13
    INFO - >>> ERROR - Error booting application
    INFO - >>> javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not execute statement

  • Robin Shen commented 3 years ago

    Which OneDev version are you using, and which database are you using?

  • Jerome St-Louis commented 3 years ago

    Trying to upgrade from 7.7.8 => 7.7.14 and I'm using PostgreSQL 14.5

  • Robin Shen commented 3 years ago

    Will check what is wrong.

  • Jerome St-Louis commented 3 years ago

    Thanks a lot for the help. I am trying to set up the Apache https proxy for the new version in the meantime... That is not exactly easy. I liked the built-in https feature!

  • Robin Shen commented 3 years ago

    You may run a Caddy server with one line to get letsencrypt set up:

    https://code.onedev.io/projects/162/files/main/pages/reverse-proxy-setup.md#caddy-server

  • Jerome St-Louis commented 3 years ago

    Thanks, that page is useful. I managed to get my Apache 2 SSL reverse proxy working. I had some experience with Apache, which is overly complicated to begin with, but here the config files were organized very differently... possibly because of a different distro.

    Any luck figuring out the o_role issue? Would it be worthwhile trying to upgrade one point release at a time from .8 => .14 ?

    UPDATE: Going through them. Upgrading .8 => .9, .9 => .10, .10 => .11 (Yay!! \o/ eC syntax highlighting is working :)) so far worked fine.

    The only o_role matches I see in the codebase are o_role_id occurrences:

    server-core/src/main/java/io/onedev/server/model/LinkAuthorization.java:16:63 > indexes={@Index(columnList="o_link_id"), @Index(columnList="o_role_id")},
    server-core/src/main/java/io/onedev/server/model/LinkAuthorization.java:17:67 > uniqueConstraints={@UniqueConstraint(columnNames={"o_link_id", "o_role_id"})})

  • Jerome St-Louis commented 3 years ago

    OK, it is the .11 => .13 upgrade that is broken (there are no .12 builds available to try as an intermediate step).

    That upgrade also seems to do a lot more stuff than the previous ones.

    I guess I will be sticking to 7.7.11 for now... At least I have the working eC syntax highlighting :)

    In addition to solving this o_role issue, I would strongly suggest addressing the following upgrade issues:

    • Fix this issue of {onedev}/site/assets/root ending up non-existent or automatically create it if missing
    • Fix the message saying "The syslog may contain further information." to "Try running server.sh console for further information"
    • Give an error and abort if the user runs upgrade.sh or restore-db.sh with a different user than the RUN_AS_USER configured in server.sh and/or the owner of those scripts/directories, or include this RUN_AS_USER from all shell files from a centralized configuration file.
    • Either check the config for http_port present and other settings to be valid before attempting the upgrade, or otherwise fix the upgrade not to fail because of this
    • If an upgrade fails, automatically clean & restore the database, or completely avoid scenarios where this can happen.
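
    The user check and the http_port check suggested in the list above could be combined into a small guard that runs before the upgrade. This is only a sketch of the idea: preflight is a hypothetical helper, not part of OneDev or its service wrapper, and it compares the invoking user against the owner of the installation directory rather than against RUN_AS_USER.

```shell
#!/bin/sh
# preflight is a hypothetical helper; $1 is the OneDev installation directory.
preflight() {
    install_dir=$1
    conf="$install_dir/conf/server.properties"

    # Abort when the installation directory is not owned by the invoking user.
    if [ -z "$(find "$install_dir" -prune -user "$(id -u)" 2>/dev/null)" ]; then
        echo "Run this as the owner of $install_dir" >&2
        return 1
    fi

    # Abort when http_port is missing, empty, or commented out.
    if ! grep -Eq '^[[:space:]]*http_port[[:space:]]*[=:][[:space:]]*[^[:space:]]' "$conf" 2>/dev/null; then
        echo "http_port should be specified in $conf" >&2
        return 1
    fi
}

# Usage: stop before touching the database if either check fails.
#   preflight /onedev || exit 1
```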

    Failed upgrades that leave things broken are really the worst user experience, and unfortunately all too common. OneDev has a lot of great things going for it, so I hope that it can excel at upgrades :)

    Thank you very much again for your great ultra-rapid support. That is one of the best OneDev features! :)

  • Robin Shen commented 3 years ago

    Fix this issue of {onedev}/site/assets/root ending up non-existent or automatically create it if missing

    This is a one-time error and should not happen in future upgrades.

    Fix the message saying "The syslog may contain further information." to "Try running server.sh console for further information"

    This message is reported by JSW, a third-party service wrapper used by OneDev, so OneDev cannot do much here.

    Give an error and abort if the user runs upgrade.sh or restore-db.sh with a different user than the RUN_AS_USER configured in server.sh and/or the owner of those scripts/directories, or include this RUN_AS_USER from all shell files from a centralized configuration file.

    Various shell files come from JSW, the third-party service wrapper tool, so OneDev cannot do much here.

    Either check the config for http_port present and other settings to be valid before attempting the upgrade, or otherwise fix the upgrade not to fail because of this

    This is a bug: http_port is not being validated.

    If an upgrade fails, automatically clean & restore the database, or completely avoid scenarios where this can happen.

    In such a situation, OneDev cannot clean up the database by itself, as the data might be in an unexpected form.

    All in all, this is a bug. Long-time OneDev users know that the upgrade procedure is normally painless.

  • Jerome St-Louis commented 3 years ago

    Thanks for the feedback.

    In such a situation, OneDev cannot clean up the database by itself, as the data might be in an unexpected form.

    I'm not sure I understand. If the user is supposed to just delete it and re-create it, can't OneDev do that itself, since it can create tables etc.? Is it because supporting several different DB back-ends would make that too complicated?

    For PostgreSQL, couldn't it use BEGIN / ROLLBACK if things go wrong?

  • Robin Shen commented 3 years ago

    OneDev does not put the whole restore/upgrade procedure in a single transaction, as that can consume a lot of memory. It also has to deal with different types of databases, and the only cross-database approach to cleaning up is to delete the tables known to OneDev one by one, following foreign key constraint order. If the database is in a normal state, OneDev knows that its tables are consistent with the application models and can deal with that. But after an upgrade failure, the database tables may be left in an unexpected state (for instance, new tables created, new foreign keys applied, etc.), and OneDev cannot reliably delete all of them using its application model.
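
To illustrate what "following foreign key constraint order" means, here is a small sketch using the standard tsort utility. It is not how OneDev computes the order internally, and the table names in the usage comment are hypothetical. Each input line names a child table followed by the parent table it references, and the output lists tables so that every child is deleted before its parent.

```shell
#!/bin/sh
# Read "child parent" pairs on stdin, one per line, where child has a foreign
# key to parent, and print a deletion order in which every child precedes its
# parent. drop_order is a hypothetical wrapper around POSIX tsort.
drop_order() {
    tsort
}

# Usage with hypothetical table names:
#   printf '%s\n' 'link_authorization role' 'link_authorization link_spec' | drop_order
```

The order is only as good as the list of pairs. If a failed upgrade has added tables or constraints that are missing from that list, the computed order can violate a real constraint, which is the limitation described above.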

  • Jerome St-Louis commented 3 years ago

    Thanks for the detailed explanation.

    If OneDev could determine exactly at what point the failed upgrade went wrong, it could revert the one transaction that failed; the rest would then be consistent and could reliably be deleted and/or reverted.

    Sorry to be persistent, but a failed upgrade that leaves a system broken is just not something that any user should have to deal with, ever, for any reason (especially minor bugs like the ones that came into play here, which are likely to happen again in the future, because we programmers introduce bugs all the time ;)...).

    (And I'm not just talking about OneDev, but any software, Operating Systems in particular! The only thing worse than a system breaking from a failed upgrade is devices bricking for no reason at all other than running low on battery... Software quality is going downhill. But OneDev is a gem in terms of performance and memory footprint compared to other solutions, so I still have hope it can upgrade flawlessly for everyone for all future releases! ;))

  • Robin Shen commented 3 years ago

    I filed an improvement request to investigate this again in future versions:

    Issue #1000 - Restore database with backup data in case of a failed upgrade

  • OneDev changed state to 'Closed' 3 years ago
    Previous Value: Open
    Current Value: Closed
  • OneDev commented 3 years ago

    State changed as code fixing the issue is committed

  • OneDev changed state to 'Released' 3 years ago
    Previous Value: Closed
    Current Value: Released
  • OneDev commented 3 years ago

    State changed as build #3138 is successful

  • Jerome St-Louis commented 3 years ago

    Thank you very much again for the fix and for considering improvements to upgrade reliability in future versions!

  • Robin Shen commented 3 years ago

    Glad to be helpful. Thanks for all the feedback helping OneDev get better.

Type: Bug
Priority: Critical
Affected Versions: Not Found
Reference: OD-998