#1088  Increase various description/comment length to sufficiently large
Released
jbauer opened 1 year ago

Is there a reason why there is a length limit for the issue description? Can it be removed, increased or made configurable?

When an issue represents a bug and the description should contain the corresponding java stack trace, OneDev often complains that the description is too long because the stack trace is long. This is relatively annoying during migrating selected issues from a different software to OneDev because I then have to manually edit the stack trace and shorten it here and there hoping nothing relevant has been removed. Also when I migrating an internal tool that automatically reports issues including exceptions via API, this would be annoying as well in case the API call would fail.

I can see there might be some DoS potential when fully removing the limit but IMHO it should be set to something unreasonably large to never hit that limit in daily usage.

Robin Shen commented 1 year ago

Issue description is saved as varchar instead of text blob in database in order to be indexed and queried. Many databases have length limit on total varchars in a single table record.

jbauer commented 1 year ago

I have inspected OneDev database (postgres):

  • o_issue.o_description has a length limit of 14000 and does not have any index.
  • o_issuecomment.o_content has a length limit of 15000 and does not have any index.

Because there are no indexes I don't see a reason to use varchar for issue description and comments content. I guess you are using JPA and simple LIKE queries, right? I think a btree index would not help anyways in that case. For real full text search database specific features must usually be used (e.g. GIN/GIST indexes and tsvector/tsquery for postgres). At work I use JPA as well and all string columns are postgres text columns. Search queries work reasonably well if you can also add additional filters like project id or customer id.

Robin Shen commented 1 year ago

Sorry I forgot that. Using text column has its issues. Need to check further. Also OneDev needs to support different databases, and can not leverage specific feature of postgresql.

Unknown commented 1 year ago

@robin You need to bare in mind that you are the only developer on the project currently, and although others have made contributions, maintaining a large number of databases is very time consuming. I believe you support both postgresql and mysql, they both serve similar functions for example, for a large enterprise solution they have a lot of devs to maintain each database connector, there is just you.

Might be worth setting up your priorities for database support, it will make maintaining the project easier :)

Robin Shen commented 1 year ago

Right now, supporting multiple databases does not cost too much, thanks to hibernate. In future, if this does hold OneDev back, I may consider reducing database type support.

Unknown commented 1 year ago
Ah I forgot you use an ORM, that makes things a lot more easy for you. Personally I dislike ORMs cause I find they are inefficient, but it does make things 100000x easier!
Robin Shen commented 1 year ago

Checked it again, and the description column is using type varchar instead of text/clob mostly because of performance on various database supported by OneDev, including MySQL/Oracle/SQLServer.

jbauer commented 1 year ago

Hmm and what is the performance bottleneck? What will be slow? Searching within description via JPA query?

Do you know hibernate-search? It allows you to annotate entity fields and it will manage full text search indexes for you. As a backend you can either use elasticsearch / opensearch or lucene. Since you already use lucene it might be worth a look. Maybe it can help making the description column text/clob or at least max varchar length.

https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#preface

Robin Shen commented 1 year ago

Both for search and load. Hibernate search is not flexible enough, and OneDev actually indexes description itself with Lucene for fuzzy search (search text freely and return matched issues). The search we are discussing here is for grammar aware search ("Description" contains ...).

I don't think it is wise to sacrificing performance for some edge cases of long issue description.

Unknown commented 1 year ago

@jbauer The maximum size of a VARCHAR is 255 characters, this is more than enough for a brief description of the code, and it is a lot more efficient than using a TEXT datatype.

Also why would you need a longer description, project description is meant to be brief, and then the information about the project is added to README.md

Thus yes, the description does need a length limit!

Robin Shen commented 1 year ago

It is 14000 chars in OneDev.

Unknown commented 1 year ago

Wait what? I thought VARCHAR maxed out at 255?

Also why you complaining about a 14,000 character limit, that is like an entire essay worth of characters!

jbauer commented 1 year ago

@polarian

Wait what? I thought VARCHAR maxed out at 255?

The maximum size of a varchar column is highly database specific. For example MySQL as a limit of 65,535 bytes while PostgreSQL allows 10,485,760 bytes.

Also why you complaining about a 14,000 character limit, that is like an entire essay worth of characters!

I am talking about the description and not the title of an issue.

I am using Java and have a tool that automatically reports exceptions (along with some metadata) occurred in production to my old issue tracker. That metadata could for example be a user click stream that causes the exception. Because Java exceptions contain full qualified Java class names as well the file name (including path) a single stack frame (= one line) can easily be 100-150 characters long. User click streams also contain some meta data about where the user has clicked and these lines can also be longer. So at the end 14000 chars / <average line length> does result in roughly 100 lines. From these 100 lines you can subtract 40 lines of user click trace lines. 60 lines of stack frames can easily occur. So 14000 chars might sound a lot but for error reporting it can really be limiting.

The plan is to rewrite that tool to directly push error reports to OneDev and get rid of the old issue tracker but the size limitation is a real blocker to do so. I might be able to split the error report into a title + description and then additional comments if description would be too long but it feels brittle to store the size limits of OneDev in a second tool. I then have to keep track of OneDev changes in case that limit changes. Ideally I want to avoid that.

Currently I manually migrate automatically created issue reports from the old system into OneDev and occasionally I have to shorten the reported data because of the above. If someone verified the migrated issue that person would also like to add a short section "steps to reproduce" to the description which also requires characters. Sure it could be a comment but it is easier to have everything at the top of the issue in one place when someone finally starts working on that issue.

Robin Shen commented 1 year ago

Will do some benchmark to see if using long TEXT type is acceptable.

bufferUnderrun commented 1 year ago

I am using Java and have a tool that automatically reports exceptions (along with some metadata) occurred in production to my old issue tracker.

Logs and Reports, especially long one, should be file attach to the issue and not in the RAW description. Better visibility for both developer and support, and saving performances.

Enlarge column size to TEXT may impact performances, just for this regular case.

jbauer commented 1 year ago

Logs and Reports, especially long one, should be file attach to the issue and not in the RAW description. Better visibility for both developer and support, and saving performances.

I don't agree because text attachments are not searchable.

Generally I am fine with a limit but it must be unreasonably large for daily usage to not block you from using the tool. I see such limits for user input more like a last defense against attackers who want to fill the DB with huge amounts of data and not so much as a performance optimization because sooner or later you have to deal with performance anyway. It does not matter if you have 100,000 rows with each having a fully filled 10,000 character column or 50,000 rows each having a fully filled 20,000 character column. Sure you might get additional time until your critical size is reached (row number * column size) when limiting the column size but you will reach that point and you have to do proper performance optimization (table partitioning, full text search index and specialized SQL, etc..).

Unknown commented 1 year ago

I agree with @jbauer

People do not want to download text files to read the comment, and not only that as @jbauer has already stated, you can not search text files within a page, unless you download each text file and search them independently.

bufferUnderrun commented 1 year ago

Generally I am fine with a limit but it must be unreasonably large for daily usage to not block you from using the tool. I see such limits for user input more like a last defense against attackers who want to fill the DB with huge amounts of data and not so much as a performance optimization because sooner or later you have to deal with performance anyway.

This is more the way you are using the issue system tracking than defense againts attacker.

In my opinion, the description field add some more precise infos about the issue : some lines of text describing the issue, screenshot or a relative amount of copy/paste data.

If you add logs or reports more than 100 lines, better to attach them to a file in the issue. Gain better visibily in the discussion, and hey, you're developper :

  1. open issue depending name, priority, state.
  2. reading description if title not clear.
  3. download the attach file to find the exception in the 10.000 lines to java stracktrace.

Browser is not for opening logs with thousands of lines at all.

Without proper organization on the way the support or automatic tools created issue, it's then the mess for developer to sort, find, understrand and fix.

Robin Shen changed fields 1 year ago
Name Previous Value Current Value
Type
Support Request
Improvement
Robin Shen changed title 1 year ago
Previous Value Current Value
Does description really need a length limit?
Increase various description/comment length to sufficiently large
OneDev changed state to 'Closed' 1 year ago
Previous Value Current Value
Open
Closed
OneDev commented 1 year ago

State changed as code fixing the issue is committed

Robin Shen commented 1 year ago

Now the length is increased to 100000 chars, which should be sufficient. Populated 100k issues and querying against description column can return result within 2 seconds for all supported databases, which should be acceptable.

In most cases, users are expected to perform fuzzy search against title/description/comment, which leverages lucene to do the job, and can return result within 10ms.

OneDev changed state to 'Released' 1 year ago
Previous Value Current Value
Closed
Released
OneDev commented 1 year ago

State changed as build #3440 is successful

issue 1 of 1
Type
Improvement
Priority
Normal
Assignee
Issue Votes (0)
Watchers (4)
Reference
onedev/server#1088
Please wait...
Page is in error, reload to recover