#874  CodeIndexer exception - Indexing large repo with Lucene results in exception.
Released
dufernst opened 2 years ago

When indexing a local mirror of https://github.com/TrinityCore/TrinityCore/, Lucene throws the following exception:

2022-08-15 17:57:32,932 ERROR [pool-2-thread-46] i.o.s.u.c.DefaultBatchWorkManager Error doing works
java.lang.IllegalArgumentException: Document contains at least one immense term in field="BLOB_SECONDARY_SYMBOLS" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[109, 111, 100, 117, 108, 101, 32, 61, 10, 123, 10, 32, 32, 32, 32, 123, 10, 32, 32, 32, 32, 32, 32, 32, 32, 48, 120, 50, 49, 44]...', original message: bytes can be at most 32766 in length; got 122665
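
For readability, the byte prefix quoted in the message can be decoded back to text. The snippet below is only a small helper sketch (the class and method names are made up for illustration and are not part of OneDev or Lucene); it turns the listed byte values into a UTF-8 string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OneDev or Lucene: decodes the byte prefix
// that Lucene prints for an immense term.
class TermPrefixDecoder {

    // Converts a list of byte values (as printed in the exception) to a UTF-8 string.
    static String decode(int... codes) {
        byte[] bytes = new byte[codes.length];
        for (int i = 0; i < codes.length; i++) {
            bytes[i] = (byte) codes[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Byte values copied from the exception message above.
        String prefix = decode(109, 111, 100, 117, 108, 101, 32, 61, 10, 123, 10,
                32, 32, 32, 32, 123, 10,
                32, 32, 32, 32, 32, 32, 32, 32, 48, 120, 50, 49, 44);
        System.out.println(prefix);
    }
}
```

The decoded prefix reads `module =`, a newline, `{`, then nested `{` and `0x21,` on indented lines. This suggests (it is an inference, not something the log states) that the 122665-byte term is a large literal block captured as a single symbol rather than an ordinary identifier.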

The "full" stack trace is:

2022-08-15 17:57:32,932 ERROR [pool-2-thread-46] i.o.s.u.c.DefaultBatchWorkManager Error doing works
java.lang.IllegalArgumentException: Document contains at least one immense term in field="BLOB_SECONDARY_SYMBOLS" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[109, 111, 100, 117, 108, 101, 32, 61, 10, 123, 10, 32, 32, 32, 32, 123, 10, 32, 32, 32, 32, 32, 32, 32, 32, 48, 120, 50, 49, 44]...', original message: bytes can be at most 32766 in length; got 122665
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:981)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:524)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:488)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:419)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1757)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400)
	at io.onedev.server.search.code.DefaultCodeIndexManager.indexBlob(DefaultCodeIndexManager.java:297)
	at io.onedev.server.search.code.DefaultCodeIndexManager.index(DefaultCodeIndexManager.java:220)
	at io.onedev.server.search.code.DefaultCodeIndexManager.doIndex(DefaultCodeIndexManager.java:330)
	at io.onedev.server.search.code.DefaultCodeIndexManager.doIndex(DefaultCodeIndexManager.java:353)
	at io.onedev.server.search.code.DefaultCodeIndexManager.access$100(DefaultCodeIndexManager.java:92)
	at io.onedev.server.search.code.DefaultCodeIndexManager$3$1.run(DefaultCodeIndexManager.java:313)
	at io.onedev.server.persistence.DefaultSessionManager$2.call(DefaultSessionManager.java:100)
	at io.onedev.server.persistence.DefaultSessionManager$2.call(DefaultSessionManager.java:96)
	at io.onedev.server.persistence.DefaultSessionManager.call(DefaultSessionManager.java:79)
	at io.onedev.server.persistence.DefaultSessionManager.run(DefaultSessionManager.java:96)
	at io.onedev.server.search.code.DefaultCodeIndexManager$3.doWorks(DefaultCodeIndexManager.java:305)
	at io.onedev.server.util.concurrent.DefaultBatchWorkManager$2.doWorks(DefaultBatchWorkManager.java:110)
	at io.onedev.server.util.concurrent.DefaultBatchWorkManager$1.run(DefaultBatchWorkManager.java:78)
	at io.onedev.server.util.concurrent.DefaultWorkExecutor$5.call(DefaultWorkExecutor.java:252)
	at io.onedev.server.util.concurrent.DefaultWorkExecutor$5.call(DefaultWorkExecutor.java:248)
	at io.onedev.server.security.SecurityUtils$3.call(SecurityUtils.java:398)
	at io.onedev.server.util.concurrent.DefaultWorkExecutor$4.call(DefaultWorkExecutor.java:121)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at io.onedev.server.security.SecurityUtils$1.run(SecurityUtils.java:372)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 122665
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:270)
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:177)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:971)
	... 29 common frames omitted
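
One possible way to avoid this class of error is to check the UTF-8 length of each symbol before it is added to the field. The sketch below is an assumption about how such a guard could look; it is not the fix that was actually committed for this issue (that code is not shown here), and the class and method names are hypothetical. The limit of 32766 bytes is taken from the exception message; Lucene exposes the same value as `IndexWriter.MAX_TERM_LENGTH`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard, not OneDev's actual fix: drops terms whose UTF-8
// encoding exceeds the limit reported in the exception.
class TermLengthGuard {

    // Limit quoted in the exception message ("max length 32766").
    static final int MAX_TERM_BYTES = 32766;

    // True if the term's UTF-8 encoding fits within the limit.
    static boolean fitsInTerm(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    // Returns only the terms that are safe to index as single, untokenized terms.
    static List<String> filterIndexable(List<String> terms) {
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            if (fitsInTerm(term)) {
                result.add(term);
            }
        }
        return result;
    }
}
```

Skipping oversized symbols means they cannot be found through symbol search; truncating them instead would be another option, with its own trade-offs.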
OneDev changed state to 'Closed' 2 years ago
Previous Value: Open
Current Value: Closed
OneDev commented 2 years ago

State changed as code fixing the issue is committed

OneDev changed state to 'Released' 2 years ago
Previous Value: Closed
Current Value: Released
OneDev commented 2 years ago

State changed as build #2904 is successful

Type: Bug
Priority: Normal
Assignee
Affected Versions
Not Found
Reference
onedev/server#874