#1848  Issue with build
Closed
andrzej opened 2 weeks ago

I'm using the Kubernetes Executor to run builds. Unfortunately, after a restart of OneDev (most likely caused by an OOM on the cluster node), builds stopped working. When I try to execute a build, I get the following error (I'm attaching a larger part of the log):


12:55:32 Switched to branch 'master'
12:55:32 Branch 'master' set up to track remote branch 'master' from 'origin'.
12:55:32 Step "maven build & deploy -> checkout" is successful
12:55:33 Running step "maven build & deploy -> generate pom checksum"...
12:55:33 Generated checksum: 38ff8bf6b5e596e59ad213e3aa209f47
12:55:33 Step "maven build & deploy -> generate pom checksum" is successful
12:55:33 Running step "maven build & deploy -> set up maven cache"...
12:55:34 Step "maven build & deploy -> set up maven cache" is successful
12:55:35 Running step "maven build & deploy -> detect build version"...
12:55:35 Detecting project version (may require some time while downloading maven dependencies)...
12:55:46 Step "maven build & deploy -> detect build version" is successful
12:55:47 Running step "maven build & deploy -> set build version"...
12:57:58 javax.ws.rs.ProcessingException: java.net.ConnectException: Connection timed out (Connection timed out)
12:57:58     	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:287)
12:57:58     	at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753)
12:57:58     	at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
12:57:58     	at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
12:57:58     	at org.glassfish.jersey.internal.Errors.process(Errors.java:229)
12:57:58     	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:414)
12:57:58     	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:752)
12:57:58     	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:445)
12:57:58     	at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:351)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper.runServerStep(KubernetesHelper.java:951)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper.runServerStep(KubernetesHelper.java:907)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper.runServerStep(KubernetesHelper.java:891)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper.runServerStep(KubernetesHelper.java:864)
12:57:58     	at io.onedev.k8shelper.RunServerSideStep.main(RunServerSideStep.java:27)
12:57:58     Caused by: java.lang.RuntimeException: java.net.ConnectException: Connection timed out (Connection timed out)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper.writeInt(KubernetesHelper.java:772)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper$15.write(KubernetesHelper.java:936)
12:57:58     	at org.glassfish.jersey.message.internal.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:79)
12:57:58     	at org.glassfish.jersey.message.internal.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:61)
12:57:58     	at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:266)
12:57:58     	at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:251)
12:57:58     	at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:163)
12:57:58     	at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1135)
12:57:58     	at org.glassfish.jersey.client.ClientRequest.doWriteEntity(ClientRequest.java:516)
12:57:58     	at org.glassfish.jersey.client.ClientRequest.writeEntity(ClientRequest.java:498)
12:57:58     	at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:384)
12:57:58     	at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:282)
12:57:58     	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278)
12:57:58     	... 13 more
12:57:58     Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
12:57:58     	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
12:57:58     	at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
12:57:58     	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
12:57:58     	at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
12:57:58     	at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
12:57:58     	at java.base/java.net.Socket.connect(Unknown Source)
12:57:58     	at java.base/sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
12:57:58     	at java.base/sun.net.NetworkClient.doConnect(Unknown Source)
12:57:58     	at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
12:57:58     	at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
12:57:58     	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
12:57:58     	at org.glassfish.jersey.client.internal.HttpUrlConnector.lambda$_apply$0(HttpUrlConnector.java:382)
12:57:58     	at org.glassfish.jersey.message.internal.CommittingOutputStream.commitStream(CommittingOutputStream.java:195)
12:57:58     	at org.glassfish.jersey.message.internal.CommittingOutputStream.commitStream(CommittingOutputStream.java:189)
12:57:58     	at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:208)
12:57:58     	at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:295)
12:57:58     	at io.onedev.k8shelper.KubernetesHelper.writeInt(KubernetesHelper.java:770)
12:57:58     	... 25 more
12:57:58     
12:57:59 Step "maven build & deploy -> set build version is failed: Command failed with exit code 1

As you can see, the HTTP requests in the previous steps worked, so I wonder what the issue could be here. Is it OneDev failing to accept the connection, the Kubernetes cluster, or something else?

andrzej referenced from other issue 2 weeks ago
andrzej commented 2 weeks ago

I've found a possible source of the issue. It could be caused by a change of nodes that was not reflected in DNS. Only 2 of the 3 IPs assigned to the domain that hosts OneDev were actually accepting requests, so 1 in 3 requests could fail (depending on which IP the HTTP client selected to connect to).
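One way to confirm this kind of stale DNS record is to resolve every address behind the hostname and attempt a TCP connection to each one. The following is only a minimal sketch: the class name `DnsBackendCheck`, the placeholder host `onedev.example.com`, port 443 (chosen because the stack trace shows an HTTPS connection), and the 3-second timeout are all assumptions for illustration, not anything taken from OneDev.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DnsBackendCheck {

    // Returns true if a TCP connection to address:port is accepted within the timeout.
    static boolean accepts(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused as well as connect timeouts.
            return false;
        }
    }

    // Resolves all addresses for the host and probes each one individually.
    static Map<String, Boolean> checkAll(String host, int port, int timeoutMillis)
            throws UnknownHostException {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.put(address.getHostAddress(), accepts(address, port, timeoutMillis));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder hostname (assumption); pass the real Server URL host as an argument.
        String host = args.length > 0 ? args[0] : "onedev.example.com";
        checkAll(host, 443, 3000).forEach((ip, ok) ->
                System.out.println(ip + " -> " + (ok ? "accepting" : "NOT accepting")));
    }
}
```

Running this from inside a build pod is more telling than running it from a workstation, since the pod may see different DNS answers or network routes.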

I wonder whether using internal service domain names (in Kubernetes) for this communication wouldn't be better. But I think that is not possible right now, as the Server URL may be used for other things than just communication between internal components of OneDev. Maybe it would be worth retrying a request that fails with a timeout or a 502 error?
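To make the retry suggestion concrete, here is a rough sketch of a retry wrapper that repeats a call when it fails with a connect or read timeout, or returns HTTP 502. It does not reflect how `KubernetesHelper` is actually implemented; the class `RetryOnTransientFailure`, the `Attempt` interface, and the linear backoff are hypothetical names and choices made for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class RetryOnTransientFailure {

    // One HTTP attempt; returns the response status code (hypothetical abstraction).
    @FunctionalInterface
    interface Attempt {
        int run() throws IOException;
    }

    // Retries on ConnectException, SocketTimeoutException, or a 502 status.
    // Any other IOException is treated as non-transient and propagates immediately.
    static int callWithRetry(Attempt attempt, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        IOException lastError = null;
        int lastStatus = -1;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                lastStatus = attempt.run();
                lastError = null;
                if (lastStatus != 502) {
                    return lastStatus;
                }
            } catch (ConnectException | SocketTimeoutException e) {
                lastError = e;
            }
            if (i < maxAttempts) {
                Thread.sleep(backoffMillis * i); // simple linear backoff
            }
        }
        if (lastError != null) {
            throw lastError;
        }
        return lastStatus; // still 502 after all attempts
    }

    public static void main(String[] args) throws Exception {
        // Simulated endpoint: the first two attempts time out, the third succeeds.
        int[] calls = {0};
        int status = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection timed out");
            }
            return 200;
        }, 5, 100);
        System.out.println("status=" + status + " after " + calls[0] + " attempts");
    }
}
```

A retry like this is only safe when the request is idempotent or when the failure happens before the server receives anything, as with a connect timeout. It also hides a broken DNS record rather than fixing it.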

Robin Shen commented 2 weeks ago

Normally the DNS entry should map the domain name to the load balancer IP, and the load balancer then forwards requests to working nodes automatically.

Also, you can have a job retried when a certain condition is satisfied, such as the log containing some text pattern. Check the "more settings" of the job for details.

Robin Shen changed state to 'Closed' 1 week ago
Previous Value: Open
Current Value: Closed
Type: Question
Priority: Normal
Reference: onedev/server#1848