Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulated StreamResetException as IllegalStateException unhandled by the SDK #6946

Closed
celikrecep opened this issue Dec 12, 2024 · 11 comments · Fixed by #6969
Closed

Encapsulated StreamResetException as IllegalStateException unhandled by the SDK #6946

celikrecep opened this issue Dec 12, 2024 · 11 comments · Fixed by #6969
Labels
Bug Something isn't working

Comments

@celikrecep
Copy link

Hello, I am using the OpenTelemetry-Java SDK in an Android application. I am encountering a StreamResetException error. Normally, when an exception is thrown from requests made within the app, we catch these exceptions, but these network errors do not seem to fall into the same handling mechanism. I did not encounter this issue while using version 1.40.0, but I started experiencing it after upgrading to version 1.44.0. I found a similar issue for a different error, and I wanted to share it with you as I think it might be related. Since the stack trace did not point to any specific part of the project, I felt it necessary to open an issue. Could you please assist me with this? I am open to any suggestions.

Fatal Exception: java.lang.IllegalStateException
okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
io.opentelemetry.exporter.internal.marshal.Serializer$RepeatedElementPairWriter.accept (Serializer.java:609)
java.util.HashMap.forEach (HashMap.java:1432)
io.opentelemetry.sdk.internal.AttributesMap.forEach (AttributesMap.java)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeRepeatedMessageWithContext (Serializer.java:536)
io.opentelemetry.exporter.internal.otlp.traces.SpanEventStatelessMarshaler.writeTo (SpanEventStatelessMarshaler.java:29)
io.opentelemetry.exporter.internal.otlp.traces.SpanEventStatelessMarshaler.writeTo (SpanEventStatelessMarshaler.java:19)
io.opentelemetry.exporter.internal.marshal.ProtoSerializer.serializeRepeatedMessageWithContext (ProtoSerializer.java:239)
io.opentelemetry.exporter.internal.otlp.traces.SpanStatelessMarshaler.writeTo (SpanStatelessMarshaler.java:53)
io.opentelemetry.exporter.internal.otlp.traces.SpanStatelessMarshaler.writeTo (SpanStatelessMarshaler.java:22)
io.opentelemetry.exporter.internal.marshal.ProtoSerializer.serializeRepeatedMessageWithContext (ProtoSerializer.java:239)
io.opentelemetry.exporter.internal.otlp.traces.InstrumentationScopeSpansStatelessMarshaler.writeTo (InstrumentationScopeSpansStatelessMarshaler.java:39)
io.opentelemetry.exporter.internal.otlp.traces.InstrumentationScopeSpansStatelessMarshaler.writeTo (InstrumentationScopeSpansStatelessMarshaler.java:21)
io.opentelemetry.exporter.internal.marshal.Serializer$RepeatedElementPairWriter.accept (Serializer.java:606)
java.util.IdentityHashMap.forEach (IdentityHashMap.java:1354)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeRepeatedMessageWithContext (Serializer.java:513)
io.opentelemetry.exporter.internal.otlp.traces.ResourceSpansStatelessMarshaler.writeTo (ResourceSpansStatelessMarshaler.java:46)
io.opentelemetry.exporter.internal.otlp.traces.ResourceSpansStatelessMarshaler.writeTo (ResourceSpansStatelessMarshaler.java:28)
io.opentelemetry.exporter.internal.marshal.Serializer$RepeatedElementPairWriter.accept (Serializer.java:606)
java.util.IdentityHashMap.forEach (IdentityHashMap.java:1354)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeRepeatedMessageWithContext (Serializer.java:513)
io.opentelemetry.exporter.internal.otlp.traces.LowAllocationTraceRequestMarshaler.writeTo (LowAllocationTraceRequestMarshaler.java:73)
io.opentelemetry.exporter.internal.marshal.Marshaler.writeBinaryTo (Marshaler.java:23)
io.opentelemetry.exporter.sender.okhttp.internal.OkHttpHttpSender$RawRequestBody.writeTo (OkHttpHttpSender.java:213)
okhttp3.internal.http.CallServerInterceptor.intercept (CallServerInterceptor.kt:62)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
fsimpl.bD.intercept (Unknown Source:14)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.connection.ConnectInterceptor.intercept (ConnectInterceptor.kt:34)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.cache.CacheInterceptor.intercept (CacheInterceptor.kt:95)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.http.BridgeInterceptor.intercept (BridgeInterceptor.kt:83)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept (RetryAndFollowUpInterceptor.kt:76)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
io.opentelemetry.exporter.sender.okhttp.internal.RetryInterceptor.intercept (RetryInterceptor.java:91)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
fsimpl.bD.intercept (Unknown Source:14)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp (RealCall.kt:201)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:517)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)
Caused by okhttp3.internal.http2.StreamResetException
stream was reset: CANCEL
okhttp3.internal.http2.Http2Stream.checkOutNotClosed$okhttp (Http2Stream.kt:646)
okhttp3.internal.http2.Http2Stream$FramingSink.emitFrame (Http2Stream.kt:557)
okhttp3.internal.http2.Http2Stream$FramingSink.write (Http2Stream.kt:532)
okio.ForwardingSink.write (ForwardingSink.kt:29)
okhttp3.internal.connection.Exchange$RequestBodySink.write (Exchange.kt:223)
okio.RealBufferedSink.emitCompleteSegments (RealBufferedSink.kt:256)
okio.RealBufferedSink$outputStream$1.write (RealBufferedSink.kt:118)
io.opentelemetry.exporter.internal.marshal.CodedOutputStream$OutputStreamEncoder.doFlush (CodedOutputStream.java:632)
io.opentelemetry.exporter.internal.marshal.CodedOutputStream$OutputStreamEncoder.write (CodedOutputStream.java:544)
io.opentelemetry.exporter.internal.marshal.StatelessMarshalerUtil.encodeUtf8 (StatelessMarshalerUtil.java:455)
io.opentelemetry.exporter.internal.marshal.StatelessMarshalerUtil.writeUtf8 (StatelessMarshalerUtil.java:442)
io.opentelemetry.exporter.internal.marshal.StatelessMarshalerUtil.writeUtf8 (StatelessMarshalerUtil.java:426)
io.opentelemetry.exporter.internal.marshal.ProtoSerializer.writeString (ProtoSerializer.java:163)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeStringWithContext (Serializer.java:235)
io.opentelemetry.exporter.internal.otlp.StringAnyValueStatelessMarshaler.writeTo (StringAnyValueStatelessMarshaler.java:29)
io.opentelemetry.exporter.internal.otlp.AttributeKeyValueStatelessMarshaler$ValueStatelessMarshaler.writeTo (AttributeKeyValueStatelessMarshaler.java:117)
io.opentelemetry.exporter.internal.otlp.AttributeKeyValueStatelessMarshaler$ValueStatelessMarshaler.writeTo (AttributeKeyValueStatelessMarshaler.java:71)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeMessageWithContext (Serializer.java:311)
io.opentelemetry.exporter.internal.otlp.AttributeKeyValueStatelessMarshaler.writeTo (AttributeKeyValueStatelessMarshaler.java:47)
io.opentelemetry.exporter.internal.otlp.AttributeKeyValueStatelessMarshaler.writeTo (AttributeKeyValueStatelessMarshaler.java:27)
io.opentelemetry.exporter.internal.marshal.Serializer$RepeatedElementPairWriter.accept (Serializer.java:606)
java.util.HashMap.forEach (HashMap.java:1432)
io.opentelemetry.sdk.internal.AttributesMap.forEach (AttributesMap.java)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeRepeatedMessageWithContext (Serializer.java:536)
io.opentelemetry.exporter.internal.otlp.traces.SpanEventStatelessMarshaler.writeTo (SpanEventStatelessMarshaler.java:29)
io.opentelemetry.exporter.internal.otlp.traces.SpanEventStatelessMarshaler.writeTo (SpanEventStatelessMarshaler.java:19)
io.opentelemetry.exporter.internal.marshal.ProtoSerializer.serializeRepeatedMessageWithContext (ProtoSerializer.java:239)
io.opentelemetry.exporter.internal.otlp.traces.SpanStatelessMarshaler.writeTo (SpanStatelessMarshaler.java:53)
io.opentelemetry.exporter.internal.otlp.traces.SpanStatelessMarshaler.writeTo (SpanStatelessMarshaler.java:22)
io.opentelemetry.exporter.internal.marshal.ProtoSerializer.serializeRepeatedMessageWithContext (ProtoSerializer.java:239)
io.opentelemetry.exporter.internal.otlp.traces.InstrumentationScopeSpansStatelessMarshaler.writeTo (InstrumentationScopeSpansStatelessMarshaler.java:39)
io.opentelemetry.exporter.internal.otlp.traces.InstrumentationScopeSpansStatelessMarshaler.writeTo (InstrumentationScopeSpansStatelessMarshaler.java:21)
io.opentelemetry.exporter.internal.marshal.Serializer$RepeatedElementPairWriter.accept (Serializer.java:606)
java.util.IdentityHashMap.forEach (IdentityHashMap.java:1354)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeRepeatedMessageWithContext (Serializer.java:513)
io.opentelemetry.exporter.internal.otlp.traces.ResourceSpansStatelessMarshaler.writeTo (ResourceSpansStatelessMarshaler.java:46)
io.opentelemetry.exporter.internal.otlp.traces.ResourceSpansStatelessMarshaler.writeTo (ResourceSpansStatelessMarshaler.java:28)
io.opentelemetry.exporter.internal.marshal.Serializer$RepeatedElementPairWriter.accept (Serializer.java:606)
java.util.IdentityHashMap.forEach (IdentityHashMap.java:1354)
io.opentelemetry.exporter.internal.marshal.Serializer.serializeRepeatedMessageWithContext (Serializer.java:513)
io.opentelemetry.exporter.internal.otlp.traces.LowAllocationTraceRequestMarshaler.writeTo (LowAllocationTraceRequestMarshaler.java:73)
io.opentelemetry.exporter.internal.marshal.Marshaler.writeBinaryTo (Marshaler.java:23)
io.opentelemetry.exporter.sender.okhttp.internal.OkHttpHttpSender$RawRequestBody.writeTo (OkHttpHttpSender.java:213)
okhttp3.internal.http.CallServerInterceptor.intercept (CallServerInterceptor.kt:62)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
fsimpl.bD.intercept (Unknown Source:14)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.connection.ConnectInterceptor.intercept (ConnectInterceptor.kt:34)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.cache.CacheInterceptor.intercept (CacheInterceptor.kt:95)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.http.BridgeInterceptor.intercept (BridgeInterceptor.kt:83)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept (RetryAndFollowUpInterceptor.kt:76)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
io.opentelemetry.exporter.sender.okhttp.internal.RetryInterceptor.intercept (RetryInterceptor.java:91)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
fsimpl.bD.intercept (Unknown Source:14)
okhttp3.internal.http.RealInterceptorChain.proceed (RealInterceptorChain.kt:109)
okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp (RealCall.kt:201)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:517)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)
```

**What did you expect to see?**
I was expecting the SDK not to throw an exception

**What did you see instead?**
it throws an exception.

**What version and what artifacts are you using?**
Artifacts: (`opentelemetry-bom`, `opentelemetry-sdk`, `opentelemetry-api`, `opentelemetry-exporter-otlp`, `opentelemetry-exporter-logging`, `opentelemetry-exporter-otlp-common`)

Version: (`1.44.0`)

How did you reference these artifacts? 
`build.gradle`

**Environment**
OS: (Android)

@celikrecep celikrecep added the Bug Something isn't working label Dec 12, 2024
@jack-berg
Copy link
Member

fsimpl.bD.intercept (Unknown Source:14)

Hi @celikrecep any idea where this stack trace is coming from? Its not part of opentelemetry-java or from the okhttp library we use as an HTTP client.

Also, can you share some of the logs surrounding that stack trace? My current read of the source code leads me to believe that this exception occurred but could be a normal / expected part of transmitting data with unreliable networks. The additional logs could indicate what happened as a result of this exception. My best guess based on the source code is that an attempt to export spans failed, and this exception stack trace is just explaining why.

@celikrecep
Copy link
Author

@jack-berg

fsimpl.bD.intercept (Unknown Source:14)

appears to originate from the FullStory SDK. You can find their documentation here: FullStory Android SDK. It’s worth noting that FullStory has been integrated into our app for a long time; it was not recently added. Unfortunately, we don’t have any additional logs besides the ones I’ve already shared from Firebase. Based on the events we’ve sent, I can infer that users are encountering this crash randomly while navigating through the app. There doesn’t seem to be any specific pattern to it. Could it be related to the version? As I mentioned, we didn’t encounter this issue before.

@jack-berg
Copy link
Member

appears to originate from the FullStory SDK. You can find their documentation here: FullStory Android SDK. It’s worth noting that FullStory has been integrated into our app for a long time; it was not recently added

👍 Always weird to see other unrelated code in the stack trace but its totally possible that its benign.

Unfortunately, we don’t have any additional logs besides the ones I’ve already shared from Firebase. Based on the events we’ve sent, I can infer that users are encountering this crash randomly while navigating through the app.

Are you sure the app is crashing? Here's what I see in the stack trace:

  • The OkHttp sender for OTLP exporters enqueues export requests to be performed on a separate here. That's why we see the root of the stacktrace as java.lang.Thread.run and don't see what is actually calling the exporter's export method.
  • OTLP exporters have retry enabled by default. With OkHttp, this is performed with [RetryInterceptor](https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/sender/okhttp/src/main/java/io/opentelemetry/exporter/sender/okhttp/internal/RetryInterceptor.java#L91, which we see called in the stacktrace: io.opentelemetry.exporter.sender.okhttp.internal.RetryInterceptor.intercept (RetryInterceptor.java:91).
  • Retry interceptor evaluates whether a failed export attempt is retryable based on the response status code (if available) and any IOException that occurs is considered retryable.
  • Retry eventually gives up and returns the failure response or throws the IOException for the calling code to handle, which causes this callback's onFailure or OnResponse to be invoked, which causes HttpExporter's onResponse or OnError to be invoked.
  • HttpExporter's onError method logs the a descriptive error message and the exception that caused the failure.
  • Because your stacktrace shows RetryInterceptor, we know that at least one export attempt failed prior to the exception: io.opentelemetry.exporter.sender.okhttp.internal.RetryInterceptor.intercept (RetryInterceptor.java:91)
  • RetryInterceptor determined that the failure was retryable, and attempts the export again which requires serializing the data into the request body: io.opentelemetry.exporter.sender.okhttp.internal.OkHttpHttpSender$RawRequestBody.writeTo (OkHttpHttpSender.java:213)
  • The serialization process has a bunch of code which walks through all the records to be exported and converts them to the protobuf binary representation is a very careful manner. Somewhere along this process, the HTTP request is interrupted and the stream is cancelled Caused by okhttp3.internal.http2.StreamResetException stream was reset: CANCEL.
  • It looks like this IOException was propagated all the way up the call stack I discussed previously, ultimately terminating in HttpExporter#onError where its logged.

This is why I asked about any additional logs before or after the stacktrace. Based on what I see, it just looks like a typical export request which failed after retry, with the stacktrace logged to show exactly what happened. Failed export attempts are common in unreliable networks like the internet. We retry to reduce their impact, but it doesn't always result in success. I expect the reliability of networks on android mobile devices is especially problematic. Side note, there's a contrib component called disk-buffering that aims to improve reliable transmission in mobile environments by buffering data to disk.

@celikrecep
Copy link
Author

celikrecep commented Dec 16, 2024

@jack-berg Yes, since it’s a fatal exception, I’m sure it’s causing the crash. The error is also listed under the crash section in Firebase. I suspect the issue is due to throwing an IllegalStateException, because the accept method in the Serializer. converts an IOException into an IllegalStateException. This method is called from the serializeRepeatedMessageWithContext via messages.forEach(writer);.

In the code here, This method expects an IOException, but it throws an IllegalStateException. Because of this, the places where this method is used might be performing incorrect checks.

When debugging, I noticed that when a StreamResetException is thrown at this point, it’s converted into an IllegalStateException, which leads to the app crash. However, when I forced it to throw the StreamResetException again at the same location, and ensured that the IllegalStateException was converted back to StreamResetException here, everything worked as expected.

@jack-berg
Copy link
Member

I noticed that when a StreamResetException is thrown at this point, it’s converted into an IllegalStateException, which leads to the app crash.

Great find! That's it. And all the calling code expects IOException so IllegalStateException goes uncaught.

@celikrecep
Copy link
Author

This part is within OpenTelemetry. What steps should I take to prevent crashes? Should I wait for a fix to be released, or do you have any other recommendations?

@jack-berg
Copy link
Member

I suspect the problem started impacting you in 1.44.0 because we switched to enabling resuable memory mode by default, which switches the serialization code path to the one that rethrows IOException as IllegalStateException.

To get around this in the short terms, you can revert to the previous behavior by toggling otel.java.exporter.memory_mode=immutable_data as documented here.

We will definitely get this fixed for the next release (January 10th), and maybe a patch release before then.

@jkwatson
Copy link
Contributor

@celikrecep how urgent is it that we release a patch for this? Would it be ok to wait for the next release in early January?

@celikrecep
Copy link
Author

@jkwatson When i tried the method @jack-berg suggested, it seems like it will resolve the issue as it doesn’t hit the relevant line. We’ll send the package today or tomorrow. It’s probably not going to be an issue, but can i reach out depending on the situation?

@jkwatson
Copy link
Contributor

@celikrecep great, thanks. Since we're at the end of the year, and many are on holiday/vacation, we'd prefer to just wait for the next release cycle, if we can. Please let us know if it becomes urgent.

@celikrecep
Copy link
Author

@jkwatson sorry for the delay; the package had to be postponed. It seems we will have to wait for the January release. Happy holidays in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants