fix: Logs API stops functioning when a pod fails #21287
base: master
Conversation
**Codecov Report:** all modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##           master   #21287      +/-   ##
==========================================
+ Coverage   55.19%   55.21%   +0.01%
==========================================
  Files         337      337
  Lines       57058    57061       +3
==========================================
+ Hits        31496    31506      +10
+ Misses      22863    22860       -3
+ Partials     2699     2695       -4
```
Force-pushed from e0d48bb to d0e06c7

Force-pushed from 0dbcf76 to 9035ea5
```diff
@@ -39,6 +39,11 @@ func parseLogsStream(podName string, stream io.ReadCloser, ch chan logEntry) {
 	timeStampStr := parts[0]
 	logTime, err := time.Parse(time.RFC3339Nano, timeStampStr)
 	if err != nil {
+		if strings.HasPrefix(line, "unable to retrieve container logs for ") ||
+			strings.HasPrefix(line, "Unable to retrieve container logs for ") {
+			ch <- logEntry{line: line, podName: podName, timeStamp: time.Now()}
```
Should err still be added here?
No, any error in the log channel would stop all other streams immediately.
argo-cd/server/application/application.go, lines 1796 to 1804 at 728b31e:

```go
logStream := mergeLogStreams(streams, time.Millisecond*100)
sentCount := int64(0)
done := make(chan error)
go func() {
	for entry := range logStream {
		if entry.err != nil {
			done <- entry.err
			return
		} else {
```
What is the purpose of `ch <- logEntry{err: err}`? Shouldn't we log every error/message that does not start with a timestamp in a way that does not crash the stream and exits properly? There might be other errors similar to `unable to retrieve container logs for ` that we might receive.
```go
for entry := range res {
	entries = append(entries, entry)
}
```
You probably need some wait before checking results to avoid flakiness. Or, use some channel for synchronization. This would also help to exit from a goroutine.
Have added a done channel for sync, and have increased the time window to 5 seconds. Would 5 seconds be enough?
Does the value of the timestamp really matter? Or is what matters simply that we received the log correctly? Asserting "timestamp <= now" seems more reliable. I don't think mocking `now` works here, but ideally that is what should be done.
Force-pushed from c1f1255 to 086514c
Signed-off-by: Fluder-Paradyne <[email protected]>
…essages - Add sync and increase timeout for test Signed-off-by: Fluder-Paradyne <[email protected]>
Force-pushed from 086514c to a136cf4
Closes: #21286

When a pod is in a Failed, Completed, or Error state, the Kubernetes API sometimes returns a log message like:

`unable to retrieve container logs for containerd://a0002996fc1eb2a973c4708fa4f4f5f1b9e643114322c71fe54dbbfef27148d5`

The current log parsing logic assumes logs always begin with a timestamp, which causes issues when it encounters this format. This change handles lines starting with `unable to retrieve container logs for `.

Checklist:
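The parsing fallback described above can be sketched as follows (a minimal illustration, not the actual `parseLogsStream` implementation; `parseLine` is a hypothetical name). Kubelet normally prefixes each line with an RFC3339Nano timestamp, so a line whose first token fails to parse is treated as a message in its own right and stamped with the current time rather than surfaced as an error:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseLine splits a kubelet log line into timestamp and message.
// Lines without a leading timestamp (e.g. "unable to retrieve
// container logs for ...") fall back to time.Now() instead of failing.
func parseLine(line string) (time.Time, string) {
	parts := strings.SplitN(line, " ", 2)
	ts, err := time.Parse(time.RFC3339Nano, parts[0])
	if err != nil {
		// No timestamp prefix: keep the whole line as the message.
		return time.Now(), line
	}
	if len(parts) == 2 {
		return ts, parts[1]
	}
	return ts, ""
}

func main() {
	_, msg := parseLine("unable to retrieve container logs for containerd://abc")
	fmt.Println(msg)
}
```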