-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix termination deadlock when the output is retrying whole failed bulk requests #1117
Fix termination deadlock when the output is retrying whole failed bulk requests #1117
Conversation
…n a retry loop during pipeline terminations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we return without raising (and crashing the plugin/worker), the subsequent normal-completion of a batch that was sourced from the PQ will cause it to be acknowledged and marked eligible for immediate deletion, causing a restarted pipeline to NOT include the batch. I consider to be a data-loss regression, even though it is less-bad for the memory-queue case, or for cases where some amount of data-loss is preferred if it is the cost of keeping things moving.
The current behaviour is an unfortunate deadlock, true, but a force-shutdown followed by a restart currently will re-emit the unacknowledged batch for reprocessing.
Should we instead raise an exception that crashes the worker and prevents the batch from being acknowledged?
Thanks a ton for chiming in @yaauie !
Do you mean that we could replace the changed |
@@ -154,12 +154,17 @@ def successful_connection? | |||
!!maximum_seen_major_version && alive_urls_count > 0 | |||
end | |||
|
|||
def shutting_down? | |||
@stopping.true? || (!execution_context.nil? && !execution_context.pipeline.nil? && execution_context.pipeline.shutdown_requested? && !execution_context.pipeline.worker_threads_draining?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the cascading nil
checks we could use the &.
safe navigation operator. That would simplify as:
@stopping.true? || (!execution_context.nil? && !execution_context.pipeline.nil? && execution_context.pipeline.shutdown_requested? && !execution_context.pipeline.worker_threads_draining?) | |
@stopping.true? || (execution_context&.pipeline&.shutdown_requested? && !execution_context&.pipeline&.worker_threads_draining?) |
Superseded by #1119 |
Closes: #1116
Detailed explanation of the problem: #1116 (comment)
Break the termination deadlock when whole bulk requests are failing in a retry loop during pipeline terminations