Is your feature request related to a problem? Please describe.
For async systems, different parts of the system sometimes need different amounts of time to wait for the rest of the trace to arrive. A single global SendDelay setting forces every trace to wait for the worst case.
Customer:
Async tasks where the trace context is carried across a message queue often complete long after the global SendDelay. It would be nice to catch these traces and wait longer for them to complete (in many cases, for us, the root span is an async function that returns almost immediately after dispatching work).
Describe the solution you'd like
If the root span has a numeric field called refinery.trace_send_delay, then instead of using the configured SendDelay, refinery will wait for the number of seconds specified in that field before deciding on that trace.
Describe alternatives you've considered
Rules for setting SendDelay that are similar to decision rules, but apply immediately when the root span arrives. This is complex to specify and evaluate.
Allow a debounce parameter as an alternative to SendDelay (call it DecisionTimeout or something): if it is configured, any new span that arrives for a trace resets that trace's clock, so as long as late spans aren't TOO late, the decision on that trace is delayed; only when spans slow down enough does the decision get made.
Ideally, we'd be able to tune delays around the sampling decision per-rule, since the context we want the rule to evaluate in is usually enough for us to know if we want to wait longer than normal (or ignore the root span closing early).
But also, in our environment, asking users to add refinery.trace_send_delay to their traces doesn't feel like an excessive lift. The users most affected tend to understand tracing pretty well and the need for usable traces.
Came across a situation today where allowing SendDelay to be reset whenever a new span arrives would be helpful. (Long trace, variable number of async actions which arrive frequently but over many seconds; it would help to have a debounce model where the trace sends once spans stop arriving.)
Yeah, the debounce option would be great for my company too.
We have loads of very different async workloads, so finding a SendDelay that works for most of them without putting too much pressure on Refinery is really hard.
If I raised a PR for DecisionTimeout, would someone review it?
Additional context
Slack thread in pollinators