It would be nice to allow a dynamic SendDelay #1184

Open
kentquirk opened this issue Jun 4, 2024 · 4 comments
Labels
type: enhancement New feature or request

Comments

@kentquirk
Contributor

kentquirk commented Jun 4, 2024

Is your feature request related to a problem? Please describe.

For async systems, different parts of the system sometimes need to wait different amounts of time for the rest of a trace to arrive. A global SendDelay setting forces every trace to wait for the worst case.

Customer:

Async tasks where the trace context is carried across a message queue often complete long after the global SendDelay. It would be nice to catch these traces and wait longer for them to complete (in many cases, for us, the root span is an async function that returns almost immediately after dispatching work).

Describe the solution you'd like

If the root span has a numeric field called refinery.trace_send_delay, then instead of using the configured SendDelay, refinery will wait for the number of seconds specified in that field before deciding on that trace.
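A minimal sketch of how that lookup might work, not Refinery's actual code. The function name `effectiveSendDelay`, the `map[string]any` representation of the root span's fields, the constant values, and the upper bound `maxSendDelay` are all assumptions made for illustration; the proposal itself only specifies the field name and that its value is in seconds.

```go
package main

import "time"

// sendDelayField is the field name proposed in this issue.
const sendDelayField = "refinery.trace_send_delay"

// Hypothetical values for illustration only; they are not Refinery's defaults.
const (
	defaultSendDelay = 2 * time.Second
	// maxSendDelay is an assumed safety cap, not part of the proposal. It keeps
	// a single span from holding a trace in memory indefinitely.
	maxSendDelay = 5 * time.Minute
)

// effectiveSendDelay returns the delay to use for one trace. If the root span
// carries a positive numeric refinery.trace_send_delay (in seconds), that value
// wins; otherwise the configured SendDelay is used.
func effectiveSendDelay(rootFields map[string]any, configured time.Duration) time.Duration {
	raw, ok := rootFields[sendDelayField]
	if !ok {
		return configured
	}

	var seconds float64
	switch v := raw.(type) {
	case int:
		seconds = float64(v)
	case int64:
		seconds = float64(v)
	case float64:
		seconds = v
	default:
		// Non-numeric values are ignored rather than treated as errors.
		return configured
	}

	// Written as a negated comparison so that NaN also falls back.
	if !(seconds > 0) {
		return configured
	}
	if seconds > maxSendDelay.Seconds() {
		return maxSendDelay
	}
	return time.Duration(seconds * float64(time.Second))
}
```

Under these assumptions, a root span with `refinery.trace_send_delay: 90` would be held for 90 seconds, while a trace without the field keeps the configured SendDelay.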

Describe alternatives you've considered

  • Rules for setting SendDelay that are similar to decision rules, but apply immediately when the root span arrives. This is complex to specify and evaluate.
  • Allow a debounce parameter as an alternative to SendDelay (call it DecisionTimeout or something) -- if configured, any new span that arrives for a trace resets that trace's clock, so that as long as late spans aren't TOO late, the trace will delay decisions; only when they slow down enough does the decision get made.
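The debounce alternative could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: `decisionClock`, `spanArrived`, `due`, and the `maxWait` hard cap are hypothetical names, and the cap is an addition the issue does not mention (without one, a trace that keeps receiving spans would never be decided).

```go
package main

import "time"

// decisionClock tracks when the sampling decision for one trace becomes due
// under a debounce model: every arriving span pushes the deadline out again.
type decisionClock struct {
	timeout   time.Duration // hypothetical DecisionTimeout setting
	maxWait   time.Duration // assumed hard cap, measured from the first span
	firstSeen time.Time
	deadline  time.Time
}

// newDecisionClock starts the clock when the first span of a trace arrives.
func newDecisionClock(now time.Time, timeout, maxWait time.Duration) *decisionClock {
	c := &decisionClock{timeout: timeout, maxWait: maxWait, firstSeen: now}
	c.spanArrived(now)
	return c
}

// spanArrived resets the deadline to now+timeout, but never beyond
// firstSeen+maxWait.
func (c *decisionClock) spanArrived(now time.Time) {
	next := now.Add(c.timeout)
	if limit := c.firstSeen.Add(c.maxWait); next.After(limit) {
		next = limit
	}
	c.deadline = next
}

// due reports whether the decision should be made at time now.
func (c *decisionClock) due(now time.Time) bool {
	return !now.Before(c.deadline)
}

// decisionOffsetMillis replays span arrival offsets (milliseconds after the
// first span, in arrival order) and returns the offset at which the decision
// would become due. It exists only to make the behavior easy to try out.
func decisionOffsetMillis(arrivalsMs []int64, timeoutMs, maxWaitMs int64) int64 {
	start := time.Unix(0, 0)
	c := newDecisionClock(start,
		time.Duration(timeoutMs)*time.Millisecond,
		time.Duration(maxWaitMs)*time.Millisecond)
	for _, a := range arrivalsMs {
		c.spanArrived(start.Add(time.Duration(a) * time.Millisecond))
	}
	return c.deadline.Sub(start).Milliseconds()
}
```

With a 1000 ms timeout and spans arriving 500, 1200, and 1900 ms after the first one, the decision in this sketch becomes due 2900 ms in, one full timeout after the last span. A trace that sends no further spans is decided after the timeout alone, so quiet traces are not penalized for the slow ones.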

Additional context

Slack thread in pollinators

@kentquirk kentquirk added the type: enhancement New feature or request label Jun 4, 2024
@bixu

bixu commented Jun 4, 2024

Ideally, we'd be able to tune delays around the sampling decision per-rule, since the context we want the rule to evaluate in is usually enough for us to know if we want to wait longer than normal (or ignore the root span closing early).

But also, in our environment, asking users to add refinery.trace_send_delay to their trace doesn't feel like excessive lift. The users most affected tend to understand tracing pretty well and the need for usable traces.

@kentquirk
Contributor Author

Came across a situation today where allowing SendDelay to be reset whenever a new span arrives would be helpful. (Long trace, variable number of async actions which arrive frequently but over many seconds; it would help to have a debounce model where the trace sends once spans stop arriving.)

@VinozzZ
Contributor

VinozzZ commented Nov 8, 2024

A user in the community just requested this feature: they set a high SendDelay for long-running jobs but would like to lower it for certain traffic.

@felipesere

Yeah, the debounce option would be great for my company too.
We have loads of very different async workloads, so finding a SendDelay that works for most of them and does not put too much pressure on refinery is really hard.

If I raised a PR for DecisionTimeout, would someone review it?


4 participants