
Add retries #110

Open
steve-chavez opened this issue Sep 25, 2023 · 9 comments
@steve-chavez
Member

Hey there! I'd like a new feature that doesn't seem to be present in Supabase regarding webhooks. Sometimes my API gets backed up or is down for a few seconds, and webhooks may fail. Some functions rely on webhooks to work and provide accurate data to customers, so, like Stripe does, I would like an automated retry system.

If a webhook fails, it would automatically retry after some interval, say 30 minutes. If that fails, it retries in an hour, up to maybe 5 retries total. That way, no webhook is missed.

Originally posted by @Wamy-Dev in supabase/supabase#17664

@steve-chavez
Member Author

This is already possible with some work, as demonstrated in https://github.com/supabase/pg_net#retrying-failed-requests, but it would be better to have retries built in.

The curl CLI does retries (https://everything.curl.dev/usingcurl/downloads/retry), but libcurl doesn't provide this functionality, so we need to implement our own.
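
In the meantime, the gist of the manual approach is to keep a copy of the request parameters and re-send any request whose response shows a transport-level failure. A minimal sketch, not the README's exact code: app.webhook_requests is a hypothetical application-side table storing the original url/body and the id returned by net.http_post, and the net._http_response columns (timed_out, error_msg) are assumed to match current pg_net.

```sql
-- Re-send requests whose responses failed at the transport level,
-- replacing the stored id with the id of the newly enqueued request.
update app.webhook_requests w
set request_id = net.http_post(url := w.url, body := w.body)
from net._http_response r
where r.id = w.request_id
  and (r.timed_out or r.error_msg is not null);
```

Run periodically (e.g. from a pg_cron job), this keeps re-sending until the request goes through, though it has no attempt cap on its own.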

@TheOtherBrian1
Contributor

TheOtherBrian1 commented Sep 25, 2023

In issue #62, you mentioned the following:

An alternative [to error recovery] could be logging the failed http requests. Then the user would have to search the pg logs.

Originally, the plan was to provide users with the option to log error messages to a designated logger for handling:

... We [can] optionally log something based on the status code when the response is handled?

We should have a config to turn this on/off, maybe named like pg_net.log_http_errors (off by default).
— #49

However, this feature has not been implemented. The config variable for enabling logging is absent in the worker.c codebase. Nonetheless, there may be potential benefits in expanding upon this concept.

Rather than enabling the recording of request failure messages in a log file, the config variable could instruct the extension to automatically reintegrate all dropped requests back into the "net.http_request_queue" table. This approach, unlike the existing solution, would demand the addition of just a single variable and the replacement of one of the error logs with the C equivalent of an INSERT statement.

It is important to emphasize that this suggestion would exclusively monitor requests that failed due to an extension-related error. Requests resulting in 400 or 500 HTTP status errors are already logged in net._http_response and should be managed by users. Such failures may stem from human errors in request endpoint formatting, potentially leading to an endless loop of rejected retry attempts. For users wishing to oversee these specific failed requests, the existing README suggestion provides a safer mechanism, albeit with greater overhead.
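
To make the distinction concrete, assuming the net._http_response columns status_code, timed_out, and error_msg (as in current pg_net), the two kinds of failure can be separated with something like:

```sql
-- Extension-level failures: the request never completed. These would be the
-- candidates for automatic re-enqueueing under the proposed config variable.
select id, error_msg
from net._http_response
where timed_out or error_msg is not null;

-- HTTP-level failures: the request completed but the server rejected it.
-- These stay in the user's hands, since blindly retrying a malformed endpoint
-- could loop forever.
select id, status_code
from net._http_response
where status_code >= 400;
```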

@steve-chavez
Member Author

@TheOtherBrian1 Great points.

Requests resulting in 400 or 500 HTTP status errors are already logged in net._http_response and should be managed by users. Such failures may stem from human errors in request endpoint formatting, potentially leading to an endless loop of rejected retry attempts

True. I think the only safe way to auto-retry would be to follow the Retry-After header; however, I don't think all services implement it.
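
Assuming response headers are stored as jsonb in net._http_response (header-name casing and the HTTP-date form of Retry-After are glossed over here), honoring the header from SQL could look roughly like the sketch below; this is not existing pg_net behavior:

```sql
-- Pick a delay per retryable response: honor Retry-After (seconds form) when
-- the server sent it, otherwise fall back to a default of 30 minutes.
select id,
       case
         when headers ->> 'retry-after' ~ '^\d+$'
           then (headers ->> 'retry-after')::int * interval '1 second'
         else interval '30 minutes'  -- header absent, or in HTTP-date form
       end as retry_delay
from net._http_response
where status_code in (429, 503);
```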

We should have a config to turn this on/off, maybe named like pg_net.log_http_errors(off by default).
— #49 (comment)
However, this feature has not been implemented.

Agree, that would be higher priority than this idea.

Rather than enabling the recording of request failure messages in a log file, the config variable could instruct the extension to automatically reintegrate all dropped requests back into the "net.http_request_queue" table
It is important to emphasize that this suggestion would exclusively monitor requests that failed due to an extension-related error.

Hm, do you have examples of extension-related errors? I don't recall one. IIUC, it's an internal error rather than, say, a 400 Bad Request.

@TheOtherBrian1
Contributor

Hm, do you have examples of extension-related errors?

I haven't experienced any yet, but I've thought about the potential for them to happen.

@steve-chavez
Member Author

Such failures may stem from human errors in request endpoint formatting, potentially leading to an endless loop of rejected retry attempts.

We run the retries on an exponential scale with increasing periods, which ensures resilience for up to 24 hours in case of connectivity downtime due to updates, issues, etc.

^ From https://github.com/orgs/supabase/discussions/17664#discussioncomment-7208243

We could cap the retries as suggested above; that way we don't end up with endless loops.
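
As an illustration of what a capped exponential schedule covering roughly a day could look like (the base delay, factor, and cap here are assumptions, not a settled design):

```sql
-- Capped exponential backoff: delay(n) = 30 min * 2^n, capped at 8 hours.
-- Six attempts spaced this way add up to about 23.5 hours of coverage.
select attempt,
       least(interval '30 minutes' * power(2, attempt), interval '8 hours') as delay
from generate_series(0, 5) as attempt;
```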

@lookevink

Currently using Supabase to fire off webhooks to Vercel endpoints.

Vercel has been running into issues with Undici when there's CPU contention, so retries with exponential backoff are almost mandatory for event-driven applications.

Wasn't quite sure how to implement the official retry solution:
https://github.com/supabase/pg_net#retrying-failed-requests

Instead, I used QStash; so far it's working like a charm.

@shawnmclean

shawnmclean commented Oct 6, 2024

@lookevink is QStash still the way to go for an event driven system or have you learnt anything new so far?

I need to get DB changes to a background worker on trigger.dev. I was hoping to send them straight there, or at least through my app on Vercel and then to trigger.dev.

Going to QStash for durability seems like adding more complexity.

Do you mind explaining the Vercel Undici CPU issue? I thought each request was its own serverless call?

@lookevink

lookevink commented Oct 6, 2024 via email

@TheOtherBrian1
Contributor

TheOtherBrian1 commented Oct 8, 2024

@shawnmclean, I've been thinking about rewriting the pg_net retry section. I have an early draft. I'm fairly confident in stages 1-6 and 8. It can be used as a basis for any retry system.

I'm personally not a huge fan of using PL/pgSQL for retries, even though that's what I'm currently working on. It defeats the purpose of using unlogged tables for performance gains - a foundational advantage of pg_net. However, this approach does make the solution configurable for Postgres developers and gives them the most control.
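
Not the draft itself, but a minimal sketch of the general shape such a PL/pgSQL layer could take, extending the re-send idea from earlier in the thread. The app.retryable_requests table, the function name, and the backoff numbers are hypothetical; net.http_post's url/body parameters and the net._http_response columns are assumed to match current pg_net, and pg_cron is assumed for scheduling.

```sql
-- Hypothetical bookkeeping table. Note it is a regular (logged) table, which
-- is exactly where pg_net's unlogged-table performance advantage is given up.
create table if not exists app.retryable_requests (
  request_id  bigint primary key,          -- id returned by net.http_post
  url         text        not null,
  body        jsonb       not null,
  attempts    int         not null default 0,
  next_try_at timestamptz not null default now()
);

create or replace function app.retry_failed_requests()
returns void
language plpgsql
as $$
declare
  r record;
begin
  for r in
    select q.request_id, q.url, q.body, q.attempts
    from app.retryable_requests q
    join net._http_response resp on resp.id = q.request_id
    where (resp.timed_out or resp.error_msg is not null)
      and q.attempts < 5                 -- cap total attempts
      and q.next_try_at <= now()
  loop
    update app.retryable_requests
    set request_id  = net.http_post(url := r.url, body := r.body),
        attempts    = r.attempts + 1,
        -- exponential backoff, capped at 8 hours
        next_try_at = now() + least(interval '30 minutes' * power(2, r.attempts),
                                    interval '8 hours')
    where request_id = r.request_id;
  end loop;
end;
$$;

-- Sweep every five minutes, for example:
-- select cron.schedule('pg_net_retries', '*/5 * * * *',
--                      'select app.retry_failed_requests()');
```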
