Add delayed_jobs_ready to DelayedJobs plugin and collect_by_queue option for GoodJob plugin #302

Open: wants to merge 11 commits into base: main


benngarcia

This PR was born out of metric gathering for our auto-scaling needs as we're migrating hosting platforms and mid-migration of queue libraries.

This PR adds a new metric to the DelayedJobs plugin: "delayed_jobs_ready". It can be thought of as the count of all jobs whose `run_at < now()`. We needed this metric rather than queued or pending, since those include every job, some of which are scheduled days, weeks, or months out.
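To make the difference between "ready" and "queued" concrete, here is a minimal plain-Ruby sketch. It does not reproduce the PR's actual query; the `Job` struct and the `ready_count` helper are hypothetical stand-ins for rows in the jobs table, and the only rule applied is the one stated above (`run_at` earlier than the current time).

```ruby
# Hypothetical stand-in for a row in the jobs table; only the fields used here.
Job = Struct.new(:queue, :run_at, keyword_init: true)

# Count the jobs whose scheduled time has already passed.
# `now` is injectable so the comparison is deterministic.
def ready_count(jobs, now: Time.now)
  jobs.count { |job| job.run_at < now }
end

now = Time.now
jobs = [
  Job.new(queue: "mailers", run_at: now - 60),          # due a minute ago
  Job.new(queue: "mailers", run_at: now + 86_400),      # scheduled for tomorrow
  Job.new(queue: "reports", run_at: now + 30 * 86_400)  # scheduled a month out
]

puts "total jobs: #{jobs.size}"
puts "ready jobs: #{ready_count(jobs, now: now)}"
```

A worker autoscaler driven by the total would see three jobs here, while only one of them can actually be picked up right now.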

This PR also adds the ability to view GoodJob metrics sliced by queue, similar to the DelayedJobs plugin. Scaling queue workers based on how many jobs are enqueued in a given queue is the main use case.


lauer commented Jan 24, 2024

One question: how would you handle the fact that the number of jobs returned can decrease because of cleanup scripts?
I am using the GoodJob part now, but since it is a total count from the DB, the number is really unusable when the cleanup script is running alongside it.

benngarcia (Author) commented

> One question: how would you handle the fact that the number of jobs returned can decrease because of cleanup scripts? I am using the GoodJob part now, but since it is a total count from the DB, the number is really unusable when the cleanup script is running alongside it.

I'm not sure I fully understand the question/problem statement here - any clarification would be great :D

If you're asking about GoodJob's automatic cleanup, which deletes jobs after X amount of time (default 2 weeks), you can either disable the cleanup and leave the records in the DB, or implement some GoodJob on-delete hook to increment a counter somewhere. Though I'm not sure that's within the scope of the prometheus_exporter gem or my PR, so maybe I'm misunderstanding the question 😅

When we do `group(:queue_name).size`, it returns the count by queue:
=> {"queue_a"=>3, "queue_d"=>1}

The problem is that when a queue is empty, it is simply excluded from the
results instead of being returned with a count of 0. So the result we want is:
=> {"queue_a"=>3, "queue_b"=>0, "queue_c"=>0, "queue_d"=>1}

Without the 0, the queue count in the Prometheus metrics stays at the
last non-zero value, meaning we can't auto-scale down the workers.
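One way to get explicit zeros is to merge the grouped counts over a hash of known queue names that all start at 0. The following is only a sketch of that idea, not necessarily what this PR does: `counts_with_zeros` and `known_queues` are hypothetical names, and where the list of queue names comes from (configuration, or queues seen in earlier scrapes) is an assumption left to the caller.

```ruby
# Fill in queues that are missing from a grouped count with an explicit 0.
# `known_queues` is a hypothetical list that the caller has to supply.
def counts_with_zeros(grouped_counts, known_queues)
  zeros = known_queues.to_h { |name| [name, 0] } # every known queue starts at 0
  zeros.merge(grouped_counts)                    # real counts overwrite the zeros
end

# Same shape as the result of group(:queue_name).size in the example above.
grouped = { "queue_a" => 3, "queue_d" => 1 }
known   = %w[queue_a queue_b queue_c queue_d]

p counts_with_zeros(grouped, known)
```

Because `merge` keeps any key that appears only in `grouped_counts`, a queue that is not in the known list still shows up with its real count instead of being dropped.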
3 participants