Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added basic GoodJob collectors #280

Merged
merged 2 commits into from
May 29, 2023
Merged

Conversation

AndersGM
Copy link
Contributor

@AndersGM AndersGM commented May 25, 2023

I need metrics for GoodJob, so I've given it a shot.

@AndersGM AndersGM force-pushed the good-job-exporter branch 2 times, most recently from 82448c4 to d002076 Compare May 25, 2023 13:01
@AndersGM AndersGM force-pushed the good-job-exporter branch from d002076 to bb09144 Compare May 25, 2023 20:19
@AndersGM AndersGM marked this pull request as ready for review May 25, 2023 20:32
@AndersGM AndersGM changed the title Added basic Good Job collectors Added basic GoodJob collectors May 25, 2023
@SamSaffron
Copy link
Member

looking good but I think there is a bit of fine tuning we need here:

see: https://prometheus.io/docs/practices/naming/

there is a total postfix missing and some of these should be Counter vs Gauge no?

@AndersGM
Copy link
Contributor Author

@SamSaffron Thank you very much for taking your time to look at my PR!

I've never implemented my own metrics, so I might very well be wrong. I chose gauges for all of the metrics since data is read from the database, and jobs can be destroyed and retried at will. Consequently, the numbers can actually decrement for all the metrics, which isn't allowed for counters (AFAIK).

To get actual counters, I think we'll need to either extend GoodJob or count through callbacks.

Let me know what you think!

@SamSaffron
Copy link
Member

be sure to read up about it, counters are "smart" they can go back to 0, it is fine, prom adjusts to it.

Gauges are for things where "rate" does not matter like memory.
Counters are for things where "rate" matters, like counting stuff.

@AndersGM
Copy link
Contributor Author

AndersGM commented May 29, 2023

be sure to read up about it, counters are "smart" they can go back to 0, it is fine, prom adjusts to it.

Yes - my point is that since, with my suggested implementation, we read data directly from the database, the numbers would never really be reset, but rather (potentially) decrease by some number. That could be when the user manually delete some of the jobs through GoodJob's user interface (so the number of succeeded jobs could go from 200 to 199). It could also be when jobs older than 14days are deleted through automatic clean ups. Counters can only ever increment or be reset to zero - never decrease :-)

I completely agree that counters would be favorable for the use case, however, we'll need to build on top of GoodJob (eg. count each time a job has completed). I can see there is already some work going on in GoodJob about adding stats, that we can then read from, but I think we'll wait quite some time before its ready, so this is best effort at the time of writing AFAIK :-)

@SamSaffron
Copy link
Member

I see! will merge this then, naming is not ideal prom likes to recommend _total. But it seems reasonably cohesive with our existing naming

@SamSaffron SamSaffron merged commit 8d77c53 into discourse:main May 29, 2023
@AndersGM AndersGM deleted the good-job-exporter branch May 30, 2023 09:14
@AndersGM
Copy link
Contributor Author

AndersGM commented May 30, 2023

@SamSaffron Agree! At least _total is prohibited for gauge:

raise ArgumentError, "The metric name of gauge must not have _total suffix. Given: #{name}"

I will try and work on a better implementation in the GoodJob project.

@lauer
Copy link

lauer commented Jan 24, 2024

@AndersGM did you find a solution to handle that those counters can decrease doing clean up script?
As I see it, the problem would persist in this new pr #302

@AndersGM
Copy link
Contributor Author

@lauer I implemented metrics for GoodJob in this PR to address the issue: bensheldon/good_job#984 - haven't looked into it since. The file is now improved and moved to https://github.com/bensheldon/good_job/blob/main/lib/good_job/job_performer/metrics.rb to support MultiScheduler.

As far as I can see I forgot about it, and never actually implemented it in prometheus_exporter. Consequently, it is still just using the database counts. I'll try and look into using the metrics from GoodJob later today - it should be straight forward to use the metrics there :-)

@lauer
Copy link

lauer commented Jan 26, 2024

okay, an other issue we have, is that the SQL queries around database count is showed when we are using rails console.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants