feat: add metric pgrst_jwt_cache_size in admin server #3802
base: main
Conversation
Don't forget the feedback about the actual size in bytes #3801 (comment). Ultimately, I believe we're going to need an LRU cache for the JWT cache to be production ready, so having this metric in bytes will be useful now and later.
Force-pushed from 1c8fef6 to 9a23b43
Calculating the "actual" byte size of the cache is tricky (Haskell laziness is a lot to deal with sometimes) and I still haven't figured it out yet. In the meantime, I have written some code to approximate the cache size in bytes. It works as follows:

Now, we can use
1254ed5
to
11727ab
Compare
This is pretty cool, so I do see the size starting at 0 and then increasing as I do requests:

# $ nix-shell
# $ PGRST_ADMIN_SERVER_PORT=3001 PGRST_JWT_CACHE_MAX_LIFETIME=30000 postgrest-with-postgresql-16 -f test/spec/fixtures/load.sql postgrest-run
$ curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size The number of cached JWTs
# TYPE pgrst_jwt_cache_size gauge
pgrst_jwt_cache_size 0.0
$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 72.0
$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 144.0
$ curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 10 postgrest_test_author)"
[]
$ curl localhost:3001/metrics
..
pgrst_jwt_cache_size 216.0

Of course this doesn't drop down after a while, because we need #3801 for that.

One issue I've noticed is that we're printing empty log lines for each request:
This is due to the addition of the new Observation (src/PostgREST/Observation.hs, line 60 in 9a23b43)
and the following line here: src/PostgREST/Observation.hs, line 146 in 9a23b43.
This is surprising behavior; I'll try to refactor this in a new PR. For now, how about printing a message like:

This should only happen for a log level greater than debug; check how this is done in src/PostgREST/Logger.hs, lines 79 to 96 in 9a23b43.
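A sketch of that kind of log-level guard, using stand-in types rather than PostgREST's actual Observation and Logger definitions (everything here is invented for illustration):

```haskell
-- Hypothetical stand-ins for the real LogLevel / Observation types.
data LogLevel = LogCrit | LogError | LogWarn | LogInfo | LogDebug
  deriving (Eq, Ord)

data Observation
  = JWTCacheObs Int   -- cache size in bytes
  | OtherObs String

-- Only turn the JWT cache observation into a log line when the configured
-- level is at least debug; otherwise emit nothing instead of an empty line.
logObservation :: LogLevel -> (String -> IO ()) -> Observation -> IO ()
logObservation level logLine obs = case obs of
  JWTCacheObs size
    | level >= LogDebug -> logLine ("JWT cache size: " ++ show size ++ " bytes")
    | otherwise         -> pure ()
  OtherObs msg          -> logLine msg
```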
The above is understandable. What we really need is a good-enough approximation, so we have an order-of-magnitude understanding of whether the cache size is in KB, MB or GB. So far this seems enough. We should definitely document why we do an approximation, though.
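For illustration, a minimal sketch of what such a byte-size approximation could look like. The entry type, field names, and overhead constant below are invented for this sketch and are not the PR's actual code:

```haskell
import qualified Data.ByteString as BS
import qualified Data.HashMap.Strict as HM

-- Hypothetical shape of a cache entry: the raw token bytes plus its
-- serialized claims. Field names are illustrative only.
data CacheEntry = CacheEntry
  { ceToken  :: BS.ByteString
  , ceClaims :: BS.ByteString
  }

-- Guessed fixed per-entry overhead (HashMap node, constructors, expiry, ...).
-- The exact constant matters little; the goal is only the order of magnitude.
entryOverheadBytes :: Int
entryOverheadBytes = 64

approxEntrySize :: CacheEntry -> Int
approxEntrySize e =
  BS.length (ceToken e) + BS.length (ceClaims e) + entryOverheadBytes

approxCacheSizeInBytes :: HM.HashMap BS.ByteString CacheEntry -> Int
approxCacheSizeInBytes = HM.foldl' (\acc e -> acc + approxEntrySize e) 0
```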
@@ -15,7 +15,7 @@ module PostgREST.App
    , run
    ) where

 import Control.Monad
This seems unused?
I've just noticed that perf drops badly on this PR:

https://github.com/PostgREST/postgrest/pull/3802/checks?check_run_id=34517395528

The recursiveSize function says:

@taimoorzaeem What could we do to avoid this drop? Maybe calculate the cache size periodically on a background thread? Any thoughts?
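One rough sketch of the background-thread idea; recursiveSize is from ghc-datasize, while the wrapper names and wiring here are assumptions, not PostgREST code:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef (IORef, readIORef)
import GHC.DataSize (recursiveSize)

-- Periodically measure the whole cache off the request path and publish the
-- result, so individual requests never pay for recursiveSize themselves.
startCacheSizeSampler
  :: IORef cache          -- the JWT cache
  -> (Word -> IO ())      -- e.g. set the pgrst_jwt_cache_size gauge
  -> Int                  -- sampling interval in seconds
  -> IO ()
startCacheSizeSampler cacheRef publish intervalSecs = do
  _ <- forkIO . forever $ do
    cache <- readIORef cacheRef
    size  <- recursiveSize cache
    publish size
    threadDelay (intervalSecs * 1000000)
  pure ()
```

The trade-off is that the metric lags by up to one sampling interval, but request latency stays untouched.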
src/PostgREST/Utils.hs (outdated):

-- Code in this module is taken from:
-- https://hackage.haskell.org/package/ghc-datasize-0.2.7
@taimoorzaeem Could you remind me why we need to vendor this code here? Maybe it's possible to use the dependency directly?
Adding the dependency directly would have been perfect. The only issue is that ghc-datasize-0.2.7 is marked as broken in nixpkgs. Any workaround ideas?
error: Package ‘ghc-datasize-0.2.7’ in /nix/store/hyy4vjyamr7pj1br9y8r1ssssqp570y2-source/pkgs/development/haskell-modules/hackage-packages.nix:120611 is marked as broken, refusing to evaluate.
a) To temporarily allow broken packages, you can use an environment variable
for a single invocation of the nix tools.
$ export NIXPKGS_ALLOW_BROKEN=1
It builds fine for me on latest nixpkgs master, so we probably just need to mark it as unbroken.

For now, it should be enough to add this to overlays/haskell-packages.nix:
ghc-datasize = lib.markUnbroken prev.ghc-datasize;
Works! Thanks.
src/PostgREST/Auth.hs (outdated):

jwtCacheSize <- calcApproxCacheSizeInBytes jwtCache
observer $ JWTCache jwtCacheSize
This function works very quickly on small data structures, but can be slow on large and complex ones. If speed is an issue it's probably possible to get the exact size of a small portion of the data structure and then estimate the total size from that.
@taimoorzaeem What could we do to avoid this drop? Maybe calculate the cache size periodically on a background thread? Any thoughts?
@steve-chavez In this PR, we are calculating the size of the complete cache on EVERY request, which is frankly horrible (can't believe I wrote this). How about whenever we add a cache entry, we calculate the size of only that single entry, which would be small and quick, and increment the cache size accordingly?
Then later in #3801, we can think about a background thread to purge expired entries or however we would like to handle purging.
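A sketch of that incremental bookkeeping, under assumed names: only the newly inserted key and value are measured, and a running byte total is bumped.

```haskell
import Data.IORef (IORef, atomicModifyIORef')
import GHC.DataSize (recursiveSize)

-- Hypothetical insert hook: measure just the new key/value pair and add it
-- to a running byte counter, instead of re-measuring the whole cache.
recordEntryAdded :: IORef Word -> key -> value -> IO ()
recordEntryAdded totalBytesRef k v = do
  kBytes <- recursiveSize k
  vBytes <- recursiveSize v
  atomicModifyIORef' totalBytesRef (\total -> (total + kBytes + vBytes, ()))
```

One caveat with this approach: measuring a still-lazy entry may under-count until its thunks are forced, and evictions would need a matching decrement.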
@taimoorzaeem Sounds good. Let's try that and see the impact on perf.
Force-pushed from 11727ab to 145c236
Now that I have gotten better at Haskell 🚀, I solved the issue and we don't need an "approximation". We CAN calculate the full cache size in bytes.
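If I remember correctly, ghc-datasize also exposes recursiveSizeNF, which forces the value to normal form before walking it; that is one way around the laziness problem mentioned earlier. A minimal sketch only, with the cache type and its NFData instance assumed:

```haskell
import Control.DeepSeq (NFData)
import GHC.DataSize (recursiveSizeNF)

-- Force the whole cache to normal form and then walk it, so thunks do not
-- hide the real footprint. Requires an NFData instance for the cache type.
fullCacheSizeInBytes :: NFData cache => cache -> IO Word
fullCacheSizeInBytes = recursiveSizeNF
```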
Force-pushed from 4b1afd1 to d9de43a
Load test and memory test are failing with:

Is there any additional configuration that needs to be added for profiling?
nix/overlays/haskell-packages.nix (outdated):

# nixpkgs have ghc-datasize-0.2.7 marked as broken
ghc-datasize = lib.markUnbroken prev.ghc-datasize;

Suggested change:

# TODO: Remove this once https://github.com/NixOS/nixpkgs/pull/375121
# has made it to us.
ghc-datasize = lib.markUnbroken prev.ghc-datasize;
Does it happen locally, too? I am confused; I can't exactly spot what's wrong right now.
Yes, it does happen locally.
Force-pushed from d9de43a to 7ff02db
Force-pushed from 7ff02db to 6eb7d3c
Need some way to run …
Yeah, running the loadtest in CI is not working when dependencies are changed, because it needs to run against the base branch, which doesn't have the dependencies yet... and then cabal just breaks it somehow. You can run something like this locally to get the same markdown output:

(but it's likely that it fails the same way... :D)
The thing I don't understand about this error message is that it appears in contexts where we don't use profiling libs. We do for the memory test, so there the error message makes "kind of sense" (I still don't understand why it happens, though). But the loadtest and the regular build on macOS don't use profiling. Hm... actually, we don't do a regular dynamically linked Linux build via nix, I think. So in fact it fails for every nix build except the static build. Still don't know what's happening, though.
Add metric pgrst_jwt_cache_size in the admin server, which shows the cache size in bytes.
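For reference, a minimal sketch of how such a gauge can be defined and updated with the prometheus-client library. This is illustrative only and not necessarily how PostgREST wires its metrics; the function names are invented:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Prometheus as P

-- Register the gauge once at startup...
mkJwtCacheSizeGauge :: IO P.Gauge
mkJwtCacheSizeGauge =
  P.register $ P.gauge (P.Info "pgrst_jwt_cache_size" "The number of cached JWTs")

-- ...and update it whenever a new cache size has been computed.
publishJwtCacheSize :: P.Gauge -> Int -> IO ()
publishJwtCacheSize g bytes = P.setGauge g (fromIntegral bytes)
```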