Add a way to introspect failing/restarting services #104
Comments
During our discussion at the Frankfurt sprint, we discussed a few potential options, leaning towards using the existing "change" system so that things are tracked:
Or some combination of the above, for example, I currently lean towards doing both 2 and 3: adding |
Notes from meeting just now:
This will enable debugging/introspection on whether a service is failing and being auto-restarted. You can introspect it with the changes API or the "pebble changes" and "pebble tasks" CLI commands. The kind (type) of the changes is "recover" and the kind of the tasks is "restart".

Example output while a service is being recovered (change 3 is not yet ready):

    $ pebble changes
    ID  Status  Spawn                Ready                Summary
    1   Done    today at 16:08 NZST  today at 16:08 NZST  Autostart service "test2"
    2   Done    today at 16:08 NZST  today at 16:09 NZST  Recover service "test2"
    3   Done    today at 16:09 NZST  -                    Recover service "test2"

After it's recovered, the change is marked done/Ready:

    $ pebble changes
    ID  Status  Spawn                Ready                Summary
    1   Done    today at 16:08 NZST  today at 16:08 NZST  Autostart service "test2"
    2   Done    today at 16:08 NZST  today at 16:09 NZST  Recover service "test2"
    3   Done    today at 16:09 NZST  today at 16:10 NZST  Recover service "test2"

    $ pebble tasks 2
    Status  Spawn                Ready                Summary
    Done    today at 16:08 NZST  today at 16:08 NZST  Restart service "test2"
    Done    today at 16:08 NZST  today at 16:08 NZST  Restart service "test2"
    Done    today at 16:09 NZST  today at 16:09 NZST  Restart service "test2"

    $ pebble tasks 3
    Status  Spawn                Ready                Summary
    Done    today at 16:09 NZST  today at 16:09 NZST  Restart service "test2"

Fixes canonical#104
Per comment on #117 (comment), we're planning to do this with events (a new concept) instead, and have started spec JU048 to design that.
This PR adds a current-since field to the API, meaning the time at which the "current" field last changed, for example when the service started (current changed from inactive to active), and so on. It's related to #104 (though in a previous iteration it was "start time", not "current since").

Here's how it looks in the CLI:

    $ pebble services
    Service  Startup  Current  Since
    test2    enabled  active   today at 17:08 NZST

    $ pebble services --abs-time
    Service  Startup  Current  Since
    test2    enabled  active   2022-04-28T17:08:45+12:00

This PR also removes the "restarts" field, which isn't being used and was internal-only. Per #104, we're going to use changes/tasks or events instead.
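To make the "current since" semantics concrete, here is a minimal Go sketch, assuming a hypothetical `serviceStatus` type and `setCurrent` method (neither is taken from Pebble's code). The timestamp is updated only when the "current" value actually changes, so repeated reports of the same state leave it alone.

```go
package main

import (
	"fmt"
	"time"
)

// serviceStatus is a hypothetical type for illustration only; it is not
// Pebble's actual implementation.
type serviceStatus struct {
	current      string    // for example "inactive" or "active"
	currentSince time.Time // time at which current last changed
}

// setCurrent updates the state and stamps currentSince, but only when the
// state really changes.
func (s *serviceStatus) setCurrent(to string, now time.Time) {
	if s.current == to {
		return
	}
	s.current = to
	s.currentSince = now
}

// demoSince starts a service once, reports "active" a second time, and
// returns the resulting currentSince as an RFC 3339 string.
func demoSince() string {
	nzst := time.FixedZone("NZST", 12*60*60)
	started := time.Date(2022, 4, 28, 17, 8, 45, 0, nzst)

	s := &serviceStatus{current: "inactive", currentSince: started.Add(-time.Minute)}
	s.setCurrent("active", started)                    // inactive -> active: stamped
	s.setCurrent("active", started.Add(5*time.Minute)) // no change: stamp kept
	return s.currentSince.Format(time.RFC3339)
}

func main() {
	fmt.Println(demoSince()) // prints 2022-04-28T17:08:45+12:00
}
```

The absolute form printed here corresponds to what `--abs-time` shows above; the relative "today at ..." form would be a presentation choice on top of the same timestamp.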
Similar to what we're planning with check failures - comment, where we kind of came full circle back around to Changes and Tasks ... can we figure out a way to do this with Changes and Tasks (but without creating huge numbers of changes in case of a bad failure)?
We're going to start with doing this for health checks (spec here). This still seems useful but is lower-priority.
Per discussion at #86 (comment), it wasn't obvious when ServiceInfo.Restarts was reset to 0 (if at all), and whether this was the right thing to expose. Knowing whether Pebble has restarted a service 1000 times vs 1001 isn't particularly useful, for example.

On the other hand, when debugging a f(l)ailing service, we want the user to be able to see how often it's restarting. We kind of want a differential, like "restarts per minute" or "restarts in last 10 minutes". But that might be just as hard to explain, so maybe the absolute number is simplest and the user can see how fast it's going up by querying over time, or by knowing when Pebble was started.
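For comparison, the "restarts in last 10 minutes" idea could be sketched roughly as below. This is only an illustration of the trade-off: `restartWindow` and its methods are hypothetical names, not part of Pebble, and the sketch keeps both the absolute count (never reset) and a sliding-window count.

```go
package main

import (
	"fmt"
	"time"
)

// restartWindow is a hypothetical helper (not part of Pebble) that remembers
// recent restart times so a "restarts in the last N minutes" figure can be
// reported next to the absolute count.
type restartWindow struct {
	window time.Duration
	times  []time.Time // restart timestamps, oldest first
	total  int         // absolute count; never reset
}

// record notes a restart at time t and drops entries that fell out of the window.
func (w *restartWindow) record(t time.Time) {
	w.total++
	w.times = append(w.times, t)
	w.prune(t)
}

// prune discards timestamps at or before now minus the window.
func (w *restartWindow) prune(now time.Time) {
	cutoff := now.Add(-w.window)
	i := 0
	for i < len(w.times) && !w.times[i].After(cutoff) {
		i++
	}
	w.times = w.times[i:]
}

// recent returns the number of restarts within the window ending at now.
func (w *restartWindow) recent(now time.Time) int {
	w.prune(now)
	return len(w.times)
}

// demo simulates three restarts and returns (absolute total, recent count).
func demo() (int, int) {
	start := time.Date(2022, 4, 28, 17, 0, 0, 0, time.UTC)
	w := &restartWindow{window: 10 * time.Minute}
	w.record(start)
	w.record(start.Add(1 * time.Minute))
	w.record(start.Add(12 * time.Minute))
	return w.total, w.recent(start.Add(15 * time.Minute))
}

func main() {
	total, recent := demo()
	fmt.Printf("restarts: %d total, %d in last 10m\n", total, recent)
}
```

The windowed figure needs per-service memory and a window size to document, which is part of why the absolute number may be the simpler thing to expose.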
I lean towards keeping this field -- it seems quite useful to me -- and having it never reset (until Pebble restarts). It would also be exposed in pebble services.