Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: stop raising NoMethodError when processing unregistered types #209

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hx
Copy link

@hx hx commented Jan 27, 2022

We've found a lot of log noise comes from PrometheusExporter::Server::Collector#process_hash whenever #register_metric_unsafe returns nil. We recognise there are two problems there;

  1. Our collectors aren't registering properly. We're working on this.
  2. There's dissonance between the two aforementioned functions, with one not being prepared for the other to return nil.

This PR suggests a remedy for the latter.

It also raises the broader question of how this scenario could be better handled, since we've somewhat clumsily managed to demonstrate that it can occur in production code. A quick search shows 5 usages of STDERR.puts, which would ideally be handleable somehow. Some ideas would be;

  • Use exceptions to propagate errors up to the web server, and log them; or
  • Drill the web server's logger down to the objects most local to the errors.
  • Rescue exceptions in the server's main mount_proc block and send 500s or similar back. I'm not sure what the client repercussions would be there.

I'm happy to discuss options and draft alternate PRs if this one isn't the way we want to go. Someone else in my organisation did our Prometheus implementations, so I'm pretty unfamiliar with its ecosystem and with this gem, and I'm happy to be corrected if I've missed anything.

Thanks everyone!

@hx hx force-pushed the no-raise-on-unregistered-type branch from b147693 to e8a9add Compare January 27, 2022 02:20
if !metric
metric = register_metric_unsafe(obj)
end
metric = @metrics[obj["name"]] || register_metric_unsafe(obj) or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|| and or in the same line is terrifying. perhaps

metric = @metrics[obj["name"]] || register_metric_unsafe(obj)
if !metric
   return
end

I know it is way more verbose but it is less scary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, fair enough. How about I meet you half way?

# register_metric_unsafe will log an error if it returns nil, so that code path can end here.
return unless metric

I can leave out the comment if you prefer, but I almost always include them with early returns/guard clauses.

@SamSaffron
Copy link
Member

I feel like an exception though is correct, you asked the method to do something and it was impossible for it to register the metric.

I also agree the exception you get back is not helpful at all, perhaps the first change here is to re-wrap the exception and raise something more helpful?

@hx
Copy link
Author

hx commented Feb 1, 2022

I also agree the exception you get back is not helpful at all, perhaps the first change here is to re-wrap the exception and raise something more helpful?

The real question here is context. The exception occurs in a server handler thread, in a call stack that doesn't allow gem consumers to implement an exception handler. The gem owns the server thread, so the gem should be handling the exception. Which means the error handling mechanism is a function of the desired behaviour.

So what's the desired behaviour? I don't think crashing the server thread would be helpful. We've found these errors only occur in production, so it doesn't fit the typical pattern of letting the unhandled exceptions fail a test suite. It's also a breaking change that I imagine few would appreciate.

I suggest delegating error handling to consumers. It's a little more flexible than letting them set a logger, because the default could maintain the existing behaviour. In the abstract:

class Server
  def initialize
    @on_error = -> e { STDERR.puts e.message }
  end

  def on_error(&block)
    @on_error = block
  end
end



server = Server.new()
server.on_error { |e| Rails.logger.warn "Prometheus exporter error: #{e.message}" }

Internally, this could be implemented with a simple generic exception class (e.g. PrometheusExporter::Server::Error) that's rescued in the mount_proc block and calls @on_error.call(e).

Let me know if you'd like to explore this approach, and I'll rewrite this PR. When we're happy with the design, I'll sort out whatever's going on with Ruby 3.

Thanks @SamSaffron !

@SamSaffron
Copy link
Member

I see, yes totally, happy with the change as long as you add some interface for the consumers to get at the exceptions. Please go ahead and explore the approach, also with the PR I would love it if you can update the readme to reflect the change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants