Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit telemetry event for bundle init #2037

Draft
wants to merge 42 commits into
base: main
Choose a base branch
from

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Dec 20, 2024

Changes

This PR adds a telemetry event for the bundle init command, which is emitted on every bundle init command invocation.

This PR also adds a telemetry logger library that allows us to log arbitrary events as long as a corresponding proto exists in the universe.

We add a timeout of 3 seconds for logging the telemetry. Manual benchmarks show that that API requests can take anywhere from 1-3 seconds to finish. We start here with a pragmatic approach and can improve this it of the UX in either of two ways:

  1. Add a new command to publish telemetry, and spawn a new process so that the main process can exit without waiting for the telemetry to publish.
  2. Maintain a warm TCP connection so that logging the telemetry will only take the round trip time and avoid the overhead associated with

Tests

Integration/unit tests.
They ensure that we do not log PII.

Manually tested there are no (obvious) regressions in the UX and the telemetry event is being sent successfully:

15:05:14 DEBUG POST /telemetry-ext
> {
>   "items": null,
>   "protoLogs": [
>     "{\"frontend_log_event_id\":\"da8a0bc2-9adc-4fc2-89a5-2b0ff802878c\",\"entry\":{\"databricks_cli_log\":{\"... (225 more bytes)"
>   ],
>   "uploadTime": 1735292112
> }
< HTTP/2.0 200 OK
< {
<   "errors": null,
<   "numProtoSuccess": 1,
<   "numSuccess": 0
< } pid=68190 sdk=true
15:05:14 DEBUG Successfully flushed telemetry events pid=68190
15:05:14  INFO completed execution pid=68190 exit_code=0

@shreyas-goenka shreyas-goenka changed the base branch from main to telemetry/logger-2 December 20, 2024 05:55
@shreyas-goenka shreyas-goenka changed the base branch from telemetry/logger-2 to main December 27, 2024 06:11
@shreyas-goenka shreyas-goenka changed the base branch from main to telemetry/logger-2 December 27, 2024 06:11
@shreyas-goenka shreyas-goenka changed the base branch from telemetry/logger-2 to main December 27, 2024 06:14
// Maximum additional time to wait for the telemetry event to flush. We expect the flush
// method to be called when the CLI command is about to exist, so this caps the maximum
// additional time the user will experience because of us logging CLI telemetry.
var MaxAdditionalWaitTime = 5 * time.Second
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on a followup PR to try and reduce the additional timeout we wait to capture telemetry requests.

@shreyas-goenka shreyas-goenka marked this pull request as draft January 3, 2025 05:47
Copy link

github-actions bot commented Jan 3, 2025

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 2037
  • Commit SHA: aca49b2285a032235b2ef936a3392e447f5546d2

Checks will be approved automatically on success.

@pietern
Copy link
Contributor

pietern commented Jan 14, 2025

If the contents of this PR has fanned out to other PRs, could you link which ones and close this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants