Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM only when user breaks a step #160

Merged
merged 8 commits into from
Oct 4, 2024

Conversation

thatportugueseguy
Copy link
Collaborator

With this change monorobot will only send one notification per failing/failed step, when a step is newly broken. If dev B's build has some failing steps from previous commit with dev A's code, dev B will not be notified. Only dev A will be notified, and only on the first time the build failed with that/those failing step(s). If there are new steps broken by dev C, while maintaining the original failed builds, dev C will only be notified about the steps they are responsible for breaking.

(Buildkite specific)
If the policy for failure is allow_once, there will be only one DM, sent after the whole build failed, and showing all the failed steps. Until these steps are fixed and broken again, there is no other notification for them.

If the policy for failure is allow, there will be two DMS with the above rules. The first will be on the notification for build is failing, the second will be on the notification for build is failed

@thatportugueseguy thatportugueseguy force-pushed the dm-only-if-user-breaks-a-step branch from 65090bf to 0fd1067 Compare September 28, 2024 17:56
lib/action.ml Outdated
Comment on lines 142 to 145
let broken_steps = Util.Build.new_failed_steps n repo_state pipeline in
match n.state = Failure && broken_steps = [] with
| true -> Lwt.return []
| false ->
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only change, which changed the indentation of the action_on_match fn, hence the big diff, but the function itself wasn't changed

Comment on lines +189 to +195
let full_build_failed =
n.state = Failure && Re2.matches buildkite_is_failed_re (Option.default "" n.description)
in
action_on_match branches ~notify_channels ~notify_dm
(* for failing states, if we only allow one notification, it must be the final one, not an intermediate one *)
(match n.state <> Failure || full_build_failed with
| false -> Lwt.return []
| true ->
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Comment on lines -43 to -71

open Devkit

let fmt_error ?exn fmt =
Printf.ksprintf
(fun s ->
match exn with
| Some exn -> Error (s ^ " : exn " ^ Exn.str exn)
| None -> Error s)
fmt

let first_line s =
match String.split_on_char '\n' s with
| x :: _ -> x
| [] -> s

let decode_string_pad s = Stre.rstrip ~chars:"= \n\r\t" s |> Base64.decode_exn ~pad:false

let http_request ?headers ?body meth path =
let setup h =
Curl.set_followlocation h true;
Curl.set_maxredirs h 1
in
match%lwt Web.http_request_lwt ~setup ~ua:"monorobot" ~verbose:true ?headers ?body meth path with
| `Ok s -> Lwt.return @@ Ok s
| `Error e -> Lwt.return @@ Error e

let sign_string_sha256 ~key ~basestring =
Cstruct.of_string basestring |> Nocrypto.Hash.SHA256.hmac ~key:(Cstruct.of_string key) |> Hex.of_cstruct |> Hex.show
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved these more general/utility functions to the util.ml file as I wanted to add a function to this file, but it would cause cyclic dependencies. And these functions actually belong more in a util file than here, so I took the opportunity to move them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file only has formatting changes, apart from changing the policy for failure notifications from allow_once to allow so that we could still test the notifications for failing builds. Otherwise, we don't notify on failing builds, only on failed ones

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have updated the tests for failure status notifications and added a couple of extra ones. Cases tested:

  • one failing step
  • multiple failing steps at the same time
  • someone breaks a new step while the previous steps keep failing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this test case is really needed. Unless I'm missing something it tests the same thing as status.commit1-02-failed, but it was doing it before already, so I'm wondering if I might have overlooked something?

@rr0gi
Copy link
Contributor

rr0gi commented Sep 30, 2024

ftr i want to know when my build is broken regardless whether it is my fault or not (ie exception should be configurable per user)

@rr0gi
Copy link
Contributor

rr0gi commented Sep 30, 2024

also i think allow_once needs some renotification or escalation policy

@rr0gi
Copy link
Contributor

rr0gi commented Sep 30, 2024

maybe better to send dm to commit author anyway, just word it differently (build failed, but it is likely not your fault because build for step X was broken first in commit Y etc)

@thatportugueseguy
Copy link
Collaborator Author

also i think allow_once needs some renotification or escalation policy

I'm not sure I understand what you mean? Do you mean escalating to a message in a channel or DMing again if that step is broken for a long time?

@thatportugueseguy
Copy link
Collaborator Author

maybe better to send dm to commit author anyway, just word it differently (build failed, but it is likely not your fault because build for step X was broken first in commit Y etc)

I have updated the messages and added the link to the commit so that it's easy to see the code and the author, as well as the build logs. If it's justified, maybe we can add mentions later, but I wanted to move away from all that noise in the notification.

some screenshots:
Screenshot 2024-10-02 at 18 53 35
Screenshot 2024-10-02 at 18 52 35
Screenshot 2024-10-02 at 18 53 13

@thatportugueseguy
Copy link
Collaborator Author

also i think allow_once needs some renotification or escalation policy

maybe some kind of cron checking if each build has been failing for more than x mins and dm the original committer? I think this can/should be done one another PR

@thatportugueseguy
Copy link
Collaborator Author

I'm going to merge this PR as I think it's fit to merge and we can look into the escalation mechanism after we test it.

@thatportugueseguy thatportugueseguy merged commit 8774ef9 into master Oct 4, 2024
0 of 3 checks passed
@thatportugueseguy thatportugueseguy deleted the dm-only-if-user-breaks-a-step branch October 4, 2024 10:09
thatportugueseguy added a commit that referenced this pull request Oct 8, 2024
thatportugueseguy added a commit that referenced this pull request Oct 8, 2024
* origin/master:
  Revert "DM only when user breaks a step (#160)"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants