Handle steps state per build #167

thatportugueseguy · 2024-12-12T18:31:55Z

Description of the task

This PR changes the way monorobot stores state for the failed builds. We now store states per build per branch individually and calculate notifications with that information rather than having state per build step but shared across the branch for all the builds at the same time.

Some tests were updated and now have a different output because they weren't correct before.

I have left out logic for handling long running builds that finish later than subsequent shorter ones. For now we just ignore them, as I think this will not be a common issue and the PR is quite big in changes already. It's better to fix later when we have a better understanding of how those long running builds will impact the notifications.

yasunariw

Please update docs with build tracking logic (with basic examples)
I didn't follow the updates to the tests, too many unrelated changes

yasunariw · 2024-12-17T05:12:52Z

lib/context.ml

@@ -76,13 +76,13 @@ let hook_of_channel ctx channel_name =
 (** [is_pipeline_allowed ctx repo_url ~pipeline] returns [true] if [status_rules]
    doesn't define a whitelist of allowed pipelines in the config of [repo_url],
    or if the list contains [pipeline]; returns [false] otherwise. *)
-let is_pipeline_allowed ctx repo_url ~pipeline =
+let is_pipeline_allowed ctx repo_url ~context =


what are ctx and context? also the comment appears to be outdated

ctx is of Context.t type and context is the notification context value. I didn't realize that the two arguments had the same name, this was part of changes to make things clearer regarding the usage of the notification.context and the pipeline name value.

I've updated to use the notification instead and access the context value inside.

re the comments, they are correct, it's just that the match case was written in the negative form. I've updated it to be in the positive.

thanks, yes I mean the part of the comment that refers to the old arg ~pipeline

yasunariw · 2024-12-17T05:14:38Z

lib/slack.ml

-        Printf.sprintf "<%s|%s> " l step
+      let slack_step_link (s : State_t.failed_step) =
+        let step = Stre.drop_prefix s.name (pipeline ^ "/") in
+        Printf.sprintf "<%s|%s> " s.build_url step


Printf is opened at top level

yasunariw · 2024-12-17T05:18:41Z

lib/util.ml

+    let rec get_name context build_url =
+      match String.length context with
+      | 0 -> Error "failed to get pipeline name from notification - empty string"
+      | _ when Devkit.Stre.exists build_url (context ^ "/") ->


Devkit is opened in scope

yasunariw · 2024-12-17T05:19:24Z

lib/util.ml

+    | Ok { pipeline_name; _ } ->
+      (* build urls are in the form of .../<base_pipeline_name>/builds/<build_number>.
+         Pipeline steps have an html anchor after the build number *)
+      let re = Re2.create_exn (Printf.sprintf ".*/%s/builds/(\\d+)#?" pipeline_name) in


we shouldn't be compiling the regex for every event

ideally, not, but this regex depends on the pipeline name, so it can't be compiled ahead of time, unfortunately. The option would be to create a map of regexes and only create on the first time we encounter a pipeline, but that feels a bit overkill, we don't seem to have any slowness due to this here.

Another option is to create a map with only the allowed pipelines at startup. So if the pipeline in the notification isn't allowed, it would fail here. But again, I'm not sure we are experiencing a downgrade in performance here.

WDYT?

I'm still not sure why we need both the context and build_url to find the build number

for buildkite, url has predictable structure of

https://buildkite.com/<org_name>/<pipeline_name>/builds/<build_id>#<job_id>

where #<job_id> is optional

default context also has predictable structure (though can be customized)

buildkite/<pipeline_name>/<job_name>

where /<job_name> is optional

So it seems to me like we can extract the <build_id> from the url in one line
and pipeline_check from the context in one line

And these assumptions about structure are meaningless outside of BK

So echoing my other comment, I suggest to

detect provider from status notification (currently only support BK, detect by buildkite/ prefix in context)

use method defined for provider to extract pipeline metadata from notification, handle the "smart" way that is the focus of this PR

if unknown (unsupported) provider, handle the default way

Then in 2. we only care about BK, so parsing is simple, can expect predefined structure

You are right, i didn't want to go this way straight away as we had some support for other providers, but it doesn't make sense to try to support anything that might come our way. I've updated these to support primarily buildkite and have some other default way for other CIs.

Thanks

yasunariw · 2024-12-17T05:25:26Z

lib/util.ml

+      (match Re2.find_submatches_exn re build_url with
+      | [| _; Some build_number |] -> Ok build_number
+      | _ | (exception _) ->
+        (* we either match on the first case or get an exception. *)
+        Error "failed to get build number from url")


Suggested change

(match Re2.find_submatches_exn re build_url with

| [| _; Some build_number |] -> Ok build_number

| _ | (exception _) ->

(* we either match on the first case or get an exception. *)

Error "failed to get build number from url")

(match Re2.find_first_exn ~sub:(`Index 1) re build_url with

| build_number -> Ok build_number

| exception _ -> Error "failed to get build number from url")

yasunariw · 2024-12-17T05:36:12Z

lib/util.ml

  let buildkite_is_failed_re = Re2.create_exn {|^Build #\d+ failed|}

+  let parse_context ~context ~build_url =


why do we return a result type if the error message is ignored everywhere

This and get_build_number below do a lot of string manipulation just to obtain the pipeline name. Can it be simpler? It reads like the function wants to target Buildkite mainly but also somewhat support status contexts in general.
Instead if there is an interface to extract the necessary build info, and we implement a Buildkite module that programs to that interface, then we expect a predictable structure for both build context and url, and can extract the info in 3 lines instead of 30

yasunariw · 2024-12-17T06:06:51Z

lib/util.ml

+              | false -> build_status.failed_steps @ acc
+            in
+            let previous_failed_steps =
+              StringMap.fold (to_previous_failed_steps @@ int_of_string current_build_number) builds_maps []


current_build_number should return an int (as the function name suggests) to avoid this adhoc conversion

i thought about it, but in the end the build number needs to be a string to be the key in the state json file (i.e., the build number key in the build_statuses map), and we'd get a string from the regex anyway, so these two need to convert at some point.

i didn't want to go with an IntMap wrapper module that needs to convert all the keys every time we write to disk, but looking at the actual state in the deployed app, there are very few builds stored at the same time, so it shouldn't be too much of an overhead. I'll add the IntMap wrapper module

yasunariw · 2024-12-17T06:10:04Z

lib/state.ml

+    let context = n.context in
+    let { Util.Build.is_pipeline_step; pipeline_name } = Util.Build.parse_context_exn ~context ~build_url in
+    let build_number = Util.Build.get_build_number_exn ~context ~build_url in
+    let is_finished = (not is_pipeline_step) && (n.state = Success || n.state = Failure || n.state = Error) in


pattern match

yasunariw · 2024-12-17T06:12:40Z

lib/state.ml

+  let repo_state = find_or_add_repo' state n.repository.url in
+  let pipeline =
+    (* We only need to track messages for the base pipeline, not the steps *)
+    match Util.Build.parse_context ~context:n.context ~build_url:(Option.default "" n.target_url) with


iiuc calling the function with build_url as empty string will never succeed

it shouldn't happen in buildkite, but for other CIs this value might be empty. See this case.

So here we protect against it.

I think it's better if we don't intentionally evaluate parse_context with an argument we know is invalid (creates more assumptions for the callee to manage)

creates more assumptions for the callee to manage

Makes sense. Good call, thanks

yasunariw · 2024-12-17T06:33:04Z

lib/util.ml

+              (* When this function is called, the current build is finished *)
+              Option.get @@ StringMap.find_opt current_build_number builds_maps


then the comment should be part of the exception raised when it's None

This is already the case above at the beginning of the new_failed_steps function.

The comment is only here to assure the reader that we are not getting an exception here. i will update the comment to make it clearer

then maybe just StringMap.find with comment

yasunariw · 2024-12-18T02:37:08Z

lib/context.ml

@@ -76,13 +76,13 @@ let hook_of_channel ctx channel_name =
 (** [is_pipeline_allowed ctx repo_url ~pipeline] returns [true] if [status_rules]
    doesn't define a whitelist of allowed pipelines in the config of [repo_url],
    or if the list contains [pipeline]; returns [false] otherwise. *)
-let is_pipeline_allowed ctx repo_url ~pipeline =
+let is_pipeline_allowed ctx repo_url ~context =


thanks, yes I mean the part of the comment that refers to the old arg ~pipeline

yasunariw · 2024-12-18T02:39:18Z

lib/state.ml

+  let repo_state = find_or_add_repo' state n.repository.url in
+  let pipeline =
+    (* We only need to track messages for the base pipeline, not the steps *)
+    match Util.Build.parse_context ~context:n.context ~build_url:(Option.default "" n.target_url) with


I think it's better if we don't intentionally evaluate parse_context with an argument we know is invalid (creates more assumptions for the callee to manage)

yasunariw · 2024-12-18T03:09:54Z

lib/util.ml

+    | Ok { pipeline_name; _ } ->
+      (* build urls are in the form of .../<base_pipeline_name>/builds/<build_number>.
+         Pipeline steps have an html anchor after the build number *)
+      let re = Re2.create_exn (Printf.sprintf ".*/%s/builds/(\\d+)#?" pipeline_name) in


I'm still not sure why we need both the context and build_url to find the build number

for buildkite, url has predictable structure of

https://buildkite.com/<org_name>/<pipeline_name>/builds/<build_id>#<job_id>

where #<job_id> is optional

default context also has predictable structure (though can be customized)

buildkite/<pipeline_name>/<job_name>

where /<job_name> is optional

So it seems to me like we can extract the <build_id> from the url in one line
and pipeline_check from the context in one line

And these assumptions about structure are meaningless outside of BK

So echoing my other comment, I suggest to

detect provider from status notification (currently only support BK, detect by buildkite/ prefix in context)

use method defined for provider to extract pipeline metadata from notification, handle the "smart" way that is the focus of this PR

if unknown (unsupported) provider, handle the default way

Then in 2. we only care about BK, so parsing is simple, can expect predefined structure

yasunariw · 2024-12-18T03:11:06Z

lib/util.ml

+              (* When this function is called, the current build is finished *)
+              Option.get @@ StringMap.find_opt current_build_number builds_maps


then maybe just StringMap.find with comment

yasunariw · 2024-12-18T03:12:35Z

mock_states/status.commit1-02-failed.json

@@ -1,24 +1,31 @@
 {
  "pipeline_statuses": {
-    "buildkite/pipeline2/failed-step": {
+    "pipeline2": {


not sure it's a good idea to lose the buildkite/ prefix from the state, it is useful to know that it is a BK pipeline (and imagine in the future the monorepo has two providers with pipelines both named monorepo)

The intent was to separate the pipeline name from the provider context, but it is more future proof to keep it, indeed. I will add this change to the backlog, as I have other tasks that need to be more prioritary at the moment.

thatportugueseguy added 5 commits December 6, 2024 12:43

add builds map to pipeline state

cb21c1e

cleanup and handle new state structure

b65b355

Update and clean state management

87d8194

handle out-of-order first notifications

308a87d

fix tests

9cc9923

thatportugueseguy requested review from Khady and yasunariw December 12, 2024 18:31

small cleaning of typos

90f21e6

thatportugueseguy merged commit d2f47e3 into master Dec 16, 2024
2 of 3 checks passed

thatportugueseguy deleted the step-state-per-build branch December 16, 2024 16:15

yasunariw reviewed Dec 17, 2024

View reviewed changes

yasunariw reviewed Dec 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle steps state per build #167

Handle steps state per build #167

thatportugueseguy commented Dec 12, 2024

yasunariw left a comment

yasunariw Dec 17, 2024

thatportugueseguy Dec 17, 2024

yasunariw Dec 18, 2024

yasunariw Dec 17, 2024

yasunariw Dec 17, 2024

yasunariw Dec 17, 2024

thatportugueseguy Dec 17, 2024

yasunariw Dec 18, 2024

thatportugueseguy Dec 18, 2024

yasunariw Dec 17, 2024

yasunariw Dec 17, 2024

yasunariw Dec 17, 2024

yasunariw Dec 17, 2024

thatportugueseguy Dec 18, 2024

yasunariw Dec 17, 2024

yasunariw Dec 17, 2024

thatportugueseguy Dec 17, 2024

yasunariw Dec 18, 2024 •

edited

Loading

thatportugueseguy Dec 18, 2024

yasunariw Dec 17, 2024

thatportugueseguy Dec 17, 2024

yasunariw Dec 18, 2024

yasunariw Dec 18, 2024

yasunariw Dec 18, 2024 •

edited

Loading

yasunariw Dec 18, 2024

yasunariw Dec 18, 2024

yasunariw Dec 18, 2024

thatportugueseguy Dec 18, 2024

		let buildkite_is_failed_re = Re2.create_exn {\|^Build #\d+ failed\|}

		let parse_context ~context ~build_url =

		(* When this function is called, the current build is finished *)
		Option.get @@ StringMap.find_opt current_build_number builds_maps

Handle steps state per build #167

Handle steps state per build #167

Conversation

thatportugueseguy commented Dec 12, 2024

Description of the task

yasunariw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yasunariw Dec 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yasunariw Dec 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yasunariw Dec 18, 2024 •

edited

Loading

yasunariw Dec 18, 2024 •

edited

Loading