-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle steps state per build #167
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update docs with build tracking logic (with basic examples)
I didn't follow the updates to the tests, too many unrelated changes
@@ -76,13 +76,13 @@ let hook_of_channel ctx channel_name = | |||
(** [is_pipeline_allowed ctx repo_url ~pipeline] returns [true] if [status_rules] | |||
doesn't define a whitelist of allowed pipelines in the config of [repo_url], | |||
or if the list contains [pipeline]; returns [false] otherwise. *) | |||
let is_pipeline_allowed ctx repo_url ~pipeline = | |||
let is_pipeline_allowed ctx repo_url ~context = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are ctx
and context
? also the comment appears to be outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx
is of Context.t
type and context
is the notification context value. I didn't realize that the two arguments had the same name, this was part of changes to make things clearer regarding the usage of the notification.context
and the pipeline name value.
I've updated to use the notification instead and access the context value inside.
re the comments, they are correct, it's just that the match case was written in the negative form. I've updated it to be in the positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, yes I mean the part of the comment that refers to the old arg ~pipeline
Printf.sprintf "<%s|%s> " l step | ||
let slack_step_link (s : State_t.failed_step) = | ||
let step = Stre.drop_prefix s.name (pipeline ^ "/") in | ||
Printf.sprintf "<%s|%s> " s.build_url step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Printf is opened at top level
let rec get_name context build_url = | ||
match String.length context with | ||
| 0 -> Error "failed to get pipeline name from notification - empty string" | ||
| _ when Devkit.Stre.exists build_url (context ^ "/") -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devkit is opened in scope
| Ok { pipeline_name; _ } -> | ||
(* build urls are in the form of .../<base_pipeline_name>/builds/<build_number>. | ||
Pipeline steps have an html anchor after the build number *) | ||
let re = Re2.create_exn (Printf.sprintf ".*/%s/builds/(\\d+)#?" pipeline_name) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't be compiling the regex for every event
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally, not, but this regex depends on the pipeline name, so it can't be compiled ahead of time, unfortunately. The option would be to create a map of regexes and only create on the first time we encounter a pipeline, but that feels a bit overkill, we don't seem to have any slowness due to this here.
Another option is to create a map with only the allowed pipelines at startup. So if the pipeline in the notification isn't allowed, it would fail here. But again, I'm not sure we are experiencing a downgrade in performance here.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure why we need both the context and build_url to find the build number
for buildkite, url has predictable structure of
https://buildkite.com/<org_name>/<pipeline_name>/builds/<build_id>#<job_id>
where #<job_id>
is optional
default context also has predictable structure (though can be customized)
buildkite/<pipeline_name>/<job_name>
where /<job_name>
is optional
So it seems to me like we can extract the <build_id>
from the url in one line
and pipeline_check
from the context in one line
And these assumptions about structure are meaningless outside of BK
So echoing my other comment, I suggest to
- detect provider from status notification (currently only support BK, detect by
buildkite/
prefix in context) - use method defined for provider to extract pipeline metadata from notification, handle the "smart" way that is the focus of this PR
- if unknown (unsupported) provider, handle the default way
Then in 2. we only care about BK, so parsing is simple, can expect predefined structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, i didn't want to go this way straight away as we had some support for other providers, but it doesn't make sense to try to support anything that might come our way. I've updated these to support primarily buildkite and have some other default way for other CIs.
Thanks
(match Re2.find_submatches_exn re build_url with | ||
| [| _; Some build_number |] -> Ok build_number | ||
| _ | (exception _) -> | ||
(* we either match on the first case or get an exception. *) | ||
Error "failed to get build number from url") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(match Re2.find_submatches_exn re build_url with | |
| [| _; Some build_number |] -> Ok build_number | |
| _ | (exception _) -> | |
(* we either match on the first case or get an exception. *) | |
Error "failed to get build number from url") | |
(match Re2.find_first_exn ~sub:(`Index 1) re build_url with | |
| build_number -> Ok build_number | |
| exception _ -> Error "failed to get build number from url") |
let buildkite_is_failed_re = Re2.create_exn {|^Build #\d+ failed|} | ||
|
||
let parse_context ~context ~build_url = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we return a result type if the error message is ignored everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and get_build_number
below do a lot of string manipulation just to obtain the pipeline name. Can it be simpler? It reads like the function wants to target Buildkite mainly but also somewhat support status contexts in general.
Instead if there is an interface to extract the necessary build info, and we implement a Buildkite
module that programs to that interface, then we expect a predictable structure for both build context and url, and can extract the info in 3 lines instead of 30
| false -> build_status.failed_steps @ acc | ||
in | ||
let previous_failed_steps = | ||
StringMap.fold (to_previous_failed_steps @@ int_of_string current_build_number) builds_maps [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current_build_number
should return an int (as the function name suggests) to avoid this adhoc conversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought about it, but in the end the build number needs to be a string to be the key in the state json file (i.e., the build number key in the build_statuses
map), and we'd get a string from the regex anyway, so these two need to convert at some point.
i didn't want to go with an IntMap wrapper module that needs to convert all the keys every time we write to disk, but looking at the actual state in the deployed app, there are very few builds stored at the same time, so it shouldn't be too much of an overhead. I'll add the IntMap wrapper module
let context = n.context in | ||
let { Util.Build.is_pipeline_step; pipeline_name } = Util.Build.parse_context_exn ~context ~build_url in | ||
let build_number = Util.Build.get_build_number_exn ~context ~build_url in | ||
let is_finished = (not is_pipeline_step) && (n.state = Success || n.state = Failure || n.state = Error) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern match
let repo_state = find_or_add_repo' state n.repository.url in | ||
let pipeline = | ||
(* We only need to track messages for the base pipeline, not the steps *) | ||
match Util.Build.parse_context ~context:n.context ~build_url:(Option.default "" n.target_url) with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc calling the function with build_url
as empty string will never succeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it shouldn't happen in buildkite, but for other CIs this value might be empty. See this case.
So here we protect against it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better if we don't intentionally evaluate parse_context
with an argument we know is invalid (creates more assumptions for the callee to manage)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creates more assumptions for the callee to manage
Makes sense. Good call, thanks
(* When this function is called, the current build is finished *) | ||
Option.get @@ StringMap.find_opt current_build_number builds_maps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then the comment should be part of the exception raised when it's None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already the case above at the beginning of the new_failed_steps
function.
The comment is only here to assure the reader that we are not getting an exception here. i will update the comment to make it clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then maybe just StringMap.find
with comment
@@ -76,13 +76,13 @@ let hook_of_channel ctx channel_name = | |||
(** [is_pipeline_allowed ctx repo_url ~pipeline] returns [true] if [status_rules] | |||
doesn't define a whitelist of allowed pipelines in the config of [repo_url], | |||
or if the list contains [pipeline]; returns [false] otherwise. *) | |||
let is_pipeline_allowed ctx repo_url ~pipeline = | |||
let is_pipeline_allowed ctx repo_url ~context = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, yes I mean the part of the comment that refers to the old arg ~pipeline
let repo_state = find_or_add_repo' state n.repository.url in | ||
let pipeline = | ||
(* We only need to track messages for the base pipeline, not the steps *) | ||
match Util.Build.parse_context ~context:n.context ~build_url:(Option.default "" n.target_url) with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better if we don't intentionally evaluate parse_context
with an argument we know is invalid (creates more assumptions for the callee to manage)
| Ok { pipeline_name; _ } -> | ||
(* build urls are in the form of .../<base_pipeline_name>/builds/<build_number>. | ||
Pipeline steps have an html anchor after the build number *) | ||
let re = Re2.create_exn (Printf.sprintf ".*/%s/builds/(\\d+)#?" pipeline_name) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure why we need both the context and build_url to find the build number
for buildkite, url has predictable structure of
https://buildkite.com/<org_name>/<pipeline_name>/builds/<build_id>#<job_id>
where #<job_id>
is optional
default context also has predictable structure (though can be customized)
buildkite/<pipeline_name>/<job_name>
where /<job_name>
is optional
So it seems to me like we can extract the <build_id>
from the url in one line
and pipeline_check
from the context in one line
And these assumptions about structure are meaningless outside of BK
So echoing my other comment, I suggest to
- detect provider from status notification (currently only support BK, detect by
buildkite/
prefix in context) - use method defined for provider to extract pipeline metadata from notification, handle the "smart" way that is the focus of this PR
- if unknown (unsupported) provider, handle the default way
Then in 2. we only care about BK, so parsing is simple, can expect predefined structure
(* When this function is called, the current build is finished *) | ||
Option.get @@ StringMap.find_opt current_build_number builds_maps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then maybe just StringMap.find
with comment
@@ -1,24 +1,31 @@ | |||
{ | |||
"pipeline_statuses": { | |||
"buildkite/pipeline2/failed-step": { | |||
"pipeline2": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it's a good idea to lose the buildkite/
prefix from the state, it is useful to know that it is a BK pipeline (and imagine in the future the monorepo has two providers with pipelines both named monorepo
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent was to separate the pipeline name from the provider context, but it is more future proof to keep it, indeed. I will add this change to the backlog, as I have other tasks that need to be more prioritary at the moment.
Description of the task
This PR changes the way monorobot stores state for the failed builds. We now store states per build per branch individually and calculate notifications with that information rather than having state per build step but shared across the branch for all the builds at the same time.
Some tests were updated and now have a different output because they weren't correct before.
I have left out logic for handling long running builds that finish later than subsequent shorter ones. For now we just ignore them, as I think this will not be a common issue and the PR is quite big in changes already. It's better to fix later when we have a better understanding of how those long running builds will impact the notifications.