-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor groovy script to include batching. #1008
base: master
Are you sure you want to change the base?
Conversation
The last several years, we've added batching to the this template script when executing it on the official build farms but never actually updated it here. This change includes what is essentially, a rewrite of the batch process example script to more closely match the one I've been using the last year or so. I someday hope to get even more meta about this and create a management job on Jenkins that runs variations of this script on templated job types in order to remove the need for batching and the chance for a typo when copy-pasting and modifying the example script.
|
||
The following Groovy script is a good starting point for various actions: | ||
|
||
.. code-block:: groovy | ||
|
||
import hudson.model.Cause | ||
DISTRO_LETTER = "F" | ||
PREFIXES = ["ci_", "dev_", "pr_", "rel", "src_", "bin_"].collect({pre -> DISTRO_LETTER + pre}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be overkill to add a DISTRO_LETTER constant one line before its only use but I like making the bits that are meant to be adjusted very obvious rather than just hard-coding the letter into the map/collect that generates all of the prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as an FYI, if we wanted to get just a bit fancy, we could do this at the top:
DISTRO_NAME = "foxy"
DISTRO_LETTER = DISTRO_NAME[0..0].toUpperCase()
PREFIXES = [DISTRO_NAME]
PREFIXES += ["ci_", "dev_", "pr_", "rel", "src_", "bin_"].collect({pre -> DISTRO_LETTER + pre})
That way we'll pick up the jobs that are prefixed with the DISTRO_NAME
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this is also missing the doc
jobs, so it should also be added to the list above.
/* Some operations take time and an http timeout | ||
* creates performance issues with dangling script | ||
* runs. | ||
* To prevent them use a batch size that returns | ||
* fairly instant results and run the script multiple | ||
* times | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hoisted this comment up into the documentation above as well. I am torn between leaving it in the script and cutting it since it's documented above.
{ | ||
if (count >= BATCH_SIZE) | ||
{ | ||
println("Reached ${BATCH_SIZE} limit before processing ${job.name}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see someone omitting the print of processed job names, which is otherwise the only feedback when processing batches that you aren't processing the same 100 jobs over again (such as if there's a mismatch between the predicate and operation) so I added this print as an indicator of where we stopped.
If I was more bored I might be tempted to implement some kind of cursor pattern.
|
||
if (count <= BATCH_SIZE) | ||
{ | ||
println("Completed execution of the last batch.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always previously just run the script until it returns 0 results but this way the script positively reports that it's processed the last batch (based on the fact that the batch is smaller than the max batch size).
It is possible if the number of jobs to process is precisely divisible by the batch size that this will print only after an empty batch, but it should still print.
} | ||
|
||
This script will print only the matched job names. | ||
You can uncomment any of the actions to disable, enable, trigger or delete these projects. | ||
You can uncomment any of the actions to disable, enable, or delete these projects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the instructions for triggering builds using this script because the _trigger-jobs
jobs should cover those use cases now and there is not an entirely reasonable way to track that state using the batch setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, but wouldn't mind waiting for Scott's comments.
Some operations take time and an HTTP timeout creates performance issues with dangling script runs. | ||
To prevent them the example script uses a ``BATCH_SIZE`` constant to control how many jobs to change before returning. | ||
You can leave the default, fairly conservative batch size or increase that constant steadily until results are no longer instantaneous and then adjusting back down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, does it process jobs even if they're processed/disabled in the previous iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fair to say that in this context, "processing" means changing. That's what seems to take significant time and causes the script invocation to time out.
Simply inspecting all of the jobs can happen in every invocation. The continue
in the loop for each should avoid incrementing the counter if the operation would have been a no-op, so "processing" be 1-to-1 with "changing" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! @cottsay is correct here that iterating over jobs does not take much time at all but performing operations on them (enable, disable, delete) does.
So each subsequent run of the script would iterate over a growing number of (previously processed jobs) but checking the predicate is not very time intensive compared to actually performing an operation, so it does not have much of an observable effect on the return time of the script.
I resisted the urge to do even more programming and create predicate,operation pairs and setting another top-level variable so that we could actually add the operation-predicate to the filter closure that's passed to getItems
and thus only jobs that meet the predicate would be iterated over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Some operations take time and an HTTP timeout creates performance issues with dangling script runs. | ||
To prevent them the example script uses a ``BATCH_SIZE`` constant to control how many jobs to change before returning. | ||
You can leave the default, fairly conservative batch size or increase that constant steadily until results are no longer instantaneous and then adjusting back down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fair to say that in this context, "processing" means changing. That's what seems to take significant time and causes the script invocation to time out.
Simply inspecting all of the jobs can happen in every invocation. The continue
in the loop for each should avoid incrementing the counter if the operation would have been a no-op, so "processing" be 1-to-1 with "changing" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edits from the recent run
Groovy does not allow a closure variable to shadow another one. Co-authored-by: Dharini Dutia <[email protected]>
Co-authored-by: Dharini Dutia <[email protected]>
The last several years, we've added batching to the this template script when executing it on the official build farms but never actually updated it here.
This change includes what is essentially, a rewrite of the batch process example script to more closely match the one I've been using the last year or so.
I someday hope to get even more meta about this and create a management job on Jenkins that runs variations of this script on templated job types in order to remove the need for batching and the chance for a typo when copy-pasting and modifying the example script.