-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nodes pre-provisioning #127
base: master
Are you sure you want to change the base?
Conversation
In order to have instances at the ready at all times and thus speed up CI runs because instance provisioning is done upfront, add a mechanism and the related parameter to configure a minimum number of instances that must be ready at all times. Also add a mechanism to reconcile total number of nodes based on template/cloud capacity There is similar functionality in openstack cloud plugin: https://github.com/jenkinsci/openstack-cloud-plugin/blob/master/plugin/src/main/java/jenkins/plugins/openstack/compute/JCloudsPreCreationThread.java
Thanks for writing this - it's functionality that I approve of 👍 However, I'm not currently in a position to review this in detail right now, so I'll stick to general points:
WebUI: use help pages rather than description attributes. I find that description text bloats the (already long) page more than having a help icon on the right ... and this functionality could do with a big explanation of what it does so the help text would be a perfect place for that. VSphereCloudRetentionStrategy.check might be better if it did it's thing and then, if the node isn't needed, delegated to super.check rather than copy/paste what super.check currently does ... or else, why not do what RunOnce does and call done(c)? If there is a good reason why it has to be different, please comment why. Unit tests would be appreciated; I know we can't test against a real vSphere instance, but we can mock things and check the code is requesting the changes we expect. |
* less 'public' : either @restricted(NoExternalUse) or protected * add help pop up message to the web ui * get rid of copy pasting when override retention strategy * remove unused method (leftover)
Hi @pjdarton! Could you take a look once again at the code and see if something is still too 'public'? I am not very good at Java, so I could have missed a thing. Thanks a lot in advance! Regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. All the newly-added public stuff now has @Restricted
on it which is good.
WebUI resource files need tweaking.
Disclaimer: I've not tried this out myself yet; I'm just looking at the code changes on GitHub. I will (eventually) get around to trying it at work (and if it works, I'll make use of it right away!) but I'm not going to merge it until I've seen it working myself "for real".
src/main/resources/org/jenkinsci/plugins/vSphereCloudSlaveTemplate/config.jelly
Outdated
Show resolved
Hide resolved
src/main/resources/org/jenkinsci/plugins/vSphereCloudSlaveTemplate/help-instancesMin.html
Outdated
Show resolved
Hide resolved
Note: if the Jenkins test fails and you don't think it's your fault, you should be able to get GitHub to re-run the test. |
Hey @pjdarton! Glad you are thrilled to get this functionality added. I agree that more docs is better than less. |
cd3360b
to
1656dfc
Compare
5ff34be
to
256eaf4
Compare
Hey @pjdarton! Hope you are doing well, mate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Boolean
vs boolean
is a simple coding error and easily fixed.
I suspect that many new methods are more visible than they need to be. public
should be the last-resort; default visibility is package-level and as most of the plugin's code is all in the same package that's often all that's needed. protected
is generally only used to give access to subclasses.
...but we do need to ensure that all new non-private code is @Restricted(NoExternalUse.class)
unless it's specifically part of the public API (so we don't restrict access to the new getter & setter fot the new field, but all the internal workings should be restricted)
I'm concerned that the "retain VM" functionality will actually defeat / prevent-from-working existing retention-strategy functionality that kills off VMs when they're no longer useful.
It may be better to have this functionality create new VMs to fulfil the instanceMin requirement but allow them to time out normally, instructing users 9in the documentation) that if they don't want instances to time out quickly like that then they'll need to increase the retention time.
The existing "if instancesPresent < instanceMin then ignore the retention strategy and keep it" logic check seems to be too simplistic.
Dealing with the asynchronicity of retention-strategy checks and the instanceMin provisioning background task appears to be improperly handled. I suspect that you'll need some more complicated state handling for this to work properly.
Consider having the periodic task evaluate which instances ought to be kept, and in what priority order, and have the retention strategies then act on that evaluation result "when they get called" ... or (and I suspect this will be the much simpler option) have the periodic task only be responsible for creating new instances if the minimum isn't met yet and let the retention strategies take care of disposing of them.
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/RunOnceCloudRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/RunOnceCloudRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/VSphereCloudRetentionStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/VSphereNodeReconcileWork.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveComputer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveComputer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
Outdated
Show resolved
Hide resolved
* Kill shouldSlaveBeRetained() func * Undo changes to ret policies * minor fixes
@pjdarton I have killed Regarding your question on using package-level visibility: I don't think it is possible. |
I am sorry for the belated response. |
@@ -215,7 +217,8 @@ public int getInstanceCap() { | |||
return this.templates; | |||
} | |||
|
|||
private vSphereCloudSlaveTemplate getTemplateForVM(final String vmName) { | |||
@Restricted(NoExternalUse.class) | |||
public vSphereCloudSlaveTemplate getTemplateForVM(final String vmName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think that) This method is only called from elsewhere within this class and the neighbouring vSphereCloudSlaveTemplate class; they're both in the same package.
I believe, therefore, that it doesn't need to be public and that it can be set to default visibility.
i.e.
@Restricted(NoExternalUse.class)
vSphereCloudSlaveTemplate getTemplateForVM(final String vmName) {
@@ -133,6 +133,12 @@ public Integer getLimitedTestRunCount() { | |||
return LimitedTestRunCount; | |||
} | |||
|
|||
@Restricted(NoExternalUse.class) | |||
public vSphereCloudSlaveTemplate getTemplate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this method is used. I'm looking using my browser on github, and it says that "getTemplate(" only occurs here, where it's declared, and nowhere else in this PR.
If that's the case, it can/should be removed.
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveComputer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/VSphereNodeReconcileWork.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/VSphereNodeReconcileWork.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/VSphereNodeReconcileWork.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/vsphere/VSphereNodeReconcileWork.java
Outdated
Show resolved
Hide resolved
@pjdarton thanks for another iteration on this PR. I appreciate it. Regarding the major things:
I would love to hear your opinion before I start implementing it. Also, current combination of provisioning things the simple way (as I do now) + deletion of redundant nodes (using periodic job), kind of makes sure that we meet the instancesMin number, but of course there is a risk of creating excessive VMs that will be killed on the second run of the periodic job. |
Re: hint on how do you envision the implementation A cloud plugin's method This PR is adding a new method that'll provide more nodes; it'll need to iterate over all the templates checking that there's enough + to satisfy the template's instancesMin field. In this new case, we don't want to call So, I think that what we do is
...and be careful not to change the behaviour of the original |
5ef81ba
to
d843288
Compare
@pjdarton It's been a while... I am sorry again. Ever-changing priorities are ruinning plans sometimes. I am willing to address further comments asap. |
} | ||
maxSlavesToProvisionBeforeCloudCapHit = Integer.valueOf(intValue - 1); | ||
} | ||
if (!cloudHasCapacity()) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not subtracting 1 here since maxSlavesToProvisionBeforeCloudCapHit
is re-calculated on every call to cloudHasCapacity()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood - done like this (re-calculating every time) is not as CPU-efficient as it was, but I guess the additional workload is insigificant compared to everything else.
* Use native logic that takes into account state of the nodes
d843288
to
3fe45d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a unit-test for the new method that's been added to CloudProvisioningAlgorithm (that code can be tested in isolation and has good coverage)
I would like to have a unit-test for the new provisoning code too - some test scenarios where the auto-provisioning is called (I think we can assume Jenkins will call it and don't need to test that Jenkins does so) with different situatuons and check that it decides (correctly) what to provision (stubbing/mocking out the bits that'd talk to a vSphere instance).
e.g. "if we've got no instances, no instance limit and instanceMin is 2 then we expect two instances to be provisioned" and "if we've got 2 instances, no instance limit, and instanceMin is 2 then we expect none to be provisioned" etc.
IME unit-tests are a good way of flushing out "out by 1" mistakes (that are hard to spot "in the real world") and they also help protect functionality from getting broken by later changes (changes done by people who don't use this feature and hence won't notice if they break it unless there's a unit test that tells them)
int provisionedNodes = record.getCurrentlyProvisioned().size() + record.getCurrentlyPlanned().size(); | ||
int requiredPreProvisionedNodes = record.getTemplate().getInstancesMin(); | ||
return requiredPreProvisionedNodes - provisionedNodes; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to need a unit test for this new method.
} | ||
maxSlavesToProvisionBeforeCloudCapHit = Integer.valueOf(intValue - 1); | ||
} | ||
if (!cloudHasCapacity()) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood - done like this (re-calculating every time) is not as CPU-efficient as it was, but I guess the additional workload is insigificant compared to everything else.
public void execute(TaskListener listener) { | ||
for (Cloud cloud : Jenkins.getActiveInstance().clouds) { | ||
if (!(cloud instanceof vSphereCloud)) continue; | ||
for (vSphereCloudSlaveTemplate template : ((vSphereCloud) cloud).getTemplates()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than casting ((vSphereCloud)cloud)
every time, cast once into a variable and use that.
e.g.
if (!(cloud instanceof vSphereCloud)) continue;
vSphereCloud vsCloud = (vSphereCloud) cloud;
for (vSphereCloudSlaveTemplate template : vsCloud .getTemplates()) {
Casts are ugly - they're best done as little as possible 😉
assertThat(stubVSphereTemplateState.countNodes(), equalTo(1)); | ||
|
||
// Here it says that there still should be provisioned 2 nodes, despite the fact there is 1 active already | ||
stubVSphereCloud.preProvisionNodes(template); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs:
Jun 01, 2021 5:54:23 PM org.jenkinsci.plugins.vSphereCloud InternalLog
INFO: STARTING VSPHERE CLOUD
Jun 01, 2021 5:54:23 PM org.jenkinsci.plugins.vsphere.tools.CloudProvisioningState logStateChange
WARNING: Marking templateCapacity10instanceMin21 as active : wasPreviouslyPlanned!=true
Jun 01, 2021 5:54:24 PM org.jenkinsci.plugins.vSphereCloud preProvisionNodes
INFO: preProvisionNodesForTemplate(null): should pre-provision 2 nodes
Jun 01, 2021 5:54:24 PM org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted
INFO: There are 0 VMs in this cloud. The instance cap for the cloud is 100, so we have room for more
Jun 01, 2021 5:54:24 PM org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted
INFO: There are 1 VMs in this cloud. The instance cap for the cloud is 100, so we have room for more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, that when preProvision
is invoked it actually sees that "There are 1 VMs in this cloud", which was "provisioned" by the method just a moment ago.
So, I would love to achieve the same state without calling preProvision()
; I've tried marking the associated record as provisionedSlaveNowActive
or provisioningStarted
, but that doesn't do the trick, neither calling VSpherePlannedNode.createInstance(stubVSphereTemplateState, nodeName, provisionable);
Hey @pjdarton ! Thanks for the comments :) I couldn't find a way to convince |
Hey @pjdarton! If I remove "NodePreProvisionTest.java" do you think this could be landed? |
In order to have instances at the ready at all times and
thus speed up CI runs because instance provisioning is done
upfront, add a mechanism and the related parameter to configure
a minimum number of instances that must be ready at all times.
Also add a mechanism to reconcile total number of nodes
based on template/cloud capacity
There is similar functionality presented in openstack cloud plugin:
https://github.com/jenkinsci/openstack-cloud-plugin/blob/master/plugin/src/main/java/jenkins/plugins/openstack/compute/JCloudsPreCreationThread.java