-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request for feedback: Add new fixed-lifespan cloud retention strategy. #96
base: master
Are you sure you want to change the base?
Conversation
The fixed-lifespan strategy provisions an agent for a fixed amount of time. After the amount of time has expired, the agent is disconnected, any builds finished, then the agent is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall an interesting idea.
I've made a few coding-style comments but my main issue is that I think it would be better still if a user didn't have to choose between this retention strategy and the until-idle one but could, instead, have the best of both worlds and have "until idle for over X minutes, or until Y minutes has elapsed" instead.
e.g. where I work, I'd want to retain VMs until they've been idle for an hour or until they were over two days old.
i.e. it might be better to add this "maximum lifespan" functionality to the existing until-idle retention strategy.
try { | ||
c.disconnect(cause).get(); | ||
} catch (InterruptedException | ExecutionException e) { | ||
LOGGER.log(WARNING, "Failed to disconnect " + cname, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails to disconnect, it looks like setAtEndOfLife()
will still be set so that no further attempts will be made to disconnect the slave (until Jenkins gets restarted).
That doesn't seem right to me.
} | ||
} | ||
} | ||
return 1; // re-check in 1 minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once a machine has reached end-of-life then re-checking every minute is good.
However, before it's reached end-of-life, we don't need to check every minute - we should return "number of minutes until we reach end of line".
return !isAtEndOfLife(); | ||
} | ||
|
||
private transient boolean atEndOfLife; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of having class fields scattered throughout the code - I much prefer to see all the fields at the top.
|
||
private transient boolean atEndOfLife; | ||
|
||
private synchronized boolean isAtEndOfLife() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that synchronized
here is going to achieve thread-safety.
e.g. the check
method calls isAtEndOfLife()
twice, so if you only lock in here, check
might get two different values when it calls the method if another thread has called setAtEndOfLife()
in between.
You probably want to make check()
be synchronized
(or add a synchronized
section within it) so that you can ensure that you get a consistent and controlled logic flow within that method.
...also, if you put it there, you can reduce the number of synchronized blocks to just one - thread synchronization is expensive so we don't want to do that any more than we absolutely have to.
...and I'd suggest that you check with other retention strategies (in this and other plugins) and verify that it's needed at all because, if we don't need to make this code thread-safe, we shouldn't add code making it look like it is.
<f:entry title="${%Lifespan Timeout}" field="lifespanMinutes"> | ||
<f:number default="60"/> | ||
</f:entry> | ||
</j:jelly> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a linefeed to the end of the file.
@@ -1 +1,2 @@ | |||
runOnceCloudRetentionStrategy.OfflineReason.BuildHasRun=VSphere Cloud Slave configured was to run one build only | |||
fixedLifespanCloudRetentionStrategy.OfflineReason.LifespanReached=VSphere Cloud Agent has reached the end of its configured lifespan of {0} minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a linefeed to the end of the file.
This PR has been inactive for some time now... I'm not keen on having PRs open for ages - I'd prefer that we either work on them until they're complete and get them merged in, or we close them if that's not going to happen. |
We have found a need for an alternate cloud retention strategy for our builds.
The keep-until-idle strategy is very good, but the issue we hit is that our builds
are continuously running so agent never go idle. Eventually, the agents will hit
some issue we must address, such as running out of disk space. While we could
work to fix the agents, it's much easier to just delete them and let them be recreated.
So, we have introduced a fixed-life strategy to the vSphere plugin.
The fixed-lifespan strategy provisions an agent for a fixed amount of time.
After the amount of time has expired, the agent is disconnected, any builds
finished, then the agent is removed. This prevents incurring the costs of the
build-once strategy, which involves bootup/provisioning/no cache costs, but
ensures the agents are recycled on a regular basis which helps prevent issues.
While, currently, we do this as a separate retention strategy, I could also see this
simply being an option that could be added to the existing keep-until-idle strategy.
I was hoping to get your thoughts, and if interested, submit this for inclusion.
Thank you.