Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for feedback: Add new fixed-lifespan cloud retention strategy. #96

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rhencke
Copy link

@rhencke rhencke commented Sep 20, 2018

We have found a need for an alternate cloud retention strategy for our builds.

The keep-until-idle strategy is very good, but the issue we hit is that our builds
are continuously running so agent never go idle. Eventually, the agents will hit
some issue we must address, such as running out of disk space. While we could
work to fix the agents, it's much easier to just delete them and let them be recreated.

So, we have introduced a fixed-life strategy to the vSphere plugin.
The fixed-lifespan strategy provisions an agent for a fixed amount of time.
After the amount of time has expired, the agent is disconnected, any builds
finished, then the agent is removed. This prevents incurring the costs of the
build-once strategy, which involves bootup/provisioning/no cache costs, but
ensures the agents are recycled on a regular basis which helps prevent issues.

While, currently, we do this as a separate retention strategy, I could also see this
simply being an option that could be added to the existing keep-until-idle strategy.

I was hoping to get your thoughts, and if interested, submit this for inclusion.

Thank you.

The fixed-lifespan strategy provisions an agent for a fixed amount of time.
After the amount of time has expired, the agent is disconnected, any builds
finished, then the agent is removed.
@pjdarton pjdarton added the enhancement Adds extra functionality label Oct 4, 2018
Copy link
Member

@pjdarton pjdarton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall an interesting idea.
I've made a few coding-style comments but my main issue is that I think it would be better still if a user didn't have to choose between this retention strategy and the until-idle one but could, instead, have the best of both worlds and have "until idle for over X minutes, or until Y minutes has elapsed" instead.
e.g. where I work, I'd want to retain VMs until they've been idle for an hour or until they were over two days old.

i.e. it might be better to add this "maximum lifespan" functionality to the existing until-idle retention strategy.

try {
c.disconnect(cause).get();
} catch (InterruptedException | ExecutionException e) {
LOGGER.log(WARNING, "Failed to disconnect " + cname, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails to disconnect, it looks like setAtEndOfLife() will still be set so that no further attempts will be made to disconnect the slave (until Jenkins gets restarted).
That doesn't seem right to me.

}
}
}
return 1; // re-check in 1 minute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once a machine has reached end-of-life then re-checking every minute is good.
However, before it's reached end-of-life, we don't need to check every minute - we should return "number of minutes until we reach end of line".

return !isAtEndOfLife();
}

private transient boolean atEndOfLife;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of having class fields scattered throughout the code - I much prefer to see all the fields at the top.


private transient boolean atEndOfLife;

private synchronized boolean isAtEndOfLife() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that synchronized here is going to achieve thread-safety.
e.g. the check method calls isAtEndOfLife() twice, so if you only lock in here, check might get two different values when it calls the method if another thread has called setAtEndOfLife() in between.
You probably want to make check() be synchronized (or add a synchronized section within it) so that you can ensure that you get a consistent and controlled logic flow within that method.
...also, if you put it there, you can reduce the number of synchronized blocks to just one - thread synchronization is expensive so we don't want to do that any more than we absolutely have to.
...and I'd suggest that you check with other retention strategies (in this and other plugins) and verify that it's needed at all because, if we don't need to make this code thread-safe, we shouldn't add code making it look like it is.

<f:entry title="${%Lifespan Timeout}" field="lifespanMinutes">
<f:number default="60"/>
</f:entry>
</j:jelly>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a linefeed to the end of the file.

@@ -1 +1,2 @@
runOnceCloudRetentionStrategy.OfflineReason.BuildHasRun=VSphere Cloud Slave configured was to run one build only
fixedLifespanCloudRetentionStrategy.OfflineReason.LifespanReached=VSphere Cloud Agent has reached the end of its configured lifespan of {0} minutes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a linefeed to the end of the file.

@pjdarton
Copy link
Member

This PR has been inactive for some time now...
@rhencke are you still interested in completing this and getting it merged in?

I'm not keen on having PRs open for ages - I'd prefer that we either work on them until they're complete and get them merged in, or we close them if that's not going to happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adds extra functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants