Request for feedback: Add new fixed-lifespan cloud retention strategy. #96

rhencke · 2018-09-20T13:42:20Z

We have found a need for an alternate cloud retention strategy for our builds.

The keep-until-idle strategy works well, but the issue we hit is that our builds
are continuously running, so the agents never go idle. Eventually, the agents hit
some issue we must address, such as running out of disk space. While we could
work to fix the agents, it's much easier to just delete them and let them be recreated.

So, we have introduced a fixed-lifespan strategy to the vSphere plugin.
The fixed-lifespan strategy provisions an agent for a fixed amount of time.
After that time has expired, the agent is disconnected, any running builds are
allowed to finish, and then the agent is removed. This avoids the costs of the
build-once strategy (boot-up, provisioning, and no warm cache), while still
ensuring that agents are recycled on a regular basis, which helps prevent issues.
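The lifecycle described above can be pictured as a small state function. This is a minimal sketch, assuming the agent's creation time and "is it busy?" flag are available; `LifespanPhase` and `FixedLifespanPolicy` are hypothetical names, not the classes in this PR.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical phases of a fixed-lifespan agent (illustrative only).
enum LifespanPhase {
    ACTIVE,   // lifespan not yet reached: accept new builds
    DRAINING, // lifespan reached: accept no new builds, let running ones finish
    REMOVE    // lifespan reached and idle: safe to delete the VM
}

// Hypothetical policy object; not part of the vSphere plugin or Jenkins core.
final class FixedLifespanPolicy {
    private final Duration lifespan;

    FixedLifespanPolicy(Duration lifespan) {
        this.lifespan = lifespan;
    }

    /** Decides the phase from the agent's creation time and whether it is busy. */
    LifespanPhase phase(Instant createdAt, Instant now, boolean busy) {
        boolean expired = !now.isBefore(createdAt.plus(lifespan));
        if (!expired) {
            return LifespanPhase.ACTIVE;
        }
        // Once expired, the agent only waits for in-flight builds to finish.
        return busy ? LifespanPhase.DRAINING : LifespanPhase.REMOVE;
    }
}
```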

Currently we implement this as a separate retention strategy, but I could also see it
simply being an option added to the existing keep-until-idle strategy.

I was hoping to get your thoughts and, if you're interested, to submit this for inclusion.

Thank you.


pjdarton

Overall an interesting idea.
I've made a few coding-style comments, but my main issue is that I think it would be better still if a user didn't have to choose between this retention strategy and the until-idle one but could, instead, have the best of both worlds: "until idle for over X minutes, or until Y minutes have elapsed".
e.g. where I work, I'd want to retain VMs until they've been idle for an hour or until they were over two days old.

i.e. it might be better to add this "maximum lifespan" functionality to the existing until-idle retention strategy.
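A rough sketch of the combined "idle for over X minutes, or older than Y" rule, using the one-hour / two-day example. `IdleOrLifespanPolicy` and its parameters are hypothetical; the sketch assumes the strategy knows when the agent was created and when it last became idle (null while busy).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical combined policy: idle timeout OR maximum lifespan.
final class IdleOrLifespanPolicy {
    private final Duration idleTimeout;
    private final Duration maxLifespan; // null means "no maximum lifespan"

    IdleOrLifespanPolicy(Duration idleTimeout, Duration maxLifespan) {
        this.idleTimeout = idleTimeout;
        this.maxLifespan = maxLifespan;
    }

    /** True once the agent is at least maxLifespan old (never, if unset). */
    boolean isTooOld(Instant createdAt, Instant now) {
        return maxLifespan != null
                && Duration.between(createdAt, now).compareTo(maxLifespan) >= 0;
    }

    /**
     * idleSince is when the agent last became idle, or null if it is busy.
     * A busy agent is never terminated here; an over-age busy agent would
     * instead stop accepting new work until its current build finishes.
     */
    boolean shouldTerminate(Instant createdAt, Instant idleSince, Instant now) {
        if (idleSince == null) {
            return false;
        }
        boolean idleTooLong = Duration.between(idleSince, now).compareTo(idleTimeout) >= 0;
        return idleTooLong || isTooOld(createdAt, now);
    }
}
```

With `new IdleOrLifespanPolicy(Duration.ofHours(1), Duration.ofDays(2))`, a five-hour-old agent idle for ten minutes is kept, while a three-day-old agent is removed as soon as it is idle. Passing `null` for the lifespan reduces this to plain until-idle behaviour.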

pjdarton · 2018-11-05T11:19:51Z

src/main/java/org/jenkinsci/plugins/vsphere/FixedLifespanCloudRetentionStrategy.java

+            try {
+                c.disconnect(cause).get();
+            } catch (InterruptedException | ExecutionException e) {
+                LOGGER.log(WARNING, "Failed to disconnect " + cname, e);


If this fails to disconnect, it looks like setAtEndOfLife() will still be called, so no further attempts will be made to disconnect the slave (until Jenkins is restarted).
That doesn't seem right to me.
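One way to address this, sketched under assumptions: only record end-of-life after the disconnect `Future` has completed successfully, so a failed attempt is retried on the next check. `EndOfLifeTracker` is a hypothetical name, and the `Supplier` stands in for `c.disconnect(cause)`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: the end-of-life flag is set only after a successful disconnect.
final class EndOfLifeTracker {
    private static final Logger LOGGER = Logger.getLogger(EndOfLifeTracker.class.getName());
    private boolean atEndOfLife;

    boolean isAtEndOfLife() {
        return atEndOfLife;
    }

    /** Returns true if the agent is (now) disconnected for end-of-life. */
    boolean tryDisconnect(String name, Supplier<Future<?>> disconnect) {
        if (atEndOfLife) {
            return true; // already done; nothing to retry
        }
        try {
            disconnect.get().get(); // wait for the disconnect to complete
            atEndOfLife = true;     // set only after success
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            LOGGER.log(Level.WARNING, "Interrupted while disconnecting " + name, e);
        } catch (ExecutionException e) {
            LOGGER.log(Level.WARNING, "Failed to disconnect " + name + "; will retry", e);
        }
        return atEndOfLife;
    }
}
```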

pjdarton · 2018-11-05T11:21:52Z

src/main/java/org/jenkinsci/plugins/vsphere/FixedLifespanCloudRetentionStrategy.java

+                }
+            }
+        }
+        return 1; // re-check in 1 minute


Once a machine has reached end-of-life, re-checking every minute is good.
However, before it has reached end-of-life, we don't need to check every minute - we should return "number of minutes until we reach end of life".
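A sketch of the suggested return value, assuming check() returns the number of minutes until it should next be called (as the `return 1; // re-check in 1 minute` line implies) and that the agent's creation time is known. `RecheckInterval` is a hypothetical helper.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper computing check()'s "minutes until next check" value.
final class RecheckInterval {
    private RecheckInterval() {}

    static long minutesUntilNextCheck(Instant createdAt, Duration lifespan, Instant now) {
        Duration remaining = Duration.between(now, createdAt.plus(lifespan));
        if (remaining.isNegative() || remaining.isZero()) {
            return 1; // already at end-of-life: poll every minute while draining
        }
        // Round up to whole minutes (at millisecond precision) so the next
        // check does not fire before the lifespan has expired.
        long minutes = (remaining.toMillis() + 59_999) / 60_000;
        return Math.max(1, minutes);
    }
}
```

For a 60-minute lifespan this returns 60 right after creation, 1 during the final minute, and 1 on every call after expiry.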

pjdarton · 2018-11-05T11:23:13Z

src/main/java/org/jenkinsci/plugins/vsphere/FixedLifespanCloudRetentionStrategy.java

+        return !isAtEndOfLife();
+    }
+
+    private transient boolean atEndOfLife;


I'm not a big fan of having class fields scattered throughout the code - I much prefer to see all the fields at the top.

pjdarton · 2018-11-05T11:31:44Z

src/main/java/org/jenkinsci/plugins/vsphere/FixedLifespanCloudRetentionStrategy.java

+
+    private transient boolean atEndOfLife;
+
+    private synchronized boolean isAtEndOfLife() {


I don't think that synchronized here is going to achieve thread-safety.
e.g. check() calls isAtEndOfLife() twice, so if the lock is only taken in here, check() might get two different values from those two calls if another thread calls setAtEndOfLife() in between.
You probably want to make check() synchronized (or add a synchronized section within it) so that you get a consistent and controlled logic flow within that method.
...also, if you put it there, you can reduce the number of synchronized blocks to just one - thread synchronization is expensive, so we don't want to do it any more than we absolutely have to.
...and I'd suggest that you look at other retention strategies (in this and other plugins) and verify that it's needed at all because, if this code doesn't need to be thread-safe, we shouldn't add code that makes it look like it is.
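A minimal sketch of the single-lock approach, assuming check() can be called concurrently for the same strategy instance (which, as noted, should be verified first). `EndOfLifeCheck` and its boolean `expired` parameter are hypothetical stand-ins for the PR's age test and disconnect call; the point is that the flag is read and written under one lock, so only one caller ever performs the disconnect.

```java
// Hypothetical stand-in for the strategy; all state is guarded by "this".
final class EndOfLifeCheck {
    private boolean atEndOfLife; // guarded by "this"
    private int disconnects;     // guarded by "this"; counts simulated disconnects

    /** 'expired' replaces the real lifespan test; returns minutes until re-check. */
    synchronized long check(boolean expired) {
        // Test-and-set under a single lock: no other caller can observe or
        // change atEndOfLife between the read and the write below.
        if (expired && !atEndOfLife) {
            disconnects++;      // stand-in for c.disconnect(cause)
            atEndOfLife = true;
        }
        return 1; // re-check in 1 minute, as in the PR
    }

    synchronized boolean acceptingTasks() {
        return !atEndOfLife;
    }

    synchronized int disconnectCount() {
        return disconnects;
    }
}
```

One trade-off: if the real check() waits on the disconnect `Future` while holding the lock, other callers block for that duration, which is another reason to confirm that concurrent calls actually happen before adding the lock.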

pjdarton · 2018-11-05T11:32:15Z

...ain/resources/org/jenkinsci/plugins/vsphere/FixedLifespanCloudRetentionStrategy/config.jelly

+    <f:entry title="${%Lifespan Timeout}" field="lifespanMinutes">
+        <f:number default="60"/>
+    </f:entry>
+</j:jelly>


Please add a linefeed to the end of the file.

pjdarton · 2018-11-05T11:32:24Z

src/main/resources/org/jenkinsci/plugins/vsphere/Messages.properties

@@ -1 +1,2 @@
 runOnceCloudRetentionStrategy.OfflineReason.BuildHasRun=VSphere Cloud Slave configured was to run one build only
+fixedLifespanCloudRetentionStrategy.OfflineReason.LifespanReached=VSphere Cloud Agent has reached the end of its configured lifespan of {0} minutes


Please add a linefeed to the end of the file.

pjdarton · 2020-03-18T17:01:36Z

This PR has been inactive for some time now...
@rhencke are you still interested in completing this and getting it merged in?

I'm not keen on having PRs open for ages - I'd prefer that we either work on them until they're complete and get them merged in, or we close them if that's not going to happen.

Add new fixed-lifespan cloud retention strategy.

b38b052


rhencke force-pushed the fixedTime branch from 80ba244 to b38b052 on September 20, 2018 13:44

pjdarton added the enhancement label Oct 4, 2018

pjdarton requested changes Nov 5, 2018
