-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support jitter in client reconnect logic #163
base: master
Are you sure you want to change the base?
Support jitter in client reconnect logic #163
Conversation
Update to latest upstream swarm-client
@basil can you help me get the right people to review this PR? Sorry to @ mention you directly, but there are no OWNERS or CONTRIB for this repo and I'm not sure how to get help. If there is a better way, please let me know and I'll be happy to comply. |
Hey Kirk, thanks for your contribution! I'd love to have this be part of the next release. It's unfortunately a bit of a busy time for me, but I will try to review this sometime next week. |
Bump on this, would be really nice to avoid the thundering herd that not staggering these causes. |
Hi @KirkMartinez, sorry for the delay in reviewing this. I just took a look now, and while overall the change looks fine, I wanted to suggest a possible alternative. The current change adds a boolean flag to enable jitter. When set, we use a reconnect interval between zero and May I suggest making the jitter an integer rather than a boolean, and adding it to the value of What do you think? |
I'm a heavy user of the swarm plugin and would love to see this feature merged. thanks :) |
- addressing feedback in jenkinsci#163
We have about 450 clients which all disconnected simultaneously when our Jenkins master went down.
When we brought it back up, most workers failed to connect due to the thundering herd. We could see from logs that most workers were trying to reconnect at the same time. This feature should alleviate that problem. It adds a new cmdline switch for the swarm client -useJitter (default: off) which uses a retryInterval randomly selected between zero and the retryInterval (as opposed to just using the retryInterval directly). The change still respects the absolute maximum (set by maxRetryInterval) and the selected backoff strategy. Happy to modify this if you have any concerns.