[WIP] refactor: split termination controller #1837

Open
wants to merge 5 commits into
base: main

jmdeal
Member

@jmdeal jmdeal commented Nov 24, 2024

Fixes #N/A

Description
This PR splits the node termination controller into three controllers. Each controller has an associated finalizer, and the final controller (Instance Termination) cannot reconcile until the drain and volume finalizers have been removed.

| Controller | Finalizer |
| --- | --- |
| Drain | karpenter.sh/drain-protection |
| Volume Detachment | karpenter.sh/volume-protection |
| Instance Termination | karpenter.sh/termination |
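
The finalizer ordering described above can be sketched as a simple gate. This is a minimal illustration, not the code in this PR: the finalizer strings come from the description, while the constant names and the `hasFinalizer` and `canTerminateInstance` helpers are hypothetical, and finalizers are modeled as a plain string slice rather than Kubernetes object metadata.

```go
package main

import "fmt"

// Finalizer values are taken from the PR description; the constant names are
// assumptions made for this sketch.
const (
	DrainFinalizer       = "karpenter.sh/drain-protection"
	VolumeFinalizer      = "karpenter.sh/volume-protection"
	TerminationFinalizer = "karpenter.sh/termination"
)

// hasFinalizer reports whether name is present in finalizers.
func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// canTerminateInstance is a hypothetical gate for the Instance Termination
// controller: it proceeds only while its own finalizer is still present and
// both the drain and volume finalizers have already been removed.
func canTerminateInstance(finalizers []string) bool {
	return hasFinalizer(finalizers, TerminationFinalizer) &&
		!hasFinalizer(finalizers, DrainFinalizer) &&
		!hasFinalizer(finalizers, VolumeFinalizer)
}

func main() {
	// Deletion has just started: all three finalizers are present.
	fmt.Println(canTerminateInstance([]string{DrainFinalizer, VolumeFinalizer, TerminationFinalizer}))
	// Drain has finished, volumes are still attached.
	fmt.Println(canTerminateInstance([]string{VolumeFinalizer, TerminationFinalizer}))
	// Drain and volume detachment have both finished.
	fmt.Println(canTerminateInstance([]string{TerminationFinalizer}))
}
```

In this sketch only the last call returns true, which mirrors the rule that Instance Termination waits on the other two controllers.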

This change was motivated by the increased complexity of the termination controller once additional status conditions were added for drain and volume detachment monitoring. Alternatively, the subreconciler pattern (a la the NodeClaim lifecycle controller) could have been used. However, splitting into separate controllers has some additional observability benefits thanks to the per-controller metrics, and is subjectively easier to test and maintain.
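
For comparison, one common shape of the subreconciler alternative mentioned above looks roughly like the following. This is a dependency-free sketch under stated assumptions: `Result`, `Node`, `subReconciler`, and the three step types are hypothetical stand-ins, and the short-circuit ordering is an illustrative choice rather than a description of how the NodeClaim lifecycle controller is implemented.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a stand-in for a reconcile result carrying a requeue delay.
type Result struct{ RequeueAfter time.Duration }

// Node is a hypothetical, simplified view of the state each step inspects.
type Node struct {
	Drained         bool
	VolumesDetached bool
	Terminated      bool
}

// subReconciler is one step owned by a single parent controller.
type subReconciler interface {
	Reconcile(n *Node) (Result, error)
}

type drainStep struct{}

// Reconcile requests a requeue until the node has been drained.
func (drainStep) Reconcile(n *Node) (Result, error) {
	if !n.Drained {
		return Result{RequeueAfter: time.Second}, nil
	}
	return Result{}, nil
}

type volumeStep struct{}

// Reconcile requests a requeue until all volumes have been detached.
func (volumeStep) Reconcile(n *Node) (Result, error) {
	if !n.VolumesDetached {
		return Result{RequeueAfter: time.Second}, nil
	}
	return Result{}, nil
}

type terminateStep struct{}

// Reconcile marks the instance as terminated; it only runs after earlier steps pass.
func (terminateStep) Reconcile(n *Node) (Result, error) {
	n.Terminated = true
	return Result{}, nil
}

// Controller runs its steps in order and stops at the first step that
// returns an error or asks to be requeued.
type Controller struct{ steps []subReconciler }

func (c *Controller) Reconcile(n *Node) (Result, error) {
	for _, s := range c.steps {
		res, err := s.Reconcile(n)
		if err != nil || res.RequeueAfter > 0 {
			return res, err
		}
	}
	return Result{}, nil
}

func main() {
	c := &Controller{steps: []subReconciler{drainStep{}, volumeStep{}, terminateStep{}}}
	n := &Node{Drained: true} // volumes are still attached
	res, err := c.Reconcile(n)
	fmt.Println(res.RequeueAfter, err, n.Terminated)
}
```

With this shape, all steps share one controller and therefore one set of controller-level metrics, which is the observability trade-off the description refers to.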

How was this change tested?
make test

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmdeal
Once this PR has been reviewed and has the lgtm label, please assign ellistarn for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 24, 2024
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 24, 2024
@jmdeal jmdeal force-pushed the feat/volume-attachment-observability branch from f7e2b80 to 587d150 Compare November 24, 2024 16:04
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2024
@jmdeal jmdeal force-pushed the feat/volume-attachment-observability branch from 587d150 to c55f95c Compare November 24, 2024 22:35
@coveralls

coveralls commented Nov 24, 2024

Pull Request Test Coverage Report for Build 12252575600

Details

  • 459 of 665 (69.02%) changed or added relevant lines in 19 files are covered.
  • 17 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.6%) to 80.12%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/controllers/controllers.go | 0 | 4 | 0.0% |
| pkg/utils/node/node.go | 9 | 15 | 60.0% |
| pkg/controllers/nodeclaim/hydration/controller.go | 17 | 27 | 62.96% |
| pkg/utils/nodeclaim/nodeclaim.go | 15 | 25 | 60.0% |
| pkg/utils/nodeclaim/types.go | 18 | 28 | 64.29% |
| pkg/controllers/node/hydration/controller.go | 24 | 35 | 68.57% |
| pkg/utils/node/types.go | 27 | 42 | 64.29% |
| pkg/controllers/node/termination/reconcile/reconcile.go | 55 | 71 | 77.46% |
| pkg/controllers/node/termination/instancetermination/controller.go | 47 | 76 | 61.84% |
| pkg/controllers/node/termination/volumedetachment/controller.go | 62 | 104 | 59.62% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/test/cachesyncingclient.go | 2 | 82.29% |
| pkg/controllers/disruption/consolidation.go | 4 | 88.55% |
| pkg/utils/termination/termination.go | 4 | 87.18% |
| pkg/controllers/provisioning/scheduling/preferences.go | 7 | 86.52% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 12250796523: | -0.6% |
| Covered Lines: | 9052 |
| Relevant Lines: | 11298 |

💛 - Coveralls

@jmdeal jmdeal force-pushed the feat/volume-attachment-observability branch from c705878 to 52a0907 Compare November 25, 2024 01:51
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 4, 2024
@jmdeal jmdeal force-pushed the feat/volume-attachment-observability branch from 5392f91 to 6c8bccd Compare December 4, 2024 21:48
@jmdeal jmdeal force-pushed the feat/volume-attachment-observability branch from 6c8bccd to 0fdc2a2 Compare December 10, 2024 08:53
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 10, 2024
@jmdeal jmdeal changed the title feat: drain and volume attachment observability refactor: split termination controller Dec 10, 2024
@jmdeal
Member Author

jmdeal commented Dec 11, 2024

/hold

I'm separating out the observability changes in this PR so I can get the essential feature change in and prioritize other work. I'll come back to this refactor; the current sticking point is the rollback story when adding additional finalizers.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2024
@jmdeal jmdeal changed the title refactor: split termination controller [WIP] refactor: split termination controller Dec 11, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2024