[WIP/POC] Degraded NodePool Status Condition #1880
base: main
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: rschalo. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
func (c *Controller) Reconcile(ctx context.Context, nodePool *v1.NodePool) (reconcile.Result, error) {
ctx = injection.WithControllerName(ctx, "nodepool.degraded")
stored := nodePool.DeepCopy()
if nodePool.Status.FailedLaunches >= 3 {
Consider combining these two statements: do the check for 0 and the check for greater than or equal to 3 together, then set the status condition and patch in the same call.
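A minimal sketch of what that combined check might look like, assuming the controller fields shown in the hunk above and the StatusConditions() helpers already used elsewhere in this PR (imports omitted; the reason/message strings are placeholders, not final wording):

```go
func (c *Controller) Reconcile(ctx context.Context, nodePool *v1.NodePool) (reconcile.Result, error) {
	ctx = injection.WithControllerName(ctx, "nodepool.degraded")
	stored := nodePool.DeepCopy()

	// Evaluate both thresholds in one place and set the condition accordingly.
	if nodePool.Status.FailedLaunches >= 3 {
		nodePool.StatusConditions().SetTrue(v1.ConditionTypeDegraded)
	} else {
		nodePool.StatusConditions().SetFalse(v1.ConditionTypeDegraded, "LaunchesSucceeding", "fewer than 3 consecutive failed launches")
	}

	// Patch once at the end, and only if something actually changed.
	if !equality.Semantic.DeepEqual(stored, nodePool) {
		if err := c.kubeClient.Status().Patch(ctx, nodePool, client.MergeFrom(stored)); err != nil {
			return reconcile.Result{}, client.IgnoreNotFound(err)
		}
	}
	return reconcile.Result{}, nil
}
```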
// If the Registered statusCondition hasn't gone True during the TTL since we first updated it, we should terminate the NodeClaim
// NOTE: ttl has to be stored and checked in the same place since l.clock can advance after the check causing a race
if ttl := registrationTTL - l.clock.Since(registered.LastTransitionTime.Time); ttl > 0 {
// If the nodepool is degraded, requeue for the remaining TTL.
if ttl := registrationTTL - l.clock.Since(registered.LastTransitionTime.Time); ttl > 0 || nodePool.StatusConditions().Get(v1.ConditionTypeDegraded).IsTrue() {
I'm not following what this check is intended to do. Is this supposed to make it so that we dynamically scale the requeue time?
return reconcile.Result{RequeueAfter: ttl}, nil
}
// Delete the NodeClaim if we believe the NodeClaim won't register since we haven't seen the node
// Here we delete the NodeClaim if the node failed to register; we want to retry against the NodeClaim's NodeClass/NodePool 3x.
// Store against a NodePool since the NodeClass is not available? NodeClass ref is on the NodePool, and a NodePool is 1:1 with a NodeClass anyway.
log.FromContext(ctx).V(1).WithValues("failures", nodePool.Status.FailedLaunches).Info("failed launches so far")
This might be here for debugging, but I don't think that we should merge this since it's going to be incredibly noisy
Definitely just in for debugging; agreed it should be removed.
log.FromContext(ctx).V(1).WithValues("failures", nodePool.Status.FailedLaunches).Info("failed launches so far")
nodePool.Status.FailedLaunches += 1
log.FromContext(ctx).V(1).WithValues("failures", nodePool.Status.FailedLaunches).Info("failed launches so far")
if err := l.kubeClient.Status().Update(ctx, nodePool); err != nil {
Why do we choose to do an Update in some places and do a patch with optimistic locking in others?
I'll make these consistent; I was just working around an issue where Patch didn't always work, and I wasn't yet sure why.
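For reference, a sketch of the patch-with-optimistic-locking shape used by other Karpenter status controllers, built on controller-runtime's MergeFromWithOptimisticLock; this only illustrates the consistent pattern being suggested (imports and surrounding function omitted), not the final code:

```go
// Only issue a status patch if the object actually changed.
if !equality.Semantic.DeepEqual(stored, nodePool) {
	// MergeFromWithOptimisticLock adds a resourceVersion precondition, so a
	// concurrent writer produces a conflict instead of a silent overwrite.
	if err := l.kubeClient.Status().Patch(ctx, nodePool, client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{})); err != nil {
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}
}
```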
pkg/apis/v1/nodepool_status.go
@@ -27,13 +27,18 @@ const (
ConditionTypeValidationSucceeded = "ValidationSucceeded"
// ConditionTypeNodeClassReady = "NodeClassReady" condition indicates that underlying nodeClass was resolved and is reporting as Ready
ConditionTypeNodeClassReady = "NodeClassReady"
// TODO
ConditionTypeDegraded = "Degraded"
Do you have thoughts around how you are going to track the last failed launch and then extend the amount of time before we retry out?
For this, I was thinking we'd use the lastTransitionTime for when Degraded == true.
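A rough sketch of how that could look, reusing the StatusConditions().Get(...) and clock accessors already present in this diff; degradedRetryBackoff is a hypothetical constant for illustration, not an existing setting:

```go
// If the NodePool is Degraded, wait out a backoff window measured from the
// time the condition last transitioned before retrying a launch.
if degraded := nodePool.StatusConditions().Get(v1.ConditionTypeDegraded); degraded.IsTrue() {
	elapsed := l.clock.Since(degraded.LastTransitionTime.Time)
	if remaining := degradedRetryBackoff - elapsed; remaining > 0 {
		return reconcile.Result{RequeueAfter: remaining}, nil
	}
}
```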
@@ -498,6 +498,9 @@ spec:
- type
Should we propose this change as an RFC? There seems to be a bunch of detail around what this status condition means, how we are going to track failures, what it means with respect to scheduling, etc., and I think it would be good to let people see this and get cloud provider input as well.
I'm writing the RFC today and I should be able to post tomorrow.
cloudProvider cloudprovider.CloudProvider
}

func NewController(kubeClient client.Client, cloudProvider cloudprovider.CloudProvider) *Controller {
Have you thought about how this status condition affects the schedulability of the NodePool? Does it deprioritize the NodePool?
I think we don't want to skip over Degraded NodePools, but they should be deprioritized. The simplest way to do this is probably to treat the weight as 0 in scheduling if a NodePool is Degraded.
I think the downside of this is that if a general or fast-and-cheap NodePool is degraded, and a fallback NodePool exists which satisfies the pending pod but is more constrained and expensive, then cluster costs could unexpectedly increase.
Fixes #N/A
Description
How was this change tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.