This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

[local-volume] New controller to handle node deletion #817

Closed
msau42 opened this issue Jun 18, 2018 · 5 comments

Comments

@msau42
Contributor

msau42 commented Jun 18, 2018

Extension to kubernetes/community#1484

In cloud environments, nodes can be deleted and recreated fairly often. When nodes are deleted, the local disks are deleted along with them; however, local PVs remain, and pods get stuck in scheduling because they are bound to a node that no longer exists.

For workloads that tolerate data loss and can recover with a brand new disk, the user can delete and recreate the PVC, which will cause the new PVC to be bound to a new disk. If using StatefulSets, the StatefulSet controller will automatically recreate a PVC if it doesn't exist.
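As a concrete illustration of that manual recovery, here is a minimal sketch using a recent client-go; the namespace, pod name `mysts-0`, and PVC name `data-mysts-0` are hypothetical placeholders for a StatefulSet workload:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	ns, pvcName, podName := "default", "data-mysts-0", "mysts-0" // hypothetical workload

	// Delete the PVC that is bound to the disk that vanished with the node...
	if err := cs.CoreV1().PersistentVolumeClaims(ns).Delete(ctx, pvcName, metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	// ...then delete the stuck pod. The StatefulSet controller recreates the pod
	// and its missing PVC, and the new PVC binds to a fresh local PV on a live node.
	if err := cs.CoreV1().Pods(ns).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
}
```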

The process of detecting node deletion and deleting the PVC could be automated by a controller. There are a few things to consider:

  • Workload needs to opt in to this controller. Not all workloads want this behavior.
  • Node deletion detection can be tricky in some environments. I know in GCE, the managed instance group recreates nodes with the same name. And in K8s 1.11, I think the Node object is no longer recreated by kubelet if the instance ID changes.
  • There are two scenarios: 1) The PVC is already bound to a local PV. When the local PV is released, there may be additional cleanup needed too, since the daemonset provisioner no longer runs on that node. 2) The local PV is unbound and available (but not actually usable, since the node is gone). A sketch of handling both scenarios follows this list.
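
A rough sketch of what such a controller's cleanup could look like for both scenarios, assuming a recent client-go; the helper names and the `local.storage/auto-recreate` opt-in annotation are hypothetical, not an existing API:

```go
package localpv

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupLocalPVsForNode handles both scenarios for a node that was deleted:
// bound PVs (delete the opted-in PVC so the workload can re-bind elsewhere)
// and available/released PVs (delete them, since the backing disk is gone and
// the daemonset provisioner no longer runs on that node).
func cleanupLocalPVsForNode(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	pvs, err := cs.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for i := range pvs.Items {
		pv := &pvs.Items[i]
		if pv.Spec.Local == nil || !pvOnNode(pv, nodeName) {
			continue // not a local PV, or not affined to the deleted node
		}
		switch pv.Status.Phase {
		case v1.VolumeBound:
			// Scenario 1: only act if the owning workload opted in.
			ref := pv.Spec.ClaimRef
			if ref == nil {
				continue
			}
			pvc, err := cs.CoreV1().PersistentVolumeClaims(ref.Namespace).Get(ctx, ref.Name, metav1.GetOptions{})
			if err != nil {
				return err
			}
			if pvc.Annotations["local.storage/auto-recreate"] != "true" { // hypothetical opt-in annotation
				continue
			}
			if err := cs.CoreV1().PersistentVolumeClaims(ref.Namespace).Delete(ctx, ref.Name, metav1.DeleteOptions{}); err != nil {
				return err
			}
		case v1.VolumeAvailable, v1.VolumeReleased:
			// Scenario 2: the PV is unbound (or released) but its disk no longer
			// exists and nothing on the node can clean it up; delete the PV.
			if err := cs.CoreV1().PersistentVolumes().Delete(ctx, pv.Name, metav1.DeleteOptions{}); err != nil {
				return err
			}
		}
	}
	return nil
}

// pvOnNode reports whether the PV's required node affinity pins it to nodeName.
func pvOnNode(pv *v1.PersistentVolume, nodeName string) bool {
	if pv.Spec.NodeAffinity == nil || pv.Spec.NodeAffinity.Required == nil {
		return false
	}
	for _, term := range pv.Spec.NodeAffinity.Required.NodeSelectorTerms {
		for _, req := range term.MatchExpressions {
			if req.Key == "kubernetes.io/hostname" && req.Operator == v1.NodeSelectorOpIn {
				for _, val := range req.Values {
					if val == nodeName {
						return true
					}
				}
			}
		}
	}
	return false
}
```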

As for implementation ideas, I think metacontroller would be a cool framework to try out for this.

@msau42
Contributor Author

msau42 commented Jun 18, 2018

/area local-volume

@msau42
Contributor Author

msau42 commented Jun 18, 2018

Thinking about it a little more, the scenario where the node is recreated with the same name could be handled by kubernetes/community#1484, which can detect that the path/disk no longer exists on the node.
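
That detection could be as simple as the per-node daemonset stat-ing each local PV's path; a minimal sketch, assuming the daemonset pod has the relevant host paths mounted:

```go
package localmonitor

import (
	"os"

	v1 "k8s.io/api/core/v1"
)

// localPathMissing reports whether a local PV's backing path no longer exists
// on this node, e.g. because the node was recreated and its disks were wiped.
// Intended to run inside the per-node daemonset, which must have the relevant
// host paths mounted into its pod.
func localPathMissing(pv *v1.PersistentVolume) (bool, error) {
	if pv.Spec.Local == nil {
		return false, nil // not a local PV
	}
	_, err := os.Stat(pv.Spec.Local.Path)
	if os.IsNotExist(err) {
		return true, nil
	}
	return false, err
}
```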

So maybe there are actually 3 controllers involved here:

  1. Daemonset that monitors disks on each node: PV monitoring proposal kubernetes/community#1484
  2. Single controller that monitors deletion of Node objects (a skeleton is sketched after this list)
  3. Single controller that manages workloads using local PVs
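
A skeleton for controller 2, watching Node deletions with a shared informer and handing off to a cleanup routine like the hypothetical `cleanupLocalPVsForNode` sketched above; this is a minimal sketch, not an actual implementation:

```go
package localpv

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// runNodeDeletionWatcher triggers local-PV cleanup whenever a Node object is deleted.
func runNodeDeletionWatcher(cs kubernetes.Interface, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// The deleted object may arrive wrapped in a DeletedFinalStateUnknown tombstone.
			node, ok := obj.(*v1.Node)
			if !ok {
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				node, ok = tombstone.Obj.(*v1.Node)
				if !ok {
					return
				}
			}
			// A real controller would enqueue this on a workqueue with retries.
			_ = cleanupLocalPVsForNode(context.TODO(), cs, node.Name)
		},
	})

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}
```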

@NickrenREN
Contributor

1 and 2 are being implemented here: https://github.com/caicloud/kube-storage-monitor

@msau42
Contributor Author

msau42 commented Dec 19, 2018

Migrating to new repo: kubernetes-sigs/sig-storage-local-static-provisioner#10
/close

@k8s-ci-robot
Contributor

@msau42: Closing this issue.

In response to this:

Migrating to new repo: kubernetes-sigs/sig-storage-local-static-provisioner#10
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
