change the kubelet service crash loop behavior #2178
Comments
the kubeadm change is doable for 1.19, but arguably the k/release change needs wider agreement.
cc @rosti. this means that the PR might be much better than the above proposal.
I thought kubeadm already had code related to doing this: https://github.com/kubernetes/kubernetes/blob/875f31e98878fd199a76fd0ba2465d14558788cd/cmd/kubeadm/app/phases/kubelet/kubelet.go#L38
it only starts and restarts the service, but does not manage the "enable" status.
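For illustration, a minimal Go sketch of what additionally managing the "enable" status could look like, shelling out to systemctl the way an init-system helper might. The function names are hypothetical and this is not kubeadm's actual initsystem API:

```go
// Hypothetical sketch: query and set the kubelet unit's "enable" state
// in addition to starting/restarting it. Helper names are illustrative,
// not kubeadm's actual initsystem API.
package main

import (
	"fmt"
	"os/exec"
)

// kubeletIsEnabled reports whether the unit is set to start on boot.
// "systemctl is-enabled" exits non-zero for disabled units.
func kubeletIsEnabled() bool {
	return exec.Command("systemctl", "is-enabled", "kubelet.service").Run() == nil
}

// enableKubelet marks the unit to start on boot.
func enableKubelet() error {
	if out, err := exec.Command("systemctl", "enable", "kubelet.service").CombinedOutput(); err != nil {
		return fmt.Errorf("enabling kubelet failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	if !kubeletIsEnabled() {
		if err := enableKubelet(); err != nil {
			fmt.Println(err)
		}
	}
}
```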
The necessary modifications here are:
As said previously, it doesn't hurt for us to implement this; it's not that big of a change on our side. The problem is what to do with the packages, where the kubelet service continues to be automatically enabled and crash loops (since it doesn't have a config file).
your proposed plan for kubeadm seems fine to me.
so i think ideally we should be making this change only for the latest MINOR. currently the kubeadm package has the kubelet package as a dependency (there were discussions to change this too), which supposedly installs the same versions of both packages for most users.

there could be users that are bypassing that and installing e.g. kubeadm 1.x and kubelet 1.x-1, due to factor X, and this is a supported skew. for such a skew the old kubelet service may be enabled by default (crashloop) but the new kubeadm could be managing "enable" already.

so overall, even if a kubeadm binary encounters an older kubelet service, for the crashloop problem in particular this should be fine, unless i'm missing something. however, for the kubelet flag removal and instance-specific CC problem, the kubeadm 1.x and kubelet 1.x-1 skew might bite people that are passing flags to a kubelet that no longer supports flags at all.
I think @rosti's plan is fine, and I'd like to help with this.
/assign
note, if 1.20 is a "stabilization release" we should not be making this change. |
/sig release cluster-lifecycle |
part one may not need the KEP; part two may need it.

yet, we can avoid doing part one if part two never happens, so documenting the proposed change in one place feels right.
@neolit123 yes, you are right. so, where should we file the KEP?
per the KEP process, it should be either here (A); my preference is for A, but let's hold until we decide if we are making this change in 1.20.
quick note that we cannot do that while the kubelet config is not respected in join. +1 for #2178 (comment). this will also have the benefit of allowing CI tooling to look for panics in the logs without creating loads of noise (something that would be pointless for us to add today, due to kubeadm).
Let me know if I can help. I would like to eliminate the crashlooping in CI, and I think this will avoid a lot of user confusion.
@BenTheElder Since we will not change the kubernetes/release repository to have the kubelet service disabled by default for now, I think we can stop the kubelet when we do not have the kubelet config, and start it once we get the kubelet config?
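A minimal sketch of that idea, assuming the config lives at /var/lib/kubelet/config.yaml; the helper name is hypothetical and this is not actual kubeadm code:

```go
// Hypothetical sketch: keep the kubelet stopped while its
// KubeletConfiguration file is absent, and only start it once the file
// exists. The path and helper name are illustrative.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

const kubeletConfigPath = "/var/lib/kubelet/config.yaml"

func syncKubeletState() error {
	if _, err := os.Stat(kubeletConfigPath); os.IsNotExist(err) {
		// No config yet: stop the service so it does not crash loop.
		return exec.Command("systemctl", "stop", "kubelet.service").Run()
	}
	// Config is present (e.g. written during "kubeadm init"/"join"):
	// start the service.
	return exec.Command("systemctl", "start", "kubelet.service").Run()
}

func main() {
	if err := syncKubeletState(); err != nil {
		fmt.Println("failed to sync kubelet state:", err)
	}
}
```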
@xlgao-zju looks like 1.20 is a regular release, so we can proceed with the KEP if you have the time.
let me know if you have questions. also, feedback from the release-eng team will be required.
so FWIW this is really easy to fix and works great so far. the issue is that the Kubernetes packaging and systemd spec sources are not ideal (there are various other issues referencing this), so it's not possible to roll this out only to future releases, and it's technically a breaking change.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen |
now that we have new packages this change is easier to do.
this can be done for 1.31:
over time we have seen a number of complaints related to the crash loop of the kubelet service in the DEBs/RPMs. when the kubelet is installed, the service is enabled but fails because it's missing its config.yaml (KubeletConfiguration), unless something like kubeadm creates one for it.
this has caused problems for:
after a discussion at the kubeadm office hours on 10.06.2020, we agreed that it might be a good idea to change this behavior and keep the service disabled by default. but this would require changes in both kubeadm and the kubelet systemd specs.
the idea we are thinking of is the following:
note that currently kubeadm just has a preflight check that fails if the service is not enabled and instructs the user how to enable it manually.
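To make the proposal concrete, here is a hedged sketch of what the flow could look like if the packages shipped the unit disabled by default: init/join would enable and start the unit instead of failing preflight, and reset would disable it again. The function names and the flow are illustrative of the proposal, not current kubeadm behavior:

```go
// Hypothetical sketch of the proposed flow, assuming the packages ship
// the kubelet unit disabled by default. The systemctl invocations are
// real commands; the flow itself is the proposal, not current kubeadm.
package main

import (
	"fmt"
	"os/exec"
)

func systemctl(args ...string) error {
	if out, err := exec.Command("systemctl", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("systemctl %v: %v: %s", args, err, out)
	}
	return nil
}

// onInit replaces the current preflight failure ("kubelet service is
// not enabled") with kubeadm enabling and starting the unit itself.
func onInit() error {
	if err := systemctl("enable", "kubelet.service"); err != nil {
		return err
	}
	return systemctl("start", "kubelet.service")
}

// onReset returns the node to the packaged default: unit disabled.
func onReset() error {
	if err := systemctl("stop", "kubelet.service"); err != nil {
		return err
	}
	return systemctl("disable", "kubelet.service")
}

func main() {
	// Demonstration only: a real kubeadm would call onInit during
	// "init"/"join" and onReset during "reset".
	if err := onInit(); err != nil {
		fmt.Println(err)
	}
}
```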
/kind feature
/priority important-longterm