-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-48641: config/v1/types_cluster_version: Explain image and version both set #2158
base: master
Are you sure you want to change the base?
OCPBUGS-48641: config/v1/types_cluster_version: Explain image and version both set #2158
Conversation
Hello @wking! Some important instructions when contributing to openshift/api: |
@wking: This pull request references Jira Issue OCPBUGS-48641, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@@ -62,7 +62,7 @@ type ClusterVersionSpec struct { | |||
// | |||
// Some of the fields are inter-related with restrictions and meanings described here. | |||
// 1. image is specified, version is specified, architecture is specified. API validation error. | |||
// 2. image is specified, version is specified, architecture is not specified. You should not do this. version is silently ignored and image is used. | |||
// 2. image is specified, version is specified, architecture is not specified. The version metadata in the referenced image must match the specified version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the changes below about Version
, is it more complete this way?
// 2. image is specified, version is specified, architecture is not specified. The version metadata in the referenced image must match the specified version. | |
// 2. image is specified, version is specified, architecture is not specified. image is used if the version metadata in the referenced image matches the specified version. API validation error otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lifted from Slack:
"""
I think "API validation error" in the Godocs is trying to describe the CEL kubebuilder:validation:XValidation:rule here, because those are enforced by the Kube API server's validation, and clients cannot push invalid combinations. For version, there's no CEL enforcement, it's just up to the CVO to decide how to handle the "image and version both set" situation, and report its thoughts in status.conditions
"""
:TIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What validations are made? Is it possible that they could be moved to CEL? Or does it need to introspect the image to validate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to introspect the image, checking the version in the release image's release-metadata
file against the spec
version
string that the cluster admin said they expected. See OCPBUGS-48641's:
Verifying payload failed version="4.17.99" image="quay.io/openshift-release-dev/ocp-release@sha256:82aa2a914d4cd964deda28b99049abbd1415f96c0929667b0499dd968864a8dd" failure=release image version 4.17.13 does not match the expected upstream version 4.17.99
error message for an example, where the CVO looks inside the sha256:82aa2a9...
release image, sees that the release-metadata
file claims that release image is 4.17.13, and then complains that the release image's 4.17.13 diverges from the spec
version
's 4.17.99 expectation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, and is this validation via a webhook or part of the controller? The API validation error otherwise
message seems odd here, especially if this is a controller based validation
@@ -702,16 +702,16 @@ type Update struct { | |||
Architecture ClusterVersionArchitecture `json:"architecture"` | |||
|
|||
// version is a semantic version identifying the update version. | |||
// version is ignored if image is specified and required if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poking a bit more deeply into where the nominal "ignored" came from, on 2022-11-08 I claimed it was ignored. I'm not sure what 2022-me was thinking there; possibly I was just focused on how the CVO looks up which image to use (and that logic doesn't run when image
is explicitly set in spec
), and I overlooked the sync-worker validation as it judges the requested desiredUpdate
for ReleaseAccepted
?
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: hongkailiu, wking The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
965895d
to
387fac3
Compare
New changes are detected. LGTM label has been removed. |
Catching up with openshift/cluster-version-operator@9be6175c5f (pkg/cvo/sync_worker: Make expected/actual version mismatch fatal, 2020-08-09, openshift/cluster-version-operator#431), which uses the 'version' property as a sanity check for "is this pullspec the version I'm expecting?". This protects users from compromised or man-in-the-middled upstream update services who attempt downgrade and similar attacks by misrepresenting a recommended update. The text I'm adjusting landed in 354e2fb (config/v1/types_cluster_version: Add Architecture to DesiredUpdate, 2022-12-07, openshift#1339), but version-ignoring was never implemented, so nobody can be relying on that nominal behavior. And as the man-in-the-middle use case demonstrates, version-ignoring would be less safe than the version-match-enforcing behavior that the cluster-version operator has used since 2020. I edited types_cluster_version.go by hand, and then updated the other files with: $ hack/update-codegen-crds.sh $ hack/update-openapi.sh $ hack/update-swagger-docs.sh
387fac3
to
435a43a
Compare
@wking: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/cc |
@@ -62,7 +62,7 @@ type ClusterVersionSpec struct { | |||
// | |||
// Some of the fields are inter-related with restrictions and meanings described here. | |||
// 1. image is specified, version is specified, architecture is specified. API validation error. | |||
// 2. image is specified, version is specified, architecture is not specified. You should not do this. version is silently ignored and image is used. | |||
// 2. image is specified, version is specified, architecture is not specified. The version metadata in the referenced image must match the specified version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What validations are made? Is it possible that they could be moved to CEL? Or does it need to introspect the image to validate?
@@ -702,16 +702,16 @@ type Update struct { | |||
Architecture ClusterVersionArchitecture `json:"architecture"` | |||
|
|||
// version is a semantic version identifying the update version. | |||
// version is ignored if image is specified and required if | |||
// architecture is specified. | |||
// version is required if architecture is specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a CEL rule to validate this. We can test that it ratchets so that existing broken resources do not suddenly become broken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"version is ... required if architecture is specified" dates back to #1339, and seems orthogonal to the change I'm suggesting here. And actually, oc adm upgrade --to-multi-arch
is setting both architecture
and version
, and I don't see a reason to block that; it's the same sanity-check of "yes, the image the cluster retrieved seems like the release the cluster admin was expecting" for folks where version numbers are more recognizable than image digests (everybody? Definitely me, anyway). Should I drop that unnecessary constraint from the docs in this pull request, or can I file a follow-up pull request dropping that constraint once this one merges? Or...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's orthogonal, but it's good to make compatible incremental change as we are touching areas of APIs.
What happens if version is missing when architecture is specified today? Ignoring CLI tooling that would set it, since folks can manipulate these resources themselves, would it cause CVO to return errors when it processes the object?
If so, adding a CEL rule as below would give more immediate feedback to a user, and is relatively free to us to implement. As of 4.18 this should ratchet itself, but we would need to test it.
// +kubebuilder:validation:XValidation:rule="!has(self.architecture) || has(self.version)",message="version if required when architecture is set"
A self ratcheting version
// +kubebuilder:validation:XValidation:rule="!has(self.architecture) || has(self.version) || has(oldSelf.architecture)",message="version if required when architecture is set"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if version is missing when architecture is specified today?
Looks like that's already guarded here. With a launch 4.17.12 aws
Cluster Bot cluster:
$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/desiredUpdate", "value": {"architecture": "Multi"}}]'
The ClusterVersion "version" is invalid: spec.desiredUpdate: Invalid value: "object": no such key: version evaluating rule: Version must be set if Architecture is set
So I can leave the version is required if architecture is specified
docs in place here, and don't need to add additional CEL.
// When image is set, architecture cannot be specified. | ||
// If both version and image are set, the version metadata in the referenced image must match the specified version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does version metadata actually mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ oc adm release info -o json quay.io/openshift-release-dev/ocp-release:4.17.12-x86_64 | jq -r .metadata.version
4.17.12
Docs discussing the release-metadata
file that holds that as part of the release image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an admin were unsure, would they be able to check this themselves from the docs on the API? Perhaps linking out to this doc would be a useful help for users of this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much precedent to linking out to the enhancements repo for more details:
api$ git --no-pager grep github.com/openshift/enhancements/blob/
README.md:conventions](https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#api),
machine/v1beta1/types_machine.go: // https://github.com/openshift/enhancements/blob/master/enhancements/machine-instance-lifecycle.md
How about inlining something more here to make it clear that it's metadata extracted from the release image? Maybe "the version metadata extracted from the referenced image" would be sufficient? Or "the version extracted from the referenced image"? Or "the version string..."? Or...?
Catching up with openshift/cluster-version-operator@9be6175c5f (openshift/cluster-version-operator#431), which uses the
version
property as a sanity check for "is this pullspec the version I'm expecting?". This protects users from compromised or man-in-the-middled upstream update services who attempt downgrade and similar attacks by misrepresenting a recommended update.The text I'm adjusting landed in 354e2fb (#1339), but version-ignoring was never implemented, so nobody can be relying on that nominal behavior. And as the man-in-the-middle use case demonstrates, version-ignoring would be less safe than the version-match-enforcing behavior that the cluster-version operator has used since 2020.