Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(manifests): fix full CRDs and update tests to use them. Fixes #8532 #14044

Merged
merged 13 commits into from
Jan 16, 2025

Conversation

MasonM
Copy link
Contributor

@MasonM MasonM commented Jan 2, 2025

Fixes #8532. Partial fix for #8190

Motivation

The full CRDs at manifests/base/crds/full have many errors that make them only usable in editors (#11266 (comment)). They used to be installable in a cluster, but now that's no longer possible:

$ kubectl apply --server-side -k manifests/base/crds/full/
customresourcedefinition.apiextensions.k8s.io/workflowartifactgctasks.argoproj.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io serverside-applied
customresourcedefinition.apiextensions.k8s.io/workflowtaskresults.argoproj.io serverside-applied
Error from server (Invalid): CustomResourceDefinition.apiextensions.k8s.io "clusterworkflowtemplates.argoproj.io" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[templateDefaults].properties[dag].properties[tasks].items.properties[inline].type: Required value: must not be empty for specified object fields, spec.validation.openAPIV3Schema.properties[spec].properties[templateDefaults].properties[steps].items.items: Required value: must be specified, spec.validation.openAPIV3Schema.properties[spec].properties[templates].items.properties[dag].properties[tasks].items.properties[inline].type: Required value: must not be empty for specified object fields, spec.validation.openAPIV3Schema.properties[spec].properties[templates].items.properties[steps].items.items: Required value: must be specified]
Error from server (Invalid): CustomResourceDefinition.apiextensions.k8s.io "cronworkflows.argoproj.io" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[workflowSpec].properties[templateDefaults].properties[dag].properties[tasks].items.properties[inline].type: Required value: must not be empty for specified object fields, spec.validation.openAPIV3Schema.properties[spec].properties[workflowSpec].properties[templateDefaults].properties[steps].items.items: Required value: must be specified, spec.validation.openAPIV3Schema.properties[spec].properties[workflowSpec].properties[templates].items.properties[dag].properties[tasks].items.properties[inline].type: Required value: must not be empty for specified object fields, spec.validation.openAPIV3Schema.properties[spec].properties[workflowSpec].properties[templates].items.properties[steps].items.items: Required value: must be specified]
...

Having working full CRDs would enable many useful features, such as kubectl explain output (#8190), validating admission policies (#14045), and integration with tools like cdk8s (#8532).

Modifications

This PR looks intimidating, but 99% of the changes are in manifests/base/crds/full, which is not used in any of the release manifests. I tried to avoid changing the release manifests or API as much as possible, so there shouldn't be any backwards-compatibility concerns.

As explained at #13754 (comment), there were two issues that prevented us from using the full CRDs everywhere:

  1. Kubernetes size limits when using client-side apply.
  2. Wrong validation information due to kubebuilder limitations.

For the first issue, I verified that this is no longer an issue when using server-side apply, so in #14048, I updated the make install target to use it. For the second issue, I went through and added workarounds to hack/manifests/crds.go for all the kubebuilder issues I could find. Since the E2E test suite now uses the full CRDs, it should tell us if there's anything I missed.

I didn't update the release manifests to use the full CRDs, since that's a big change and we probably should wait awhile to make sure there aren't any unexpected issues. In the meantime, users can easily opt into the full CRDs if they want.

Verification

Ran tests locally. Also, I verified the documentation changes using make docs-serve.

assert.Equal(t, "malformed", metadata.Name)
assert.Equal(t, wfv1.WorkflowFailed, status.Phase)
})
s.Given().Exec("kubectl", []string{"apply", "-f", "testdata/malformed/malformed-workflow.yaml"}, fixtures.ErrorOutput("unknown field \"spec.arguments.parameters.someParam\""))
Copy link
Contributor Author

@MasonM MasonM Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to substantially change the tests in this file because malformed workflows will now be rejected at the API level instead of at runtime. We could test the runtime behavior of malformed workflows by adding a new profile that uses the minimal CRDs and having test-executor use that, but that seems overkill.

The full CRDs at `manifests/base/crds/full` are only intended for usage
in editors
(argoproj#11266 (comment))
and are unusable otherwise. Trying to install them will cause spurious
validation errors with nearly every workflow.

Having the full CRDs would enable many useful features, such as `kubectl
explain` output (argoproj#8190),
[validating admission
policies](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
(argoproj#13503), and
integration with tools like cdk8s
(argoproj#8532).

As explained at
argoproj#13754 (comment),
there's two issues that prevent us from using the full CRDs everywhere:
1. [Kubernetes size
   limits](kubernetes/kubernetes#82292) when
   using client-side apply. This is not a problem when using
   .
2. Wrong validation information due to kubebuilder limitations.

For the first issue, I verified that this is no longer an issue when
using [server-side
apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/),
so I updated the `make install` target to use it. Since `kubectl apply --server-side`
isn't compatible with `kubectl apply --prune`, I had to switch to use
[apply
sets](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#alternative-kubectl-apply-f-directory-prune),
which is intended to replace `--prune`, and seems to work well.

For the second issue, I went through and added workarounds to
`hack/manifests/crds.go` for all the kubebuilder issues I could find.
Since the E2E test suite now uses the full CRDs, it should tell us if
there's anything I missed.

I didn't update the release manifests to use the full CRDs, since that's
a big change and we probably should wait awhile to make sure there
aren't any unexpected issues. Users can easily opt into the full
CRDs if they want.

Signed-off-by: Mason Malone <[email protected]>

The official release manifests come with stripped-down CRDs that omit validation information.
This is a workaround for [Kubernetes size limitations](https://github.com/kubernetes/kubernetes/issues/82292) when using client-side apply.
As of version 3.7, the full CRDs can be installed using [server-side apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) via the following command:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this doesn't go out as part of 3.7, I'll remove this before it's merged

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@MasonM MasonM marked this pull request as ready for review January 2, 2025 01:53
MasonM added a commit to MasonM/argo-workflows that referenced this pull request Jan 3, 2025
This was split off from
argoproj#14044 to make it easier
to review. This makes a few minor improvements to the build to support
the full CRD fixes:
1. Update devcontainer to install k3s 1.29.10, since that's what we use
   in CI: https://github.com/argoproj/argo-workflows/blob/ef41f83f801a6ff48c87f882a6b75d0e37529134/.github/workflows/ci-build.yaml#L263
   1.27 is EOL and doesn't support things like validation rules in CRDs.
2. Update `make install` to use [server-side
   apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/),
   which is meant to replace client-side apply and isn't affected by
   size limitations for the CRDs. Since `kubectl apply --server-side`
   isn't compatible with `kubectl apply --prune`, I had to switch to
   [apply
   sets](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#alternative-kubectl-apply-f-directory-prune),
   which is intended to replace allow list pruning, and seems to work
   just as wlel.
3. Minor refactoring of the manifests under `manifests/` to use
   [Kustomize
   Components](https://kubectl.docs.kubernetes.io/guides/config_management/components/)
   so that we can share code with the the manifests under
   `test/e2e/manifests` without duplication. See argoproj#14001 for more details
   on this approach.

Signed-off-by: Mason Malone <[email protected]>
@MasonM
Copy link
Contributor Author

MasonM commented Jan 3, 2025

I split off the minor build refactoring (e.g. switching to server-side apply) to a separate PR to make this one easier to review: #14048. EDIT: This has been merged

@Joibel Joibel self-assigned this Jan 6, 2025
@MasonM
Copy link
Contributor Author

MasonM commented Jan 10, 2025

/retest

Copy link
Member

@Joibel Joibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just questions to see how you made some decisions here.

@@ -663,6 +666,7 @@ type Template struct {
HTTP *HTTP `json:"http,omitempty" protobuf:"bytes,42,opt,name=http"`

// Plugin is a plugin template
// +kubebuilder:pruning:PreserveUnknownFields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Copy link
Contributor Author

@MasonM MasonM Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the structure of plugins is free-form. See #7393

Without this, the TestExecutorPluginsSuite/TestTemplateExecutor test fails:

Submitting workflow  template-executor-
W0111 01:58:53.394874   40529 warnings.go:70] unknown field "spec.templates[0].plugin.hello"
    executor_plugins_test.go:29: expected exactly one key, got 0
=== FAIL: ExecutorPluginsSuite/TestTemplateExecutor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments in 987bb2e explaining this

@@ -458,25 +456,16 @@ spec:

func (s *CronSuite) TestMalformedCronWorkflow() {
s.Given().
Exec("kubectl", []string{"apply", "-f", "testdata/malformed/malformed-cronworkflow.yaml"}, fixtures.NoError).
Exec("kubectl", []string{"apply", "-f", "testdata/malformed/malformed-cronworkflow.yaml"}, fixtures.ErrorOutput("unknown field \"spec.workflowSpec.arguments.parameters.someParam\"")).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test feels odd. Why did it apparently test a wellformed-cronworkflow..., who knows? Just as a delay for the Events to get emitted?

Anyway, can we drop the wellformed workflow from this test, or if it isn't tested elsewhere it should be moved to be in it's own test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either, so I removed it in d3f2ee7

@@ -1502,6 +1506,7 @@ type WorkflowStep struct {
Template string `json:"template,omitempty" protobuf:"bytes,2,opt,name=template"`

// Inline is the template. Template must be empty if this is declared (and vice-versa).
// +kubebuilder:pruning:PreserveUnknownFields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question, why do we allow unknown fields in a template, that feels very odd?

Copy link
Contributor Author

@MasonM MasonM Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a recursive type (since the inline template can potentially contain steps/dags that also define inline), which aren't allowed in CRDs, and causes kubebuilder to generate an empty object, e.g.

That empty object is the cause of the Required value: must not be empty for specified object fields errors you get when trying to import the full CRDs on the main branch (see the PR description).

I can't think of an easy fix for this. There's talk about this case in kubernetes/kubernetes#62872, but I doubt $ref support is going to be added anytime soon (if ever). I guess I could update hack/manifests/crds.go to extract the normal definition for Template, walk through it to remove any nested inline fields, then copy that to each of these empty objects, but that would significantly increase the size of the schema and be a pain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments in 987bb2e explaining this

}
// Workaround for missing generateName (see https://github.com/kubernetes-sigs/controller-tools/pull/640)
func patchMetadata(schema *obj, baseFields ...string) {
schema.SetNestedField(map[string]string{"type": "string"}, append(baseFields, "generateName")...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yuck. Nice fix.


The official release manifests come with stripped-down CRDs that omit validation information.
This is a workaround for [Kubernetes size limitations](https://github.com/kubernetes/kubernetes/issues/82292) when using client-side apply.
As of version 3.7, the full CRDs can be installed using [server-side apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) via the following command:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@MasonM
Copy link
Contributor Author

MasonM commented Jan 11, 2025

The failing test (SignalsSuite/TestStopBehavior) is clearly unrelated and also failing in the last build on the main branch: https://github.com/argoproj/argo-workflows/actions/runs/12707760387/job/35432247793

edit: test passed after a couple retries

@@ -2,7 +2,7 @@ apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

resources:
- ../../../../../manifests/base/crds/minimal
- ../../../../../manifests/base/crds/full
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This then won't test the minimal CRDs?

Copy link
Contributor Author

@MasonM MasonM Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. There aren't any tests for the release manifests either, and there isn't much to test with the minimal CRDs, besides maybe ensuring they validate against the schema. If you think that's worth testing, I could add a step to each E2E test job that runs kubectl apply --server-side --validate=strict --dry-run=server --kustomize manifests/base/crds/minimal/
(or kubectl apply --server-side --validate=strict --dry-run=server -f 'manifests/*.yaml' to have it validate every release manifest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terrytangyuan Would you like me to enter a PR that adds kubectl apply --server-side --validate=strict --dry-run=server -f 'manifests/*.yaml' to validate the release manfiests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please add that step. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! #14089

@MasonM MasonM requested a review from terrytangyuan January 14, 2025 01:24
@MasonM
Copy link
Contributor Author

MasonM commented Jan 14, 2025

@Joibel Do you have any other questions? Is there anything else you'd like to see in this PR?

MasonM added a commit to MasonM/argo-workflows that referenced this pull request Jan 16, 2025
As requested in
argoproj#14044 (comment),
this adds a simple step to each E2E suite that validates the release
manifests using server-side field validation. This is intended to catch
simple issues, e.g. typos in a field:
```
$ make manifests-validate
<SNIP>
deployment.apps/postgres serverside-applied (server dry run)
The CustomResourceDefinition "clusterworkflowtemplates.argoproj.io" is invalid:
* spec.validation.openAPIV3Schema.properties[apiVersion].type: Invalid value: "TYPO": must be string
* spec.validation.openAPIV3Schema.properties[apiVersion].type: Unsupported value: "TYPO": supported values: "array", "boolean", "integer", "number", "object", "string"
make: *** [Makefile:452: manifests-validate] Error 1
```

Signed-off-by: Mason Malone <[email protected]>
MasonM added a commit to MasonM/argo-workflows that referenced this pull request Jan 16, 2025
As requested in
argoproj#14044 (comment),
this adds a simple step to each E2E suite that validates the release
manifests using server-side field validation. This is intended to catch
simple issues, e.g. typos in a field:
```
$ make manifests-validate
<SNIP>
deployment.apps/postgres serverside-applied (server dry run)
The CustomResourceDefinition "clusterworkflowtemplates.argoproj.io" is invalid:
* spec.validation.openAPIV3Schema.properties[apiVersion].type: Invalid value: "TYPO": must be string
* spec.validation.openAPIV3Schema.properties[apiVersion].type: Unsupported value: "TYPO": supported values: "array", "boolean", "integer", "number", "object", "string"
make: *** [Makefile:452: manifests-validate] Error 1
```

Signed-off-by: Mason Malone <[email protected]>
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@terrytangyuan terrytangyuan merged commit 8304dc7 into argoproj:main Jan 16, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full CRDs are incomplete and causes json2jsii to fail
3 participants