Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement for RHCOS in the marketplace #1655

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

patrickdillon
Copy link
Contributor

Rough draft of an enhancement/design for automating publishing of cloud marketplace images and utilizing them in the installer.

It may make sense to break this into smaller pieces, particularly publishing non-paid Azure marketplace images, for prioritization.

This is very rough at the moment, but perfect is the enemy of good, and getting some feedback, iteration, and centralization may help.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2024
Copy link
Contributor

openshift-ci bot commented Jul 23, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Jul 23, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from patrickdillon. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

`.architectures[("aarch64","x86_64")].rhel_coreos_extensions` fields.


### Workflow Description
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This describe the "older" workflow, not the one I suggested in https://miro.com/app/board/uXjVK0sf6No=/, which is more linear and does not need us to do multiple things in a single PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this back up. I didn't understand the Alternate Workflow, but after reviewing it a bit more, I understand a bit better:

Marketplace images are contained in a separate rhcos-marketplace.json file. Do you have a sense of how that file is generated? By RHCOS team or the installer team?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generate the rhcos.json file with our tooling. Ideally the new rhcos-marketplace.json file is generated by the team responsible for uploading the images, using the same format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travier I investigated the idea of using multiple files, but landed on the current proposal as a simpler way to solve the same problem. I addressed multiple files in the Alternatives section. Would you please take a fresh look?

The idea is really quite simple, but I hope I have not made it too confusing when trying to explain.

images and including them in the RHCOS stream.
* Setup non-paid Azure marketplace offering: determine publisher
* Definition in RHCOS stream to separate multiple similar offers (e.g. ROSA-specific marketplace images vs.
OpenShift AWS marketplace images, Azure paid vs. free images)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call out publishing of x86 and ARM64 images? Or put that as non-goals? Generally, we (ARO HCP) would like to have both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add it. they would be requirements for self-managed (as we support both arches).

Copy link
Member

@enxebre enxebre Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this give me the ability to infer an azure imageID or marketplace Architecture/Offer/Publisher/Sku out of a release payload rhcos.json/ocp version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this give me the ability to infer an azure imageID or marketplace Architecture/Offer/Publisher/Sku out of a release payload rhcos.json/ocp version?

Yes, you can currently extract the coreos stream from a release image, for example you can get the 4.16.6 stream like this:

$ oc image extract --file /coreos/coreos-stream.json $(oc adm release info --image-for machine-os-images quay.io/openshift-release-dev/ocp-release:4.16.6-x86_64)

Currently that only includes the source vhd, but if we are able to include marketplace images, they could be queried from that file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have Jira ticket(s) tracking the work needed for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARM publishing is already handled (at the top level of rhcos.json).

Do we have Jira ticket(s) tracking the work needed for this?

If RHCOS team thinks this is a viable idea, I can create cards for the proposal here.


ROSA and ARO both make use of marketplace images. ARO HCP depend on the
availability of Azure marketplace images. It is an Open Question as to whether
ARO requires a set of Azure marketplace images that is distinct from typical
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARO doesn't rely on any changes to the underlying RHCOS images that are used by upstream OCP. We take the existing VHDs in the rhcos.json file and publish them ourselves manually today.

The one difference between marketplace offerings though is the ARO one is that the ARO one is a Microsoft published marketplace offering, which I believe has subtle differences between a third-party published offering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So we need two separate sets of azure marketplace images? I assume standalone openshift cannot use the marketplace images from the 1P Microsoft Marketplace, as they are billed specifically for ARO? If not for other reasons...

Does ARO need support with publishing the images? ARO is publishing the images now, but would like that to delegate that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose free images are not something we can implement in the Azure Marketplace without offer acceptance, etc. In that case, additional automation might need to be implemented to add the account to install an ARO cluster to an Azure Marketplace listing that contains the install images.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ARO images are free today. We use a different mechanism to bill customers.

Does ARO need support with publishing the images? ARO is publishing the images now, but would like that to delegate that?

Yes, that's the goal with our recent conversations with all the stakeholders in SPSTRAT-271 and related cards. There are some nuances with the 1p offering which we can talk about offline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I am joining late to the party here. So, is the publishing of ARO images (which is currently handled by the ARO team), out of scope for the work this enhancement is trying to propose?

Will all non-ARO Azure Marketplace images all be of the type NoPurchasePlan? If not, will there be instances where a non-ARO Azure OpenShift Install would want to use PurchasePlan* images?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will all non-ARO Azure Marketplace images all be of the type NoPurchasePlan? If not, will there be instances where a non-ARO Azure OpenShift Install would want to use PurchasePlan* images?

These questions were answered later in the doc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an update, we were able to successfully use the release engineering's tooling to upload an RHCOS image to the 1p Azure Marketplace.

enhancements/rhcos/rhcos-cloud-marketplace-images.md Outdated Show resolved Hide resolved
enhancements/rhcos/rhcos-cloud-marketplace-images.md Outdated Show resolved Hide resolved

**pull request** is a pull request, opened by an RHCOS engineer,
bumping the RHCOS stream in openshift/installer. The bump
is based
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing part of the sentence?

enhancements/rhcos/rhcos-cloud-marketplace-images.md Outdated Show resolved Hide resolved
@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2024

OPEN QUESTION: How to handle EMEA vs North America
OPEN QUESTION: how to handle azure gen1 image

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should another open question be how we handle multi-cloud images in Azure? Is govcloud in scope? China cloud? etc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing is how we as an organization should ask the release engineering / spstrat team (assuming they're the ones owning the upload process) how to structure the images, specifically in Azure.

Azure is broken down as such:

  • publisher: consistent
  • offer: can make new offers, but typically you'd split fcos and rhcos here
  • version: the version of the rhcos vhd
  • sku: this is where things get interesting. Marketplace allows configuring different options the OS supports such as trusted launch, cloud-init, supports sriov, etc. Typically openshift minor releases will enable new functionality such as this, hence why it would be good to configure a new SKU for every minor version so we can support / enable these features for customers.

When configuring images in the marketplace, you configure them at a plan level

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should another open question be how we handle multi-cloud images in Azure? Is govcloud in scope? China cloud? etc.

I think it's a good question, and one we will need to answer for the implementation, but I don't think it affects the overall enhancement. I'm happy to add it as an open question in the next update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sku: this is where things get interesting. Marketplace allows configuring different options the OS supports such as trusted launch, cloud-init, supports sriov, etc. Typically openshift minor releases will enable new functionality such as this, hence why it would be good to configure a new SKU for every minor version so we can support / enable these features for customers.

I have incorporated this into the enhancement. The sku will be variable, but deterministic.

@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 10, 2024
@patrickdillon
Copy link
Contributor Author

patrickdillon commented Oct 14, 2024

/remove-lifecycle rotten

1 similar comment
@patrickdillon
Copy link
Contributor Author

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 15, 2024
@patrickdillon
Copy link
Contributor Author

/test markdownlint

Copy link
Contributor

openshift-ci bot commented Oct 15, 2024

@patrickdillon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint 0bb4c0b link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +178 to +184
$ az vm image list --publisher azureopenshift --offer aro4 --all -o table --sku 418
Architecture Offer Publisher Sku Urn Version
-------------- ------- -------------- ------- ------------------------------------------------- ---------------------
Arm64 aro4 azureopenshift 418-arm azureopenshift:aro4:418-arm:418.94.202410090804-0 418.94.202410090804-0
x64 aro4 azureopenshift 418-v2 azureopenshift:aro4:418-v2:418.94.202410090804-0 418.94.202410090804-0
x64 aro4 azureopenshift aro_418 azureopenshift:aro4:aro_418:418.94.202410090804-0 418.94.202410090804-0
```
Copy link
Member

@sdodson sdodson Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are URNs immutable or could azureopenshift:aro4:418-v2:418.94.202410090804-0 today be something different than azureopenshift:aro4:418-v2:418.94.202410090804-0 tomorrow? Ideally they'd be immutable either by Azure policy or our publishing pipeline policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally they'd be immutable either by Azure policy or our publishing pipeline policy

Yes they would be inherently immutable by the coreos stream publishing policy! 418.94.202410090804-0 refers to a specific build, so as long as the conventions for publishing the marketplace image (i.e. correct build version is used and specified) and the stream are practiced the image for that URN will always be the same.

authors:
- "@patrickdillon"
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
- TBD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, would like to have lgtm from the following folks, or a delegate of their choosing, before we merge.
@jlebon or @mike-nguyen from RHCOS
@bennerv from ARO
@yuqi-zhang from MCO
@bryan-cox from HCP

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but doublechecking with some other HyperShift folks to see if they are good with this as well.

@sdodson
Copy link
Member

sdodson commented Oct 29, 2024

/approved
Specific to proposed OCP use of Azure 1p marketplace use we need to ensure we have guarantees that 1p images never get removed for any reason.

Comment on lines +279 to +285
Early feedback on this enhancement proposed splitting marketplace images into a separate file.
Multiple files solves the issues regarding holding the initial bump PR to wait for marketplace
images to be uploaded, but there are many assumptions throughout openshift that the stream is contained
in a single file, such as `openshift-install coreos print-stream-json`, the
[coreos configmap manifest](https://github.com/openshift/installer/blob/master/hack/build-coreos-manifest.go),
& perhaps more. I believe the current approach solves the problems we were trying to solve with this alternative
but in a way that requires less reworking of existing code and keeps `rhcos.json` as a single canonical file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, even so I still think that having it be a separate file would be cleaner. I may be wrong, but offhand it doesn't seem like it'd require that much work given how simple e.g. the print and configmap code is. One thing that helps is that this is new information, so only things that need to care about Marketplace images need to know how to access it.

Currently, plume cosa2stream is purely about transforming some JSON about a build into some other JSON schema. I think it's confusing to have the command now also do cloud queries to find out what the Marketplace images are. It also makes it non-deterministic. But also, having it be separate tools and files managed by separate teams is organizationally just easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary concern with using the separate file is that we would lose having a single canonical source for RHCOS images. We plan on moving all Azure installs to marketplace images, so for example, clients would need to know "I'm looking for Azure images so I don't look in rhcos.json like I do for all other images, I need to look in rhcos-marketplace.json because I know Azure uses marketplace images."

I'm not sure if there is a way to use json refs or another solution to mitigate this issue (i.e. so that even though the stream is contained in separate json files it can still essentially be deserialized as a single canonical source). I will definitely look into that more (& please lmk if you know).

I don't think plume cosa2stream is the right place--this is my ignorance of cosa. It might be ore that would be a more appropriate place for this to live, but I've had some difficulty pinpointing exactly how we publish the current Azure images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering this more, I think my concern can be handled by (some sort of) an additional layer of abstraction, like the one I mentioned (json ref) or just a package for loading both streams. I will update the enhancement to use the additional file implementation.

Copy link
Member

@travier travier Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan on moving all Azure installs to marketplace images, so for example, clients would need to know "I'm looking for Azure images so I don't look in rhcos.json like I do for all other images, I need to look in rhcos-marketplace.json because I know Azure uses marketplace images."

Users are not supposed to look at the JSON file directly. They should really use the openshift-install coreos print-stream-json command and we can make that command combine or print the right images in the code. See: https://docs.openshift.com/container-platform/4.17/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-user-infra-machines-iso_installing-platform-agnostic

I agree with Jonathan that this should be a different file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduced that command explicitly so that we would not create an interface with the rhcos.json file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users are not supposed to look at the JSON file directly.

we can make that command combine or print the right images in the code

@travier, +1 I see now that the separate json file is more of an implementation detail and we should be able to combine the separate files to maintain a single canonical source, which is my main concern

They should really use the openshift-install coreos print-stream-json
We introduced that command explicitly so that we would not create an interface with the rhcos.json file.

I'm not concerned about users, I'm concerned about openshift clients, such as hypershift. But I think they are looking at the configmap the installer creates, so again we should be able to handle this by combining both files when creating the configmap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That WFM. I think keeping it separate in-tree, but together in the cluster is a good balance.

"no-purchase-plan": {
"publisher": "RedHat",
"offer": "rhcos",
"sku": "rhcos",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, sku is supposed to be something like 418

@travier
Copy link
Member

travier commented Nov 8, 2024

I stand by what I said in #1655 (comment). This does not reflect the flow we agreed on. We should have a discussion about this again.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2024
@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 14, 2024
@patrickdillon
Copy link
Contributor Author

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 18, 2024
@patrickdillon
Copy link
Contributor Author

I have an implementation up in openshift/installer#9329 and coreos/stream-metadata-go@main...patrickdillon:stream-metadata-go:mkt-ext. I believe this is very much in line with @travier's suggested approach.

Working on #9329 has made things more concrete and brought forward some more open questions (e.g. we need to discuss fcos azure marketplace images). I will update the enhancement ASAP upon return from recharge.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants