Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infrastructure to support backward-compatibility check #257

Merged
merged 40 commits into from
Jul 29, 2024

Conversation

lquerel
Copy link
Contributor

@lquerel lquerel commented Jul 23, 2024

Description

This PR implements the infrastructure to support backward-compatibility testing and, more generally, policies applied to multi-version registries.

The following command checks the current registry (main branch in the official repo) with a baseline registry (here, the archive v1.26.0) against a package of rules (policies) defined in the file compatibility_check.rego:

weaver registry check \
--registry https://github.com/open-telemetry/semantic-conventions.git[model] \
--baseline-registry https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.26.0.zip[model] \
--policy compatibility_check.rego

The following screenshot is the output of this command if we artificially remove the file registry/deprecated/container.yaml from the main branch.

Screenshot 2024-07-26 at 4 56 39 PM

Changes

  • Extend the registry path format to support local and remote archives in addition to local folders and git repositories. The format will also support the definition of sub-folders. Motivation: with this PR, we now have two types of registries. Without a registry path format, we would need to add many parameters to support all combinations (folder, git-url, git-archive, folder, tag).
  • Unify RegistryPath (merge the two existing implementations defined in the weaver and weaver_semconv crates).
  • Document CLI usage for the new registry path format.
  • Add --baseline-registry in addition to --registry (both will support the same syntax for the registry path). The --registry-git-sub-dir will be marked as deprecated.
  • Implement support for local and remove git archives (supported formats: .tar.gz, .zip).
  • Replace Weaver Cache with the more general concept of RegistryRepo supporting all the variants of RegistryPath.
  • Integrate --baseline-registry with the policy engine. In particular, we must decide how to expose the two registries in Rego.
  • Create standard SemConv Rego rules and function to ease the creation of semconv policies.
  • Create examples of policies to detect backward-compatibility issues.

lquerel and others added 30 commits July 5, 2024 21:51
# Conflicts:
#	crates/weaver_forge/src/config.rs
#	crates/weaver_forge/src/extensions/case.rs
#	crates/weaver_forge/src/lib.rs
src/registry/check.rs Fixed Show fixed Hide fixed
src/registry/check.rs Fixed Show fixed Hide fixed
@lquerel lquerel changed the title [WIP] Infrastructure to support backward-compatibility check Infrastructure to support backward-compatibility check Jul 27, 2024
@lquerel lquerel marked this pull request as ready for review July 27, 2024 00:02
@lquerel lquerel requested a review from jsuereth as a code owner July 27, 2024 00:02
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. A couple nits - Predominately I think next time the Git-Url / path reference code should be a separate PR form the policy engine changing code.

THink you can close a few open issues for this one :)

Self::create_parent_dirs(&valid_entry_path, archive_filename)?;
// Unpack returns an Unpacked type containing the file descriptor to the
// unpacked file. The file descriptor is ignored as we don't have any use for it.
_ = entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for rust - Maybe we should open a FR - that #[must_use] for Result where you ? away a () doesn't cause a warning, and then you can map_error(..) and something like .ignore_result()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but I think a templated #[must_use] function returning a Result<T, Error> with T=() doesn't emit a warning.

I like the idea of the .ignore_result()? if we can specify a parameter &str representing the justification.

crates/weaver_cache/src/lib.rs Show resolved Hide resolved
crates/weaver_cache/src/registry_path.rs Show resolved Hide resolved
GitRepo {
/// URL of the Git repository
url: String,
/// Tag of the Git repository (NOT YET SUPPORTED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be tag, branch or commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed refspec and updated the comment.

crates/weaver_cache/src/registry_path.rs Outdated Show resolved Hide resolved
// Add the standard semconv policies
// Note: `add_policy` the package name, we ignore it here as we don't need it
_ = engine
.add_policy("defaults/rego/semconv.rego", SEMCONV_REGO)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this only be added for a particular phase of the engine? Aren't these symbols meaningless in before_resolution or are they meant to be applied there too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, some of the rules/functions definitions could be used in any phases.

src/util.rs Outdated
@@ -119,9 +102,19 @@ pub(crate) fn check_policy_stage<T: Serialize>(
policy_stage: PolicyStage,
policy_file: &str,
input: &T,
data: &[T],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: If you want data to actually be generic, you need to use a different type. Right now input AND the array of data must be the same type.

At a minimum, you should have:

fn check_policy_stage<T: Serialize, U:Serialize>(
    policy_engine: &mut Engine,
    policy_stage: PolicyStage,
    policy_file: &str,
    input: &T,
    data: &[U],

However, if you want data to not be monomorphic, you'd need to use an HList (likely k-list) or tuple-type structure to preserve the original types.

I don't think this is what you want at all. For now, at least, I'd use a different generic for data. In the future, you may want to use a different type to encapsulate serialization more generically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I made the changes.

@lquerel
Copy link
Contributor Author

lquerel commented Jul 29, 2024

Overall LGTM. A couple nits - Predominately I think next time the Git-Url / path reference code should be a separate PR form the policy engine changing code.

THink you can close a few open issues for this one :)

Yes, creating two PRs was part of the initial intentions, but it got lost in the heat of the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants