[PURL-TYPE: golang] fix type spec regarding path segments #308

Open
jkowalleck opened this issue Jun 24, 2024 · 17 comments
Labels
Ecma specification Work on the core specification PURL capitalization PURL name component PURL namespace component PURL type definition Non-core definitions that describe and standardize PURL types type: golang Proposed new type as well as component discussions

Comments

@jkowalleck
Member

jkowalleck commented Jun 24, 2024

see PURL spec : https://github.com/package-url/purl-spec/blob/b33dda1cf4515efa8eabbbe8e9b140950805f845/PURL-SPECIFICATION.rst#rules-for-each-purl-component
see PURL-TYPE spec for golang :

purl-spec/PURL-TYPES.rst

Lines 300 to 314 in b33dda1

golang
------
``golang`` for Go packages:
- There is no default package repository: this is implied in the namespace
using the ``go get`` command conventions.
- The ``namespace`` and `name` must be lowercased.
- The ``subpath`` is used to point to a subpath inside a package.
- The ``version`` is often empty when a commit is not specified and should be
the commit in most cases when available.
- Examples::
pkg:golang/github.com/gorilla/context@234fd47e07d1004f0aed9c
pkg:golang/google.golang.org/genproto#googleapis/api/annotations
pkg:golang/github.com/gorilla/context@234fd47e07d1004f0aed9c#api

Problem

According to PURL-TYPE spec for golang, "The namespace and name must be lowercased."

This means that every URL path-part of a hosted Go module MUST be lowercased when it is used in PURL namespaces.
URL path-parts are case-sensitive by definition.
Therefore, the TYPE spec is not helpful: it modifies URL path-parts, which makes the resulting PURLs indistinguishable from one another and unusable for package retrieval.

see also: google/deps.dev#93
see also: https://www.youtube.com/watch?v=Lts4NjHqKIw&t=1004s

Example

Module with the topic of preserving a thing:
hosted at https://example.com/pakages/Preserve
would have a purl pkg:golang/example.com/pakages/preserve.

Module with the topic of an event before serving a thing:
hosted at https://example.com/pakages/preServe
would have a purl pkg:golang/example.com/pakages/preserve.

Issue A: Both PURLs are the same, but the modules are not.
Issue B: neither PURL's namespace/name segments can be used to rebuild the original/actual distribution/source URL.
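
The collision described above can be reproduced with a minimal sketch. The helper `golang_purl_lowercased` is hypothetical (it is not part of any PURL library) and assumes that the current type spec is applied by lowercasing the whole module path; percent-encoding is left out for brevity.

```python
from urllib.parse import urlsplit


def golang_purl_lowercased(module_url: str) -> str:
    """Hypothetical helper: build a golang PURL the way the current type
    spec reads, i.e. with namespace and name fully lowercased."""
    parts = urlsplit(module_url)
    # Host plus path forms the Go module path; the current rule lowercases all of it.
    module_path = (parts.hostname or "") + parts.path
    return "pkg:golang/" + module_path.lower()


a = golang_purl_lowercased("https://example.com/pakages/Preserve")
b = golang_purl_lowercased("https://example.com/pakages/preServe")
print(a)       # pkg:golang/example.com/pakages/preserve
print(b)       # pkg:golang/example.com/pakages/preserve
print(a == b)  # True: two different modules end up with one PURL
```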

Possible Solution

Option A: simply allow case-sensitivity

When converting a URL to a PURL namespace and name, the host-part of the URL MUST be lowercased, and the path-part segments of the URL MUST NOT be modified.

--- PURL-TYPES.rst
+++ PURL-TYPES.rst

   - There is no default package repository: this is implied in the namespace 
     using the ``go get`` command conventions. 
-  - The ``namespace`` and `name` must be lowercased. 
+  - The ``namespace`` and ``name`` must be the lowercased host-part of the distribution URL followed by the unmodified path-part of the distribution URL
   - The ``subpath`` is used to point to a subpath inside a package. 

Example:

  • URL https://packages.EXAMPLE.com/MyOrg/foO --> PURL pkg:golang/packages.example.com/MyOrg/foO
  • URL https://packages.example.com/ACME/foo --> PURL pkg:golang/packages.example.com/ACME/foo
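
A rough sketch of the Option A conversion, assuming the distribution URL is available as a string. The function name `golang_purl_option_a` is made up for illustration, and the percent-encoding of individual segments that the PURL spec requires is omitted here.

```python
from urllib.parse import urlsplit


def golang_purl_option_a(module_url: str) -> str:
    """Hypothetical helper for Option A: lowercase only the host-part,
    keep the path-part exactly as it appears in the URL."""
    parts = urlsplit(module_url)
    host = (parts.hostname or "").lower()
    # parts.path keeps its original casing and its leading slash.
    return "pkg:golang/" + host + parts.path


print(golang_purl_option_a("https://packages.EXAMPLE.com/MyOrg/foO"))
print(golang_purl_option_a("https://packages.example.com/ACME/foo"))
```

With this rule the two example URLs map to the PURLs listed above, and the path-part can be read back out of the PURL unchanged.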

In case the proposed solution above is considered a breaking change:
deprecate the existing PURL-TYPE golang and create a new PURL-TYPE go (see #67),
and define the PURL TYPE as proposed above.

Option B: no namespaces, all encoded name

This would definitely be a breaking change, so it requires deprecating the existing PURL-TYPE golang and coming up with a reboot:

  1. PURL-TYPE go (see Go is called Go, not Golang #67)
  2. spec as follows:
    • The namespace must be empty
    • The name must be the lowercased host-part of the distribution URL followed by the unmodified path-part of the distribution URL
    • The subpath is used to point to a case-sensitive subpath inside a package.
    • The version is often empty when a commit is not specified and should be
      the commit in most cases when available.

Example:

  • URL https://packages.EXAMPLE.com/MyOrg/foO%26bar --> PURL pkg:golang/packages.example.com%2FMyOrg%2FfoO%2526bar
  • URL https://packages.example.com/ACME/foo --> PURL pkg:golang/packages.example.com%2FACME%2Ffoo
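
A minimal sketch of the Option B encoding, assuming the whole module path goes into the name and is percent-encoded as a single component. `purl_option_b` is a hypothetical helper; the type is kept as `golang` only to match the examples above.

```python
from urllib.parse import quote, urlsplit


def purl_option_b(module_url: str, purl_type: str = "golang") -> str:
    """Hypothetical helper for Option B: empty namespace, and the name is
    the lowercased host-part plus the unmodified path-part."""
    parts = urlsplit(module_url)
    name = (parts.hostname or "").lower() + parts.path
    # safe="" makes quote() encode "/" as %2F and "%" as %25, so an
    # already-encoded "%26" in the URL path becomes "%2526" in the PURL.
    return f"pkg:{purl_type}/" + quote(name, safe="")


print(purl_option_b("https://packages.EXAMPLE.com/MyOrg/foO%26bar"))
print(purl_option_b("https://packages.example.com/ACME/foo"))
```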

Please bear with me: I am just the person who happened to write this report. I do not know much about the golang ecosystem, but I know something about PURL.

@matt-phylum
Contributor

I don't think this is right either. Go module names are case sensitive, including the first path element. Uppercase characters are forbidden for the first path element, so lowercasing is unnecessary for valid module names and can turn invalid module names into valid module names. It's easier and more accurate to just leave the module name how it is.

$ go get packages.EXAMPLE.com/MyOrg/foO
go: malformed module path "packages.EXAMPLE.com/MyOrg/foO": invalid char 'E' in first path element
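
The character rule behind that error can be approximated as follows. This is only a sketch: `first_element_problem` is a hypothetical helper, and the allowed set (lowercase ASCII letters, digits, dots and dashes) is an assumption about the first path element only. The real Go toolchain performs further checks that are not modelled here.

```python
import string

# Assumption: characters accepted in the first path element of a module path.
_ALLOWED_FIRST = set(string.ascii_lowercase + string.digits + ".-")


def first_element_problem(module_path: str):
    """Return a message for the first offending character in the leading
    path element, or None if every character is in the allowed set."""
    first = module_path.split("/", 1)[0]
    for ch in first:
        if ch not in _ALLOWED_FIRST:
            return f"invalid char {ch!r} in first path element"
    return None


print(first_element_problem("packages.EXAMPLE.com/MyOrg/foO"))
print(first_element_problem("packages.example.com/MyOrg/foO"))
```

Under this rule a valid module path never needs its host-part lowercased, while the remaining path elements keep their case.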

@prabhu

prabhu commented Jun 24, 2024

Option C: namespace must be empty with encoded names (any casing)

This closely mimics the capabilities supported for go module names. Since this is a breaking change, a new package type such as go, gopkg, or gomod is preferable to versioning the purl spec.

@matt-phylum
Contributor

Isn't Option C the same as Option B?

Related to namespaces: #294

It's not possible to just make a new go package type to avoid versioning PURL. This would create distinct PURLs for both go and golang which refer to the same package, but only in certain contexts, and likely lead to unexpected and inconsistent normalization where some software translates golang PURLs into go PURLs and other software considers the normalized PURL to be a distinct package. This is a problem especially for software that tries to match PURLs from different sources.

@prabhu

prabhu commented Jun 24, 2024

@matt-phylum, option B has a detailed specification about the name property. My proposal is not to have any opinion.

purl doesn't have a built-in concept of versioning, so both the producer and the consumer need to agree on the exact version to follow. Distinct package types avoid this problem. The benefit is that it can be used as a precedent to improve npm, pypi etc.

@matt-phylum
Contributor

Creating distinct package types avoids one problem by creating a bigger problem. Creating several slightly different types with slightly different behaviors defeats the purpose of having a standardized way of naming packages. If all possible type variations are valid simultaneously, all implementations need to support all ways to refer to packages relevant to that implementation. Existing software would not understand the new types until updated (similar to versioning PURL?). Humans working with PURLs will need to remember which rules apply to which types. Normalizing from one type to a preferred type to alleviate this issue would be a significant change to normalization that would cause issues with interoperability between products and compatibility with existing data.

I think removing the namespace for Go or other package types and instead putting a percent encoded path into the name, whether with a new type or a new version, would be a disaster because it would break compatibility with almost all existing Go PURLs and PURL implementations. There's no standard format for deconstructed PURLs so it's safe to change the spec so Go packages do not have namespaces as long as the path is used without percent encoding, resulting in the same serialized representation. It'd probably be best to do this across all package types at the same time so PURL implementations can be simplified by combining the two components instead of having an extra case for namespace+name combined.

@jkowalleck
Member Author

jkowalleck commented Jun 24, 2024

re: #308 (comment)

[...] There's no standard format for deconstructed PURLs

Oh there is. see https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst

[...] so it's safe to change the spec so Go packages do not have namespaces as long as the path is used without percent encoding [...]

this would be against existing purl spec.
if all is a name, then the encoding is mandatory according to PURL spec.

Existing PURL spec: name MUST NOT include a / --> it is to be URL-encoded to %2F
Existing PURL spec: namespace can have as many / as they want ...

thing is: AFAIK go does not know any namespace (unlike php/composer and npm and others ...)
go only has package names, and I had the idea to use this fact to make things right.

[...] with a new type or a new version, would be a disaster because it would break compatibility with almost all existing Go PURLs [...]

you are completely wrong here. The opposite is the case:

  • Existing go PURLs have one to many namespace-segments and one name,
  • new go PURLs would have no namespace-segments and one name.

the namespace-segments and name are to be escaped per PURL spec - regardless of new or old go PURL. nothing changes here.

and to distinguish between new and old ... well the one has at least one namespace-segment, the other does not.

and downstream usage example:
OLD: take the namespace-segments and the name -> concatenate both with / --> you've got the package dist url ...
NEW: take the namespace-segments(an empty list) and the name -> concatenate both with / --> you've got the package dist url ...
Seems not a big deal.
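
The OLD/NEW comparison above can be sketched like this, assuming the PURL has already been parsed and its components decoded. `module_path` is a hypothetical helper and the gorilla/context module is taken from the type spec examples.

```python
def module_path(namespace_segments, name):
    """Hypothetical helper: join decoded namespace segments and the name
    with "/" to recover the module path."""
    return "/".join([*namespace_segments, name])


# OLD: one-to-many namespace segments plus a name.
old = module_path(["github.com", "gorilla"], "context")
# NEW: an empty namespace and the whole path carried in the name.
new = module_path([], "github.com/gorilla/context")

print(old)
print(new)
print(old == new)
```

The same join handles both shapes because an empty list of namespace segments contributes nothing to the result.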


I just wanted to give ideas how this could be solved and how hard it might be.
I don't care for a specific solution. Furthermore, I don't even use go myself. I don't have any investment here.


Anyway, I do not want to alter the core PURL spec. All it takes is "fixing" the type spec.

@matt-phylum
Contributor

thing is: AFAIK go does not know any namespace (unlike php/composer and npm and others ...)
go only has package names, and I had the idea to use this fact to make things right.

PHP nor NPM have namespaces either. The name of symfony/console is symfony/console. The name of @angular/cli is @angular/cli. The native tools always name the dependency with its leading component.

@jkowalleck
Member Author

jkowalleck commented Jun 24, 2024

PHP nor NPM have namespaces either. The name of symfony/console is symfony/console. The name of @angular/cli is @angular/cli. The native tools always name the dependency with its leading component.

you are wrong here.

  • symfony is the vendor (doubles as namespace), console is the name.
  • @angular is the scope (doubles as namespace) cli is the name.

but all of this does not matter for this discussion here, sorry, please stick to the topic.


go, afaik, does not have a registry, so they don't have a vendor, nor scope, nor namespaces.
they have locations.

@matt-phylum
Contributor

There's no difference between how NPM and PHP do/don't have namespaces and how Go does/doesn't have namespaces. In all of these cases, the name of the package in the native ecosystem contains slashes, and for PURL the native name is pulled apart into a namespace+name combination that results in the serialized form containing the native name.

@prabhu

prabhu commented Jun 24, 2024

Creating distinct package types avoids one problem by creating a bigger problem. Creating several slightly different types with slightly different behaviors defeats the purpose of having a standardized way of naming packages. If all possible type variations are valid simultaneously, all implementations need to support all ways to refer to packages relevant to that implementation. Existing software would not understand the new types until updated (similar to versioning PURL?). Humans working with PURLs will need to remember which rules apply to which types. Normalizing from one type to a preferred type to alleviate this issue would be a significant change to normalization that would cause issues with interoperability between products and compatibility with existing data.

I think removing the namespace for Go or other package types and instead putting a percent encoded path into the name, whether with a new type or a new version, would be a disaster because it would break compatibility with almost all existing Go PURLs and PURL implementations. There's no standard format for deconstructed PURLs so it's safe to change the spec so Go packages do not have namespaces as long as the path is used without percent encoding, resulting in the same serialized representation. It'd probably be best to do this across all package types at the same time so PURL implementations can be simplified by combining the two components instead of having an extra case for namespace+name combined.

We already have this problem. For example, nixos can wrap a pypi package and build it slightly differently and have a similar package name that may or may not have the same vulnerabilities. Many OS distros also operate similarly.

@prabhu

prabhu commented Jun 24, 2024

PHP, of course, has vendor.

[Screenshot: 2024-06-24_19-21-36]

@matt-phylum
Contributor

I don't see package namespaces in the screenshot.

The first arrow looks like it's pointing at "main Composer repository", but the repository is not related to the package name.

The other two arrows are pointing at package names.

As you can see, require takes an object that maps package names (e.g. monolog/monolog) to version constraints (e.g. 1.0.*).

https://getcomposer.org/doc/01-basic-usage.md#the-require-key

The package name consists of a vendor name and the project's name.

https://getcomposer.org/doc/01-basic-usage.md#package-names

Some package types do have namespaces.

  • alpm, apk, deb, rpm have namespaces, but they don't seem like they mean anything
  • bitbucket, github are debatable. Unlike a namespace, the full repository name includes the owner and repository name. Like a namespace, the same repository can exist with many different owners, but unlike a namespace, many repositories with the same name are completely unrelated. gitlab is similar, except that the project ID contains a GitLab namespace that contains slashes instead of a single owner which cannot.
  • cpan is overloaded such that a PURL with a namespace refers to a different kind of package than a PURL without a namespace
  • hex and maven are similar to non-namespace types like composer, but the native tools often deal with the components stored in the PURL namespace and PURL name as separate fields, so it makes sense to treat them as separate fields in PURL as well
  • swid has up to three components, but PURL only has room for namespace and name, so swid PURL namespaces may contain one slash

composer, docker, golang, huggingface, npm, swift create a PURL namespace by splitting the native package name/id on the last slash such that writing out the PURL in its canonical form gives the appearance of PURL using the native package name/id, despite PURL actually forcing a namespace+name.
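
The split on the last slash can be illustrated with a short sketch. `split_native_name` is a hypothetical helper, not the API of any PURL implementation, and it ignores percent-encoding and per-type normalization.

```python
def split_native_name(native_name: str):
    """Hypothetical helper: split a native package name on its last slash
    into a (namespace, name) pair; the namespace is empty without a slash."""
    namespace, _, name = native_name.rpartition("/")
    return namespace, name


for native in ("symfony/console", "@angular/cli", "github.com/gorilla/context", "lodash"):
    namespace, name = split_native_name(native)
    # Joining the non-empty parts with "/" gives back the native name.
    rejoined = "/".join(part for part in (namespace, name) if part)
    print(native, "->", (namespace, name), "->", rejoined)
```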

nuget is actually similar to npm, but handled differently by PURL. NuGet packages usually have a name prefix, but NuGet uses periods as delimiters, and pkg:nuget/microsoft/extensions/dependencyinjection or pkg:nuget/microsoft.extensions/dependencyinjection look alien to NuGet users. Replacing the periods with slashes to fit into PURL when leaving them alone would work is unnecessary.

There are a few more I'm not sure about, but the rest forbid namespaces.

I think it would be a mistake to create a package type which normally puts slashes in its PURL name because it makes PURLs that are difficult for humans and it creates complications if namespaces are removed from the core specification (possible without breaking existing PURLs).

@jkowalleck
Member Author

reminder: this is about the current golang PURL-TYPE.
this is not a general discussion about general ideas and likings. this is about solving an issue the go community is actually having right now, not some esoteric concepts of "humans reading PURL" or personal preference, nor about "but the other PURL-TYPEs do it this way...".

Each ecosystem has its own requirements; each ecosystem is facing different standards and constraints.
And the current implementation for golang in PURL-TYPE does not adhere to the real world of go users.
Please read the original issue description and be helpful. :D

@prabhu

prabhu commented Jul 4, 2024

@jkowalleck, we are seeing similar issues and potential workarounds across other package types, which is what we are trying to convey here. I think the next step could be for the core maintainers to digest the information and come up with something authoritative.

@jkowalleck
Member Author

jkowalleck commented Jul 4, 2024

re: #308 (comment)

I see, but this does not help this particular problem.
If there was a larger issue with a wider scope, then this could be discussed in a meta-issue somewhere else, and it might lead to no consensus or a complete reboot of the project. see also #310

In the meantime, this particular issue for go people could be solved already, ...

PS: nuff said. will unfollow this issue, since i am not really affected as a non-go person ;-)

@tiegz

tiegz commented Oct 17, 2024

re: Option B, and to @jkowalleck 's point about namespaces: #63 (comment)

@jkowalleck
Member Author

Superseded by #338
