Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new source hashing methods: content_sha256, content_sha384, content_sha512 #5277

Merged
merged 38 commits into from
Jan 8, 2025

Conversation

jaimergp
Copy link
Contributor

@jaimergp jaimergp commented Apr 12, 2024

Description

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@jaimergp jaimergp requested a review from a team as a code owner April 12, 2024 15:41
@jaimergp jaimergp marked this pull request as draft April 12, 2024 15:41
@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Apr 12, 2024
Copy link

codspeed-hq bot commented Apr 12, 2024

CodSpeed Performance Report

Merging #5277 will not alter performance

Comparing jaimergp:content-hash (33eb089) with main (b8724c6)

Summary

✅ 5 untouched benchmarks

@wolfv
Copy link
Contributor

wolfv commented Apr 15, 2024

I think this is cool. It would also work nicely with the new proposal for "rendered recipes" (conda/ceps#74).

On that note - should we continue adding features to conda-build without any standardization (e.g. CEP) process?

@jaimergp
Copy link
Contributor Author

should we continue adding features to conda-build without any standardization (e.g. CEP) process?

I'm planning to submit a CEP. I opened this draft to explore what kind of things are needed for a stable yet robust logic, cross platform. Things like permissions and so on don't translate well to Windows.

@wolfv
Copy link
Contributor

wolfv commented Apr 15, 2024

Awesome. Yeah, I also recently looked at a few content hash implementations in Rust but didn't find anything super convincing yet. There are a bunch though (https://crates.io/search?q=content%20hash)

@jaimergp
Copy link
Contributor Author

So far the scheme I followed looks a lot like https://github.com/DrSLDR/dasher?tab=readme-ov-file#hashing-scheme. Things to standardize would be how the tree is sorted, the normalization of the path, the separators (to prevent this), and the allowed algorithms.

I've seen a few merkle tree based packages but we don't need all the proof stuff, or leaf querying; just comparing the root hash.

Maybe it could be implemented in a recursive way that doesn't involve obtaining the whole file tree beforehand if that increases performance or simplifies implementation elsewhere. IMO this feels like one of those CEPs that does require prototyping first to see which things have to be standardized.

@jaimergp
Copy link
Contributor Author

pre-commit.ci autofix

@jaimergp jaimergp changed the title add content_sha256 hash checks Add new source hashing methods: content_md5, content_sha1, content_sha256 Nov 20, 2024
@jaimergp
Copy link
Contributor Author

I would add a CLI option in rattler-build that would look something like [...]

Hm, true, we could add a little subcommand here to make it easier. Although honestly, I usually run conda-build, wait for the hashing mismatch error, and then copy the correct one 😬 (That's why I amended the text in the errors hah).

beckermr
beckermr previously approved these changes Nov 26, 2024
@schuylermartin45
Copy link
Contributor

schuylermartin45 commented Nov 26, 2024

I don't think it is wise to add support for two hashing algorithms with known vulnerabilities in 2024. Although it may be unlikely, that smells like an avenue for a supply chain attack to me.

I think if we were to support multiple hashing algorithms, we should support algorithms that are still deemed viable and secure, like the other SHA-* bit lengths.

@jaimergp
Copy link
Contributor Author

#4793 will probably land soon. I'll update this branch with it once it reaches main and then remove the weak hashes. If we are that concerned about md5 and sha1 being used, we should also study the possibility of deprecating them or at least warning about them in the logs.

@beckermr
Copy link
Contributor

We'll need a lot of advanced notice to deprecate them on conda-forge. For now we should probably add a lint + minimigrator to move to sha256.

@wolfv
Copy link
Contributor

wolfv commented Nov 26, 2024

I am not concerned, I just don't understand why we want to add them? IMO it doesn't add value. Having the MD5 hash available for the regular hash makes sense because some pacakges might publish the known-good value (and that can be used in the recipe), but for something that we have invented ourselves I don't see a use-case where it is justified to use anything other than the best available hash.

@jaimergp jaimergp changed the title Add new source hashing methods: content_md5, content_sha1, content_sha256 Add new source hashing methods: content_sha256, content_sha384, content_sha512 Nov 26, 2024
beckermr
beckermr previously approved these changes Nov 27, 2024
Copy link
Contributor

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We should probably vote on the CEP before merging.

@jezdez
Copy link
Member

jezdez commented Nov 27, 2024

Setting as blocked on the CEP vote

conda_build/utils.py Show resolved Hide resolved
conda_build/utils.py Outdated Show resolved Hide resolved
conda_build/utils.py Outdated Show resolved Hide resolved
conda_build/utils.py Outdated Show resolved Hide resolved
@jaimergp
Copy link
Contributor Author

@jezdez, the CEP passed, so I guess this review can be dismissed now.

@jaimergp jaimergp requested a review from a team December 21, 2024 10:54
beckermr
beckermr previously approved these changes Dec 21, 2024
Copy link
Contributor

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

jezdez
jezdez previously approved these changes Jan 6, 2025
conda_build/utils.py Outdated Show resolved Hide resolved
@jaimergp jaimergp dismissed stale reviews from jezdez and beckermr via 6fed084 January 6, 2025 14:36
@jaimergp
Copy link
Contributor Author

jaimergp commented Jan 6, 2025

CI errors are unrelated. Investigating in #5577.

@jaimergp jaimergp enabled auto-merge (squash) January 7, 2025 23:24
@jaimergp jaimergp merged commit caa6d2f into conda:main Jan 8, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

Better hashing of sources
7 participants