Cope with invalid hash algorithms in RECORD #179

Open
wants to merge 2 commits into
base: main
8 changes: 5 additions & 3 deletions src/installer/records.py
@@ -192,13 +192,15 @@ def from_elements(cls, path: str, hash_: str, size: str) -> "RecordEntry":
         if not path:
             issues.append("`path` cannot be empty")

+        hash_value: Optional[Hash] = None
         if hash_:
             try:
-                hash_value: Optional[Hash] = Hash.parse(hash_)
+                hash_value = Hash.parse(hash_)
+                if hash_value.name not in hashlib.algorithms_available:
Contributor

@eli-schwartz eli-schwartz May 30, 2023

PEP 376 / https://packaging.python.org/en/latest/specifications/recording-installed-packages/ requires that hashes in RECORD use an algorithm from hashlib.algorithms_guaranteed; algorithms_available is not good enough when producing installed databases in site-packages.

But the wheel format itself (https://packaging.python.org/en/latest/specifications/binary-distribution-format/) doesn't say anything one way or another about it...

RECORD is a list of (almost) all the files in the wheel and their secure hashes. Unlike PEP 376, every file except RECORD, which cannot contain a hash of itself, must include its hash. The hash algorithm must be sha256 or better; specifically, md5 and sha1 are not permitted, as signed wheel files rely on the strong hashes in RECORD to validate the integrity of the archive.

So it disagrees with PEP 376 in at least two points:

  • hash all files, even pyc
  • md5 and sha1 are algorithms_guaranteed, but still not allowed

And therefore the PEP cannot be taken as authoritative on algorithms_guaranteed. But only for wheels. Once extracted, it does need to be in algorithms_guaranteed.

Fun.

...

In theory this would solve the platform-conditional concern.
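
The two policies contrasted in this comment can be sketched roughly as follows. This is only an illustration, not code from installer: the function names are hypothetical, and the wheel rule ("sha256 or better") is approximated by rejecting just md5 and sha1, the two algorithms the quoted spec text names explicitly.

```python
import hashlib

# Algorithms the quoted wheel spec text explicitly forbids in a wheel's RECORD.
WHEEL_FORBIDDEN = frozenset({"md5", "sha1"})


def allowed_in_wheel_record(name: str) -> bool:
    """Hypothetical check for a RECORD inside a wheel.

    Assumption: "sha256 or better" is approximated as "usable on this
    interpreter and not md5/sha1".
    """
    return name in hashlib.algorithms_available and name not in WHEEL_FORBIDDEN


def allowed_in_installed_record(name: str) -> bool:
    """Hypothetical check for a RECORD written to site-packages."""
    return name in hashlib.algorithms_guaranteed


# Print how each policy treats a few algorithm names.
for name in ("sha256", "md5", "sha"):
    print(name, allowed_in_wheel_record(name), allowed_in_installed_record(name))
```

Under these assumptions md5 passes the installed-RECORD check but fails the wheel check, which is the mismatch between the two specifications described above.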

Contributor

It's not obvious to me what the intended availability of installer.records is. Can I use it to parse the RECORD in a wheel file, or one in site-packages? Is the appropriate guard here to enforce this when creating new RECORD files only?

Experimental patch to validate this as part of destinations, where we know that any algorithm passed is definitely intended for creating a new installed RECORD file: 67da026.

Contributor Author

I'm not sure what conclusions, if any, you reach from the above.

I am therefore politely saying "noted"... but not doing anything about it!

Please say if you have opinions about how this MR ought to look (or submit an MR; I'm fine with this one being overtaken by something better).

+                    issues.append(f"invalid hash algorithm '{hash_value.name}'")
+                    hash_value = None
Comment on lines +198 to +201
Member

@pradyunsg pradyunsg May 29, 2023

Should this be performing this validation in the validate method somehow, since this would prevent a RecordEntry from being created with a hash that isn't in algorithms_available?
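
For illustration, the alternative raised in this question (allow construction, report the problem at validation time) might look something like the sketch below. `SketchEntry` and its `problems` method are hypothetical stand-ins, not installer's `RecordEntry` or its `validate` method.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SketchEntry:
    """Hypothetical stand-in for a RECORD entry; not installer's RecordEntry."""

    path: str
    hash_name: Optional[str]
    hash_value: Optional[str]

    def problems(self) -> List[str]:
        """Report issues at validation time rather than at construction."""
        if self.hash_name is None:
            return []
        if self.hash_name not in hashlib.algorithms_available:
            return [f"invalid hash algorithm '{self.hash_name}'"]
        return []


# Construction succeeds regardless of the algorithm name.
entry = SketchEntry("fancy/__init__.py", "sha", "abc")
# Any problem only surfaces when the entry is checked.
print(entry.problems())
```

The trade-off is that an entry with an unusable algorithm can exist and be passed around until something asks it for its problems.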

Contributor Author

My thinking was that it's desirable to be unable to create a RecordEntry with a bad hash algorithm, but I'm happy to be guided in another direction if you have a preference.

Yet another approach, doubling down on "invalid hash algorithms are invalid", would be to raise a ValueError from Hash.parse(), though the error message from that path isn't currently very informative.
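
A rough sketch of that last approach, assuming a simple `name=value` hash format. `SketchHash` is a hypothetical stand-in and installer's real `Hash` class may differ; the point is only that the parser itself could raise a `ValueError` whose message names the offending algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchHash:
    """Hypothetical stand-in; installer's real Hash class may differ."""

    name: str
    value: str

    @classmethod
    def parse(cls, text: str) -> "SketchHash":
        # Split on the first "=" only, so the digest may itself contain "=".
        name, sep, value = text.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"expected '<algorithm>=<digest>', got {text!r}")
        if name not in hashlib.algorithms_available:
            raise ValueError(f"invalid hash algorithm {name!r}")
        return cls(name, value)


try:
    SketchHash.parse("sha=abc")
except ValueError as exc:
    # The message names the algorithm instead of a generic format error.
    print(exc)
```

With messages like these, a caller that already catches `ValueError` could append `str(exc)` to its list of issues rather than a fixed format message.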

             except ValueError:
                 issues.append("`hash` does not follow the required format")
-        else:
-            hash_value = None

         if size:
             try:
15 changes: 13 additions & 2 deletions src/installer/sources.py
@@ -16,7 +16,7 @@
 )

 from installer.exceptions import InstallerError
-from installer.records import RecordEntry, parse_record_file
+from installer.records import InvalidRecordEntry, RecordEntry, parse_record_file
 from installer.utils import canonicalize_name, parse_wheel_filename

 if TYPE_CHECKING:
@@ -277,7 +277,18 @@ def validate_record(self, *, validate_contents: bool = True) -> None:
                 )
                 continue

-            record = RecordEntry.from_elements(*record_args)
+            try:
+                record = RecordEntry.from_elements(*record_args)
+            except InvalidRecordEntry as e:
+                for issue in e.issues:
+                    issues.append(
+                        f"In {self._zipfile.filename}, entry in RECORD file for "
+                        f"{item.filename} is invalid: {issue}"
+                    )
+
+                # coverage on Windows and python < 3.10 claims that the next line is not
+                # reached, pragma to deal with this false positive.
+                continue  # pragma: no cover

             if item.filename == f"{self.dist_info_dir}/RECORD":
                 # Assert that RECORD doesn't have size and hash.
23 changes: 23 additions & 0 deletions tests/test_sources.py
@@ -338,3 +338,26 @@ def test_rejects_record_validation_failed(self, fancy_wheel):
             ),
         ):
             source.validate_record()
+
+    def test_rejects_record_containing_unknown_hash(self, fancy_wheel):
+        with WheelFile.open(fancy_wheel) as source:
+            record_file_contents = source.read_dist_info("RECORD")
+
+        new_record_file_contents = record_file_contents.replace("sha256=", "sha=")
+        replace_file_in_zip(
+            fancy_wheel,
+            filename="fancy-1.0.0.dist-info/RECORD",
+            content=new_record_file_contents,
+        )
+
+        with (
+            WheelFile.open(fancy_wheel) as source,
+            pytest.raises(
+                WheelFile.validation_error,
+                match=(
+                    "In .+, entry in RECORD file for .+ is invalid: "
+                    "invalid hash algorithm 'sha'"
+                ),
+            ),
+        ):
+            source.validate_record(validate_contents=True)