Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Place url-scm download files in .bob-download #606

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

rhubert
Copy link
Contributor

@rhubert rhubert commented Jan 1, 2025

Separate the compressed input files from the extracted files in the workspace in preparation of bundling the sources.

Copy link

codecov bot commented Jan 1, 2025

Codecov Report

Attention: Patch coverage is 92.59259% with 8 lines in your changes missing coverage. Please review.

Project coverage is 88.86%. Comparing base (2d829b4) to head (c619d27).

Files with missing lines Patch % Lines
pym/bob/scm/url.py 92.30% 8 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #606   +/-   ##
=======================================
  Coverage   88.86%   88.86%           
=======================================
  Files          48       48           
  Lines       15474    15551   +77     
=======================================
+ Hits        13751    13820   +69     
- Misses       1723     1731    +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rhubert rhubert force-pushed the urlscm-bob-download branch from e1ec2c1 to bd6d56c Compare January 1, 2025 20:36
@rhubert rhubert mentioned this pull request Jan 4, 2025
@jkloetzke
Copy link
Member

Uhh, changing the stable-variant-ids test is a hinting that it breaks backwards compatibility. Moving the downloaded file into a subdirectory could break existing recipes. Even if the file is extracted, the recipe can still rely on the file being present today. So at least we must have a policy to guard this.

I would propose to have the downloaded file not in a magic hidden directory but instead completely outside of workspace. We could have a downloads next to workspace that stores the downloaded file and the canary. That would make it completely transparent. No need to exclude the directory from hashing and the workspace appears clean...

@rhubert
Copy link
Contributor Author

rhubert commented Jan 6, 2025

I would propose to have the downloaded file not in a magic hidden directory but instead completely outside of workspace.

I also thought about this. The main reason for placing it inside was the attic logic for workspaces. I'm unsure how to thread the download-folder files if the recipe changes. Simply delete it? Or do we need some attic logic there as well?

@jkloetzke
Copy link
Member

I would propose to have the downloaded file not in a magic hidden directory but instead completely outside of workspace.

I also thought about this. The main reason for placing it inside was the attic logic for workspaces. I'm unsure how to thread the download-folder files if the recipe changes.

Good point. Indeed, this will require some extra logic to keep the downloads directory from collecting garbage. But this will probably enable further optimizations in the future. If the downloaded file is already available locally (e.g. when using a preMirror), we can directly extract such archives in the future without even copying the file in the workspace.

Simply delete it? Or do we need some attic logic there as well?

I think we can simply delete it. The "downloads" directory probably needs to have the same structure as the "workspace". This is required anyway because there might be multiple url SCMs in the same workspace. I guess adding a postAttic method to SCMs which is called in the attic code path can simply delete the downloaded file.

@rhubert rhubert force-pushed the urlscm-bob-download branch from bd6d56c to 81a55e9 Compare January 14, 2025 12:09
Copy link
Member

@jkloetzke jkloetzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Just some minor nits...

pym/bob/scm/url.py Outdated Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
doc/manual/policies.rst Outdated Show resolved Hide resolved
pym/bob/scm/url.py Outdated Show resolved Hide resolved
Comment on lines 792 to 793
if self.__separateDownload:
downloadDestination = os.path.abspath(os.path.join(workspace, "..", "download", self.__dir))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the os.path.abspath really necessary? If true, then I would appreciate a comment why. It's somewhat unexpected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason for abspath was, that os.path.exists has some trouble if the path contains ..:

>>> import os
>>> os.path.exists('foo/bar/../..')
False

however I think using os.path.normpath instead of abspath is the cleaner solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trouble comes from the fact that it fails if some of the path elements does not exist. Like for example if the user removed workspace from dev/[...]/1/workspace/../download then the path really does not exist. So I think you're right, os.path.normpath is more appropriate. Still, I would appreciate a comment because it's not obvious. 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good explanation :) I looked into the documentation but couldn't find anything for os.path.exists. But I found a note for os.makedirs saying it's also confused by ... So I added normalize there as well. And a comment ;)

@rhubert rhubert force-pushed the urlscm-bob-download branch 2 times, most recently from d8004cb to 3c2f057 Compare January 21, 2025 06:23
rhubert and others added 6 commits January 21, 2025 07:33
Separate the downloaded files from the extracted files by downloading
them into a `download` directory next to the workspace. Also the canary is
generated there. The Gzip and XZ-Extractor always extract the files into
the directory of the compressed file. Therefor the compressed files is
copied into the workspace-directory first. By removing `-k`the
compressed files are no longer kept.

To trigger a attic move of old workspaces a version information is added
to the url-scm spec.
Delete the download directory if the scm-workspace is moved to attic.
@rhubert rhubert force-pushed the urlscm-bob-download branch from 3c2f057 to c619d27 Compare January 21, 2025 06:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants