WIP: bundle #586

Open
wants to merge 8 commits into
base: master
rhubert
Contributor

@rhubert rhubert commented Sep 23, 2024

Bundle the results of all checkoutSteps needed to build a package, and provide a bundle-config file so that the package can be rebuilt from the bundle alone.

Such a bundle can be used to build on an air-gapped system or to archive all sources of a build. When building from a bundle, only the bundle is extracted; no other checkoutScripts are run.

There is one issue with the current URL SCM extraction. The source workspace contains both the original downloaded file and the extracted sources. This unnecessarily doubles the size of the bundle and, since the bundle extraction uses the UrlScm as well, produces a different workspace hash when the bundle is extracted. That's why I changed it to download the original file into workspace/../_download, where the .extracted file is placed as well. This change currently makes the unit tests fail, because the ../_download folder is always the same when using a tempDir.

I'm not sure how to proceed here:

  • fix the unit-tests
  • use a workspace folder (e.g. workspace/.bob-download/) for the downloaded files and exclude this directory when hashing / bundling?
  • leave the download location as is and ignore the downloaded + .extracted file?
  • ...?
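
To make the second option more concrete, here is a minimal sketch of hashing a workspace while skipping a download directory. It is not Bob's actual hashing code: `hash_workspace`, the choice of SHA-1, and the `.bob-download` default are assumptions made purely for illustration.

```python
import hashlib
import os


def hash_workspace(root, exclude=(".bob-download",)):
    """Hash relative paths and file contents below `root`.

    Hypothetical helper, not Bob's real algorithm. Top-level directories
    named in `exclude` are skipped, so downloaded tarballs stored there
    do not influence the result.
    """
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune excluded directories so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames if d not in exclude]
        # Sort in place to get a stable traversal order.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest.update(rel.encode("utf-8") + b"\0")
            if os.path.islink(path):
                # Hash the link target instead of following the link.
                digest.update(os.readlink(path).encode("utf-8"))
                continue
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

With something like this, a workspace with and without the download directory yields the same hash, which is the property needed for the bundle extraction to reproduce the original workspace hash. The sketch ignores file modes and directory symlinks.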

@jkloetzke
Member

I would argue that the tarball download change is a welcome but unrelated optimization. I would move it to a separate PR, which should probably be merged first.

Reusing the URL SCM for the bundles is IMHO not the right approach. It should instead work like the archive stuff. Right now binary artifacts are used for saving or restoring package steps. What we need here is to save and restore checkout steps. In the best case we can build on the archive module and reuse most code from there.

From a more general angle: should this work for indeterministic checkouts too? I would argue against this and only bundle deterministic checkouts. But if there is a good reason to decide otherwise I'm open to it. It's just that my gut feeling is that it will get nasty to get all corner cases correct...

codecov bot commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 45.07042% with 78 lines in your changes missing coverage. Please review.

Project coverage is 88.39%. Comparing base (3277746) to head (853de76).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pym/bob/bundle.py | 26.74% | 63 Missing ⚠️ |
| pym/bob/input.py | 67.74% | 10 Missing ⚠️ |
| pym/bob/builder.py | 71.42% | 4 Missing ⚠️ |
| pym/bob/cmds/build/build.py | 87.50% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #586      +/-   ##
==========================================
- Coverage   88.79%   88.39%   -0.41%     
==========================================
  Files          48       49       +1     
  Lines       15141    15279     +138     
==========================================
+ Hits        13445    13506      +61     
- Misses       1696     1773      +77     


This filter can be used by a specific archive to filter out irrelevant files.
The default implementation keeps all files.
'upload' and 'download' did not fit all archives.
This special archive is used to bundle checkoutWorkspaces into a single
tar file.
Use the builder functions to enable and finish bundling.
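
A file filter with a keep-everything default, as described in the commit messages above, could be sketched roughly as follows. The class names, the `filter_member` hook, and the excluded directory are hypothetical and not taken from the PR; only the `filter` argument of `tarfile.TarFile.add` is standard library behaviour.

```python
import tarfile


class Archive:
    """Hypothetical base class: the default filter keeps every file."""

    def filter_member(self, info):
        return info


class BundleArchive(Archive):
    """Hypothetical bundle archive that drops a download directory."""

    EXCLUDED_DIRS = {".bob-download"}  # assumed name, for illustration only

    def filter_member(self, info):
        # Returning None tells tarfile to skip this member; for a
        # directory this also skips everything below it.
        if any(part in self.EXCLUDED_DIRS for part in info.name.split("/")):
            return None
        return info


def pack(archive, src_dir, out_path):
    """Pack src_dir into a gzip'ed tar file, applying the archive's filter."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname="workspace", filter=archive.filter_member)
```

Archives that do not override the hook behave exactly as before, which matches the "keep all files" default.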
@rhubert
Contributor Author

rhubert commented Jan 4, 2025

I added a new implementation using the archive stuff. Since files cannot be added to the final bundle in parallel with tarfile, this finalization step is necessary. Maybe this could be optimized somehow, but it seems to be impossible to synchronize this asyncio stuff: either asyncio.Future objects cannot be pickled, or the lock is created in the wrong loop. 🤷
With this finalization method it's reasonably simple and it works. I don't think time matters that much when a bundle is packed.
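
The general pattern, packing each step concurrently and then assembling the final tar file in one sequential pass, can be sketched as below. This is an assumption about the overall shape, not the code from this PR: `pack_step`, `pack_all` and `finalize` are hypothetical names, and the per-step archive naming is made up.

```python
import asyncio
import os
import tarfile


def pack_step(workspace, out_path):
    """Pack one checkout workspace into its own archive (safe to run in parallel)."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(workspace, arcname=".")
    return out_path


async def pack_all(workspaces, tmp_dir):
    """Pack all workspaces concurrently in the default thread pool."""
    loop = asyncio.get_running_loop()
    jobs = [
        loop.run_in_executor(None, pack_step, ws, os.path.join(tmp_dir, f"{i}.tgz"))
        for i, ws in enumerate(workspaces)
    ]
    # gather() returns results in the order the jobs were submitted.
    return await asyncio.gather(*jobs)


def finalize(step_archives, bundle_path):
    """Sequentially add the per-step archives to the final bundle.

    Only this function touches the bundle, so no lock is needed.
    """
    with tarfile.open(bundle_path, "w") as bundle:
        for path in step_archives:
            bundle.add(path, arcname=os.path.basename(path))
```

Because nothing but plain paths crosses the executor boundary, no future or lock has to be shared between loops or pickled.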

To get the blackbox test working, #606 is required. As of today I haven't tested bundling / unbundling a larger, real-world project. I'll do this in the next few days.
