Slow collection builds and caching build mid-stage dependencies (documents already compiled) #944

Open
ronaldtse opened this issue Nov 26, 2024 · 7 comments
Labels
question Further information is requested

@ronaldtse
Contributor

From @ReesePlews:

hi @ronaldtse when the two plateau documents (sources/001-v4, sources/002-v4) are built on github, it takes about 30-40 minutes for the documents (github-pages) artifact and then about another hour for the collection to build. if we need to wait until both the github-pages and collection artifacts are created, it takes a very long time.

my questions are:

what is the "collection" used for? if it is not presently being used, can we disable generation of the collection artifact?

if generating the collection is essential, is it then ok to download the github-pages artifact during generation of the collection artifact?

please let me know. thank you.

There are 2 issues here:

  1. Performance of a collection consisting of the same documents (> 1 hour) is much slower than the documents compiled singly (30-40 mins).
  2. Since we already have the individually compiled documents, we should re-use them for building the collection.

The first question is for @opoudjis .

The second is for @opoudjis and also @CAMOBAP (if this is possible).

Thanks!

@ronaldtse ronaldtse added the question Further information is requested label Nov 26, 2024
@opoudjis opoudjis self-assigned this Nov 29, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Metanorma Nov 29, 2024
@opoudjis
Contributor

We already have the capacity to do this: if the collection spec files name .xml files and not .adoc files, and those files have already been compiled in the place it is looking for them, it will not recompile them. We've already done this for ISO-10303.
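As a rough illustration of the rule described above (this is not Metanorma's actual code), the following Python sketch splits a list of collection file references into those that would be reused and those that would still need compiling. The function name `plan_collection_inputs` and the `document.xml` / `document.adoc` file names are hypothetical; only the `.xml` versus `.adoc` distinction and the "already compiled in the place it is looking for them" condition come from the comment.

```python
from pathlib import Path


def plan_collection_inputs(file_refs, base_dir):
    """Split file references into (reuse, compile) lists.

    Assumption for illustration: a reference is reused only if it names
    an .xml file that already exists under base_dir; anything else
    (an .adoc source, or a missing .xml) still has to be compiled.
    """
    reuse, to_compile = [], []
    for ref in file_refs:
        path = Path(base_dir) / ref
        if path.suffix == ".xml" and path.is_file():
            reuse.append(ref)
        else:
            to_compile.append(ref)
    return reuse, to_compile
```

For example, with `sources/001-v4/document.xml` already on disk, a reference to it would land in the first list, while a reference to `sources/002-v4/document.adoc` would land in the second.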

@opoudjis
Contributor

> Performance of a collection consisting of the same documents (> 1 hour) is much slower than the documents compiled singly (30-40 mins).

I spent three weeks optimising collection compilation, and got a factor of two speedup. There will not be another comparable optimisation. The main optimisation on the horizon is the removal of the duplicate Semantic XML, which is dependent on the refactoring I am now doing.

@ReesePlews

thank you both for the discussion. in my understanding the collection is something generated for a final deliverable. would there be a way to turn it off -- stop it from being generated on every PR, and only turn it on when your document is ready for a final deliverable? perhaps a switch can be added to the document.adoc file, much like :mn-output-extensions: is used?

@opoudjis opoudjis moved this from 🆕 New to 🏔 High priority in Metanorma Dec 12, 2024
@opoudjis
Contributor

It is much more complex than that, as the collection build file is completely separate to the document build file. It is very far from a simple one-line change: collections are a COMPLETELY different artefact.

It was @ronaldtse ’s decision to switch to collection processing for Plateau, and I am obligated to refer you to him for any changes to that build file.

@ReesePlews

hello @opoudjis thank you for the additional comment.

the collection artifact was not produced in a recent (Dec 23rd/24th) github build of the document. i think that is correct; as i mentioned earlier, i don't see a reason for generating the collection artifact while the document content is deep in draft revisions. it was adding significant processing time to the build. when the collection is required, there should be a way to turn on generation of that output; even a separate sources folder/document.adoc file might work reasonably well.

happy to discuss my comments more with @ronaldtse if helpful. for the current plateau work, i am not seeing a requirement for collection functionality, but perhaps i am unclear on the benefits of that functionality.
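
One way to picture the kind of opt-in switch requested here is a check at the build-script level, kept separate from the document file since the collection build file is a separate artefact. The Python sketch below is only an assumption about how such a gate could look: the `BUILD_COLLECTION` variable and the `should_build_collection` function are hypothetical, not an existing Metanorma or GitHub Actions setting.

```python
import os


def should_build_collection(env=None):
    """Return True only when the (hypothetical) BUILD_COLLECTION flag is set.

    The collection stays off by default, so draft builds skip it and a
    final-deliverable build turns it on explicitly.
    """
    env = os.environ if env is None else env
    flag = env.get("BUILD_COLLECTION", "").strip().lower()
    return flag in {"1", "true", "yes"}
```

A build script could call `should_build_collection()` before starting the collection step and skip that step whenever it returns `False`.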

@opoudjis
Contributor

opoudjis commented Dec 25, 2024

I think in another issue (I had a backlog of 900 of them), @ronaldtse has already temporarily disabled collection processing in Plateau, so this is a non-issue for now, and collections will indeed only be brought back once the document is finalised, and ready to be passed to Firelight. @ronaldtse could you confirm? I can't even find the issue any more.

In which case, this issue is already resolved.

@ReesePlews

@opoudjis i am totally on-board with your reasoning on this and the current settings that have been implemented at this time, at least in my project. thank you.

Projects
Status: 🏔 High priority
Development

No branches or pull requests

3 participants