Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reinstate sync pipeline #403

Merged
merged 9 commits into from
Jan 30, 2024
Merged

Reinstate sync pipeline #403

merged 9 commits into from
Jan 30, 2024

Conversation

matentzn
Copy link
Member

@matentzn matentzn commented Jan 26, 2024

@joeflack4 @twhetzel

This PR fixes the sync pipeline - in the end, the error was small, but it took quite some needle in the haystack searching..

I would suggest to review this, merge it into #397 (this is where this PR is directed towards), and then merge #397 as a whole.

Once its done, I will rerun everything again.

Previously, the pipeline did not fail when `UST_BACKTRACE=full semsql make $*.db -P config/prefixes.csv` failed
@matentzn matentzn changed the base branch from main to lexmatch-revamp January 26, 2024 09:42
Copy link
Member Author

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeflack4 I made comments in all the relevant sections.

src/scripts/sync_subclassof.py Show resolved Hide resolved
src/scripts/sync_subclassof.py Show resolved Hide resolved
src/scripts/sync_subclassof.py Show resolved Hide resolved
src/scripts/sync_subclassof.py Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Show resolved Hide resolved
.gitignore Show resolved Hide resolved
@matentzn matentzn marked this pull request as ready for review January 26, 2024 13:01
@matentzn matentzn changed the title Reinstate sync Reinstate sync pipeline Jan 26, 2024
.gitignore Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
@@ -422,6 +426,9 @@ $(MAPPINGSDIR)/rejected-mappings.tsv:
$(MAPPINGSDIR)/rejected-mappings-sssom.tsv: $(MAPPINGSDIR)/rejected-mappings.tsv
sssom parse $< -m metadata/mondo.sssom.config.yml --no-strict-clean-prefixes -o $@

tmp/mondo-ingest.owl: mondo-ingest.owl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion -> tmp/mondo-ingest.db: mondo-ingest.db

Details

Changes

(1) Change the prereq here to mondo-ingest.db

$(REPORTDIR)/%.subclass.added.robot.tsv $(REPORTDIR)/%.subclass.confirmed.robot.tsv $(REPORTDIR)/%.subclass.direct-in-mondo-only.tsv $(REPORTDIR)/%.subclass.added-obsolete.robot.tsv: tmp/mondo-ingest.db tmp/mondo.db tmp/mondo.sssom.tsv

(2) For the same target as above, remove $(TMPDIR) from this line

--mondo-ingest-db-path $(TMPDIR)/mondo-ingest.db

(2) Delete goal for tmp/mondo-ingest.owl

Rationale

We already have a mondo-ingest.owl. We can just make a mondo-ingest.db. There is not need I think for a duplicate file in tmp/. Also, the .gitignore is already set up for *.db so no worry about it getting committed.

I felt like we had already made this change months ago, but I think what happened was we had a similar situation for mergd.owl/.db and tmp/merged.owl/.db. We decided to stick with only 1 copy of those files (the tmp/ version).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like as few files as possible in the src/ontology dir -> hence that suggestion..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion -> mondo-ingest.owl/.db -> tmp/

How about this, then? mondo-ingest.owl is actually in the .gitignore. We could just move it and its .db into tmp/.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Nico's addition here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mondo-ingest.owl is in "gitignore", but it is used by "make deploy-release". We can reconsider this when we decide that we want to publish a db version of Mondo ingest for easier access in OAK by other parties.

Copy link
Contributor

@joeflack4 joeflack4 Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matentzn
(1) I'm not talking about publishing a .db version. I'm just think we have needless duplication of goals / files here.

(2) By deploy-release (I don't see it) do you mean deploy-mondo-ingest? It's NP to me that it's in the .gitignore but also in a release. Usually that's the case for release artefacts.

(3) There are no prereqs for deploy-mondo-ingest. Not a huge deal but I suppose all the release artefacts should be listed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we decided to decouple deploy commands from everything as they are dependent on files after the are moved to the release directory. I can explain in a call how this works if you like details!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That answers (3). I can just take your word for it.

That doesn't answer my questions (1) and (2) above. But we have bigger fish to fry if you don't wanna worry about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) deploy-mondo-ingest -> Yes, that is what I meant. It picks the release files from the release directory and uploads them to github (just did it yesterday in #417)

(1) Its a value judgement. I kinda dont like (at all), that ODK puts anything (any intermediate file) in src/ontology. That was a design mistake. It makes that folder, which has key confguration, feel cramped an unwieldy. I would prefer all files, or at least as much as possible, to be generated in tmp/ to be less cramped. I am not taking a super hard stance here though and can be convinced otherwise.

Copy link
Contributor

@joeflack4 joeflack4 Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing me those answers!

I didn't realize ODK was doing src/ontology/$(ONT).owl but I see that now and that makes sense.

Base automatically changed from lexmatch-revamp to main January 29, 2024 05:27
Copy link
Contributor

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@@ -422,6 +426,9 @@ $(MAPPINGSDIR)/rejected-mappings.tsv:
$(MAPPINGSDIR)/rejected-mappings-sssom.tsv: $(MAPPINGSDIR)/rejected-mappings.tsv
sssom parse $< -m metadata/mondo.sssom.config.yml --no-strict-clean-prefixes -o $@

tmp/mondo-ingest.owl: mondo-ingest.owl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Nico's addition here.

src/ontology/mondo-ingest.Makefile Show resolved Hide resolved
src/scripts/sync_subclassof.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants