Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Bug: Synonym Sync: Duplicate rows - build #723

Draft
wants to merge 1 commit into
base: bugfix-syn-sync-dupe-rows-684
Choose a base branch
from

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Dec 16, 2024

@joeflack4 joeflack4 changed the title Bug: Synonym Sync: Duplicate rows #684 - build DO NOT MERGE: Bug: Synonym Sync: Duplicate rows - build Dec 16, 2024
@joeflack4 joeflack4 self-assigned this Dec 16, 2024
@joeflack4 joeflack4 added the build Mostly for build PRs: when changes only to data files post `build-mondo-ingest`; no code changes label Dec 16, 2024
@joeflack4 joeflack4 marked this pull request as draft December 16, 2024 22:21
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this has nothing to do with this PR, and I'm sure this has been observed elsewhere, but I'm wondering why:

mondo-omim-genes.robot.tsv has only 20 lines changed, but mondo-omim-genes.robot.owl has over 30k.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, what is the diff? The most typical reason for huge diffs are genid s, so generated identifiers for anonymous expression necessary for RDF XML

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matentzn Ah, I see, you are correct. These are bnode identifiers.

     <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0013913">
-        <rdfs:subClassOf rdf:nodeID="genid5482"/>
+        <rdfs:subClassOf rdf:nodeID="genid5478"/>
     </owl:Class>
-    <owl:Restriction rdf:nodeID="genid5482">
+    <owl:Restriction rdf:nodeID="genid5478">
         <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0004003"/>
         <owl:someValuesFrom rdf:resource="http://identifiers.org/hgnc/11528"/>
     </owl:Restriction>
     <owl:Axiom>
         <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0013913"/>
         <owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
-        <owl:annotatedTarget rdf:nodeID="genid5482"/>
+        <owl:annotatedTarget rdf:nodeID="genid5478"/>
         <oboInOwl:source>OMIM:614840</oboInOwl:source>
     </owl:Axiom>

Not the hugest problem, but would make reviews easier if these IDs were deterministic. I have some Python code to do that in the OMIM pipeline, but that wouldn't be helpful here. Not that we have to worry about it now, but do you think there's a way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, switch to functional syntax in ROBOT! convert -f ofn

Copy link
Contributor Author

@joeflack4 joeflack4 Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. But not in RDF/XML.
I have begun to take a great liking for OFN now not because of its syntax, but because of its determinism in general, and the ease of diffing it, for that reason as well as fewer lines. This is now a regular part of some of my workflows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twhetzel This file has nothing to do with this PR. But any idea why this has so many new rows?

Note that the other OMIM unmapped files do not have such a large diff:

docs/reports/unmapped_omim.md                      |      6 +
.../lexmatch/mondo-only/unmapped_omim_mondo.tsv    |      7 +-
src/ontology/lexmatch/unmapped_omim_lex.tsv        |      2 + 
src/ontology/reports/omim_unmapped_terms.tsv       |      6 +
rc/ontology/unmapped/omim-unmapped.owl             |  45170 +++++-

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General review

Other than some OMIM related oddities not related to this PR, LGTM!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff with OMIM is caused by: comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Mostly for build PRs: when changes only to data files post `build-mondo-ingest`; no code changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants