
Tool for processing mapping clusters (boomer reports) #440

Merged
merged 18 commits into from
Feb 2, 2023


cmungall
Collaborator

@cmungall cmungall commented Jan 31, 2023

Currently tightly coupled with boomer.

Added a datamodel for boomer results: https://github.com/INCATools/ontology-access-kit/blob/0a1d53cfe0732f8834e703c18683ceabbf366eb1/src/oaklib/datamodels/mapping_cluster_datamodel.yaml

CLI:

Usage: boomerang export [OPTIONS] INPUT_REPORT

  Exports mappings from a boomer report.

  boomerang export tests/input/boomer-example.md

Options:
  -L, --minimum-confidence FLOAT  Do not show mappings with lower confidence
  -H, --maximum-confidence FLOAT  Do not show mappings with higher confidence
  --help                          Show this message and exit.
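
The -L/-H options amount to a confidence window over the exported mappings. The following is a minimal Python sketch of that filter, not boomerang's actual code. `Mapping` and `filter_by_confidence` are hypothetical names, and the sketch assumes that a mapping sitting exactly on a threshold is kept, since the help text only says "lower" and "higher" confidence are hidden:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Mapping:
    # Hypothetical stand-in for one row of a boomer report (not OAK's datamodel class)
    subject_id: str
    predicate_id: str
    object_id: str
    confidence: float


def filter_by_confidence(
    mappings: Iterable[Mapping],
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> Iterator[Mapping]:
    """Yield mappings whose confidence lies within [minimum, maximum].

    Assumption: values equal to a threshold are kept; only strictly lower
    (-L) or strictly higher (-H) confidences are dropped.
    """
    for m in mappings:
        if minimum is not None and m.confidence < minimum:
            continue  # below the -L floor
        if maximum is not None and m.confidence > maximum:
            continue  # above the -H ceiling
        yield m
```

Calling `filter_by_confidence(rows, maximum=0.6)` corresponds roughly to the `-H 0.6` run in the example.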

Example:

boomerang export tests/input/boomer-example-larger.md  -H 0.6
# curie_map: {}
# license: UNSPECIFIED
# mapping_set_id: temp
subject_id	subject_label	predicate_id	object_id	object_label	mapping_justification	confidence
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:exactMatch	http://identifiers.org/mesh/D008202	MESH:D008202	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:closeMatch	http://identifiers.org/snomedct/400178008	SCTID:400178008	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:exactMatch	http://identifiers.org/snomedct/254836000	SCTID:254836000	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:exactMatch	http://purl.obolibrary.org/obo/NCIT_C8965	Lymphangioma	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:exactMatch	http://www.orpha.net/ORDO/Orphanet_2415	Orphanet:2415	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:exactMatch	http://linkedlifedata.com/resource/umls/id/CN201700	UMLS:CN201700	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0002013	lymphangioma	skos:exactMatch	http://purl.obolibrary.org/obo/DOID_1475	DOID:1475	semapv:CompositeMatching	0.5000000000000002
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:exactMatch	http://purl.obolibrary.org/obo/DOID_0090142	DOID:0090142	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:closeMatch	http://linkedlifedata.com/resource/umls/id/C0268616	UMLS:C0268616	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:exactMatch	http://purl.obolibrary.org/obo/NCIT_C129070	Cystathioninuria	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:closeMatch	http://linkedlifedata.com/resource/umls/id/C0220993	UMLS:C0220993	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:exactMatch	http://linkedlifedata.com/resource/umls/id/C3495552	UMLS:C3495552	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:exactMatch	http://identifiers.org/snomedct/13003007	SCTID:13003007	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:exactMatch	https://omim.org/entry/219500	OMIM:219500	semapv:CompositeMatching	0.5
http://purl.obolibrary.org/obo/MONDO_0009058	cystathioninuria	skos:exactMatch	http://www.orpha.net/ORDO/Orphanet_212	Cystathioninuria	semapv:CompositeMatching	0.5
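
The export is SSSOM-style TSV: a `#`-prefixed YAML metadata block followed by a tab-separated table. As a rough sketch, assuming only that layout, the table can be loaded with the standard library and regrouped per subject, which recovers one candidate cluster per MONDO term. `read_sssom_tsv` and `group_by_subject` are hypothetical helpers; the sssom-py library is the more robust choice for real files:

```python
import csv
from collections import defaultdict
from typing import Any, Dict, List


def read_sssom_tsv(text: str) -> List[Dict[str, Any]]:
    """Parse SSSOM-style TSV text, skipping the '#'-prefixed metadata block."""
    body = [line for line in text.splitlines() if not line.startswith("#")]
    rows: List[Dict[str, Any]] = list(csv.DictReader(body, delimiter="\t"))
    for row in rows:
        # Confidence is read as a string; convert it so it can be compared numerically
        if row.get("confidence"):
            row["confidence"] = float(row["confidence"])
    return rows


def group_by_subject(rows: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group mapping rows by subject_id (e.g. all mappings for one MONDO term)."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row["subject_id"]].append(row)
    return dict(groups)
```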

Note: cluster IDs are pending INCATools/boomer#332.


See also INCATools/boomer#332
@cmungall cmungall changed the title from "Tool for processing mapping clusters." to "Tool for processing mapping clusters (boomer reports)" Jan 31, 2023
@codecov-commenter

codecov-commenter commented Jan 31, 2023

Codecov Report

Base: 80.03% // Head: 80.05% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (0e4c4df) compared to base (24cd1d1).
Patch coverage: 79.59% of modified lines in pull request are covered.


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #440      +/-   ##
==========================================
+ Coverage   80.03%   80.05%   +0.02%     
==========================================
  Files         195      201       +6     
  Lines       21501    22235     +734     
==========================================
+ Hits        17208    17800     +592     
- Misses       4293     4435     +142     
Impacted Files Coverage Δ
src/oaklib/datamodels/cross_ontology_diff.py 88.60% <ø> (ø)
src/oaklib/datamodels/lexical_index.py 96.81% <ø> (ø)
src/oaklib/datamodels/mapping_rules_datamodel.py 81.37% <ø> (ø)
src/oaklib/datamodels/obograph.py 88.48% <ø> (ø)
src/oaklib/datamodels/oxo.py 90.06% <ø> (ø)
src/oaklib/datamodels/search_datamodel.py 77.82% <ø> (ø)
.../oaklib/datamodels/summary_statistics_datamodel.py 84.02% <ø> (+0.23%) ⬆️
src/oaklib/datamodels/taxon_constraints.py 88.02% <ø> (ø)
src/oaklib/datamodels/validation_datamodel.py 82.17% <ø> (ø)
src/oaklib/datamodels/value_set_configuration.py 89.62% <ø> (ø)
... and 39 more



@matentzn
Contributor

Will we switch the input of boomerang to the json file we discussed yesterday?

@cmungall
Collaborator Author

Will we switch the input of boomerang to the json file we discussed yesterday?

Like everything in OAK, this has a LinkML datamodel, so it can be exported in all the usual formats. All that needs to be done here is to follow the get_writer pattern in the main CLI; @hrshdhgd can do this.
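
To illustrate the general idea of picking an output writer by format name, here is a small, self-contained sketch. It is not OAK's get_writer: `MappingRow`, `WRITERS` and `pick_writer` are hypothetical names, and only JSON and TSV are shown:

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import Callable, Dict, List, TextIO


@dataclass
class MappingRow:
    # Hypothetical flattened mapping record
    subject_id: str
    predicate_id: str
    object_id: str
    confidence: float


def write_json(rows: List[MappingRow], out: TextIO) -> None:
    json.dump([asdict(r) for r in rows], out, indent=2)


def write_tsv(rows: List[MappingRow], out: TextIO) -> None:
    names = [f.name for f in fields(MappingRow)]
    writer = csv.DictWriter(out, fieldnames=names, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for r in rows:
        writer.writerow(asdict(r))


# Hypothetical registry: output format name -> writer function
WRITERS: Dict[str, Callable[[List[MappingRow], TextIO], None]] = {
    "json": write_json,
    "tsv": write_tsv,
}


def pick_writer(output_type: str) -> Callable[[List[MappingRow], TextIO], None]:
    """Return the writer registered for output_type, or raise ValueError."""
    try:
        return WRITERS[output_type]
    except KeyError:
        raise ValueError(f"Unsupported output type: {output_type}") from None


buf = io.StringIO()
pick_writer("tsv")([MappingRow("GO:0005773", "skos:exactMatch", "Wikipedia:Vacuole", 0.95)], buf)
```

The design choice is that each command only builds the records, and adding a format means registering one more function.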

@cmungall
Collaborator Author

cmungall commented Jan 31, 2023

I added a first pass at INCATools/boomer#334

The output format is a little custom, but this is easily changed.

For most ontologies, where an xref means an exact match, we can run:

boomerang compare tests/input/boomer-fake-go-example.md -i tests/input/go-nucleus.db --reject-non-exact --promote-xref-to-exact -L 0.95

and get back:

type      info             confidence  predicate_id     subject_id  object_id
NEW       None             0.95        skos:exactMatch  GO:0005634  FAKE:1
REJECT    skos:closeMatch  0.95        skos:exactMatch  GO:0005634  Wikipedia:Cell_nucleus
OK        None             0.95        skos:exactMatch  GO:0005773  Wikipedia:Vacuole

If you store non-exact mappings in the source ontology, the output is a little more complex. This is compounded by the fact that OAK does not yet auto-promote the hacky xref annotations used in Mondo/GO. I recommend switching these to SKOS at source (cc @balhoff @hrshdhgd @matentzn).

If SKOS annotations are stored directly, just use the plain options; any CONFLICT line reports a predicate difference that needs to be enacted.
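
The classification described above can be approximated as follows. This is a hedged sketch, not boomerang's implementation. It assumes plain xrefs are recorded under `oboInOwl:hasDbXref`, that `info` carries boomer's predicate when it differs from the ontology's, and that `predicate_id` then holds the ontology's current predicate. `compare` and `ComparisonRow` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

EXACT = "skos:exactMatch"
XREF = "oboInOwl:hasDbXref"  # assumption: predicate used for plain xrefs


@dataclass
class ComparisonRow:
    type: str             # NEW, OK, REJECT or CONFLICT
    info: Optional[str]   # boomer's predicate when it differs from the ontology's
    confidence: float
    predicate_id: str
    subject_id: str
    object_id: str


def compare(
    boomer_mappings: List[Tuple[str, str, str, float]],  # (subject, predicate, object, confidence)
    ontology_mappings: Dict[Tuple[str, str], str],       # (subject, object) -> asserted predicate
    min_confidence: float = 0.0,
    reject_non_exact: bool = False,
    promote_xref_to_exact: bool = False,
) -> List[ComparisonRow]:
    """Classify each boomer mapping against what the ontology currently asserts."""
    rows = []
    for subj, pred, obj, conf in boomer_mappings:
        if conf < min_confidence:
            continue  # analogous to -L
        current = ontology_mappings.get((subj, obj))
        if current == XREF and promote_xref_to_exact:
            current = EXACT  # treat a plain xref as an exact match
        if current is None:
            rows.append(ComparisonRow("NEW", None, conf, pred, subj, obj))
        elif current == pred:
            rows.append(ComparisonRow("OK", None, conf, pred, subj, obj))
        elif reject_non_exact and pred != EXACT:
            # boomer chose a non-exact predicate: reject the asserted mapping
            rows.append(ComparisonRow("REJECT", pred, conf, current, subj, obj))
        else:
            # predicates differ: report the change that needs to be enacted
            rows.append(ComparisonRow("CONFLICT", pred, conf, current, subj, obj))
    return rows
```

Fed the three GO rows from the example, with both xrefs in the ontology and `reject_non_exact=True, promote_xref_to_exact=True`, this sketch yields NEW, REJECT, OK. With the plain options it reports the two xref-backed rows as CONFLICT.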

Next steps:

  • Export as SSSOM in separate accept/reject files
  • Export as KGCL?
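
For the first next step, one possible shape is to partition the compare rows and write each side as its own table. This is only a sketch. It assumes NEW/OK rows are accepted, REJECT rows are rejected and CONFLICT rows are left for manual review. It also writes just the TSV table, without the `#` metadata block a complete SSSOM file needs. All names are hypothetical:

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

COLUMNS = ["subject_id", "predicate_id", "object_id", "confidence"]
ACCEPT_TYPES = {"NEW", "OK"}  # assumption: CONFLICT rows need manual review
REJECT_TYPES = {"REJECT"}

Row = Dict[str, str]


def partition(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Split compare rows into (accepted, rejected); other types are dropped."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        if row["type"] in ACCEPT_TYPES:
            accepted.append(row)
        elif row["type"] in REJECT_TYPES:
            rejected.append(row)
    return accepted, rejected


def write_tsv(rows: Iterable[Row], path: Path) -> None:
    """Write rows as a tab-separated table, ignoring keys outside COLUMNS (e.g. 'type')."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=COLUMNS, delimiter="\t", extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def split_accept_reject(rows: Iterable[Row], accept_path: Path, reject_path: Path) -> Tuple[int, int]:
    """Write accepted and rejected rows to separate files; return their counts."""
    accepted, rejected = partition(rows)
    write_tsv(accepted, accept_path)
    write_tsv(rejected, reject_path)
    return len(accepted), len(rejected)
```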

@cmungall cmungall merged commit 6016c73 into main Feb 2, 2023

4 participants