Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR: Which terms outside the current term scopes are xref'd? #35

Merged
merged 17 commits into from
Aug 17, 2022

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Jul 12, 2022

Updates

    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Add: Makefile: Added .PHONY target alias for convenience for feature

commit c2dc9fd9c82e203eace2649cc07cd781551c1d93
    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Add: README.md to src/ontology/reports: Describes files.
    - Add: Target to combine each of these 'by ontology' reports.
    - Update: The 'summary' file for each ontology generated is now part of the target.
    - Delete: Anything to do with 'excluded terms in Mondo classes'. No longer in scope for this feature.
    - Delete: Anything to do with 'exclusion table' input. Granularity of using the component and mirror
     signature files is necessary.
    - Delete: Unused functions replaced by package 'curies'
    - Update: Raise exception when SPARQL returns no results.
    - Update: Utilizing package: curies
    - Update: Misc: (i) Simplified things, removing parts of code that were not necessary or useful, (ii) Updated some code comments, (iii) Minor touch-ups to other parts of makefile.
    - Update: doid.yml config: Added a prefix
    - Add: Committed problematic exclusion xref tables to reports/.
    - Update: Now filters to show only problematic exclusions, e.g. leaves only rows where `n_in1_notIn2_in3` is True.
    - Bugfix: Now filters out prefixes not 'owned' by the ontology. Meant to do this earlier.
    - Misc: (i) minor refactoring, (ii) Removed completed todo.
    - Update: Make target for merged file now also makes a merged summary file.

    Misc
    - Update: .gitignore: Added exceptions reports/: (i) README.md, (ii) problematic exclusion xref tables.
    - Update: requirements*.txt: Added package: curies
    - Update: Signature file(s)
    - Bugfix: Incorrect prefix in config/doid.yml
commit 654d3fd57a6f89d9e1c2070d08c263f57714bc3a
    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Update: Heavy rebase from main.
    - Update: Design: Heavy refactor:  'all ontologies' -> 'single ontology, then merge'
    - uri_to_curie: Refactored to a combo of my code plus OAK's after all 3 failed: my code by itself, oaklib, bioregistry.
    - Added labels to output
commit 8a03f44cf43d0f3a3fd39eee2cf88608df119ead
    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Update: makefile: (i)Added `python-install-dependencies` and utilized this in several make goals, (ii) now passing more paths explicitly, (iii) moved from .PHONY targets to real ones.
    - Update: Changed from package architecture to a single file, as I am no longer storing cache/ in a diretory relative to the script, but rather in $(TMPDIR).
    General
    - Update: Python requirements: (i) requirements.txt: updated several to try and resolve dependency version conflicts, (ii) requirements-unlocked.txt: Added `oaklib`, `pyyaml`.
    - Update: cli_validate(): Removed a validation no longer needed, and added a new one.
    - Update: Outputs are now TSV instead of CSV.
    - Update: Silenced some false positive pandas warnings.
commit bdf50fb4f89bfcd0adb0d1789433c4411a22f49f
    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
    - Update: Changed directory for temp/ignored files
    General
    -.gitignore: Removed ignored/ in favor of tmp/
commit e65899c631cc27b1b611c832b5bc20c9e3bf3fed
    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
commit b3a99516560f4bb2aeeb095e2fdd2412276700bb
    Feature: Analysis of 'Which terms outside the current term scopes are xrefed?'
    - Added new Python script / sub-package
    - Added make command
    General
    - src/analysis/: Created this and initialized w/ new feature: excluded_terms_in_mondo
    - .gitignore: Added ignored/ dir so that I can download and store some inputs here.

@joeflack4 joeflack4 marked this pull request as draft July 12, 2022 00:01
@joeflack4 joeflack4 requested a review from matentzn July 12, 2022 00:01
@joeflack4 joeflack4 self-assigned this Jul 12, 2022
@joeflack4 joeflack4 added the enhancement New feature or request label Jul 12, 2022
@joeflack4 joeflack4 linked an issue Jul 12, 2022 that may be closed by this pull request
@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch 5 times, most recently from facea2f to 80a5617 Compare July 12, 2022 01:34
.gitignore Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch 2 times, most recently from f6cabe7 to 94edb4a Compare July 13, 2022 21:50
@joeflack4
Copy link
Contributor Author

joeflack4 commented Jul 13, 2022

Results

(Last updated: 2022/08/12)
@matentzn @nicolevasilevsky @sabrinatoro Hey guys, here are the results of my analysis.

Codebook

I added one to: src/ontology/reports/README.md

Outputs

Edit: These are old. Now look in the reports/ folder. These are now being committed directly.

By ontology

i. ICD10WHO: icd10who_excluded_terms_in_mondo_xrefs.zip
ii. ICD10CM: icd10cm_excluded_terms_in_mondo_xrefs.zip (missing labels; see: #65)
iii. ORDO: ordo_excluded_terms_in_mondo_xrefs - v2.zip
iv. OMIM: omim_excluded_terms_in_mondo_xrefs.zip
v. DOID: doid_excluded_terms_in_mondo_xrefs.zip
vi. NCIT: ncit_excluded_terms_in_mondo_xrefs.zip

Now merged and committed. Can find each of the files here: https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/reports/

All ontologies combined

excluded_terms_xrefed_in_mondo.zip

Now merged and committed: https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/reports/excluded_terms_in_mondo_xrefs.tsv

ontology n_in1_notIn2_in3 pct_in1_notIn2_in3__over_in1
ordo 624 0.0408
ncit 166 0.001
doid 14 0.001
icd10cm 0 0.0
icd10who 0 0.0
omim 0 0.0

@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch 2 times, most recently from 68ddc49 to 5f54e8a Compare July 13, 2022 22:29
.gitignore Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
@matentzn
Copy link
Member

Before merging, can we take a look at the table with the actual results?

@joeflack4
Copy link
Contributor Author

joeflack4 commented Jul 14, 2022

@matentzn wrote:

Before merging, can we take a look at the table with the actual results?

Yep. It's not ready to merge yet though. But I already uploaded results. I accidentally wasn't showing term_id but it's added now. I just have to filter by these prefixes and then it should be ready to merge. Check this comment which includes the current link to the file: #35 (comment)

@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch from 7b7886d to 7c69f41 Compare July 14, 2022 22:02
@joeflack4
Copy link
Contributor Author

joeflack4 commented Jul 14, 2022

@matentzn For some reason a lot of your comments are showing 2-3 times. Is this a bug on GitHub's part or are you commenting first and then responding to my open review comments after? Ideally I like to have all comments related to code changes to be review comments so that they can be marked resolved.

@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch from 7c69f41 to ac2ff9e Compare July 14, 2022 23:49
@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch 5 times, most recently from 61fd2db to 7123a5e Compare July 26, 2022 21:15
@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch 2 times, most recently from cb4cd43 to d41cd2d Compare August 12, 2022 18:33
@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch 3 times, most recently from 4b985e4 to cf77334 Compare August 12, 2022 20:14
src/ontology/mondo-ingest.Makefile Show resolved Hide resolved
# todo: the merged _summary.tsv has a column `filename` on the right. would be better if it was named `ontology` and was on the left, and was sorted by most->least terms.
$(REPORTDIR)/excluded_terms_in_mondo_xrefs.tsv $(REPORTDIR)/excluded_terms_in_mondo_xrefs_summary.tsv:: $(foreach n,$(ALL_COMPONENT_IDS), $(REPORTDIR)/$(n)_excluded_terms_in_mondo_xrefs.tsv)
awk '(NR == 1) || (FNR > 1)' $(REPORTDIR)/*_excluded_terms_in_mondo_xrefs.tsv > $@
@awk -v OFS='\t' 'NR == 1 { print $$0, "filename" } FNR > 1 { print $$0, FILENAME }' \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know combine the summary statistics too. I'm not great at awk and similar unix-foo, though.

Copy link
Member

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job. Need to pring up at next call!

reports/*_excluded_terms_in_mondo_xrefs_summary.tsv \
> $(REPORTDIR)/excluded_terms_in_mondo_xrefs_summary.tsv

excluded-xrefs-in-mondo: $(REPORTDIR)/excluded_terms_in_mondo_xrefs.tsv
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New .PHONY target for convenience

… xrefed?'

- Added new Python script / sub-package
- Added make command

src/analysis/
- Created this and initialized w/ new feature: excluded_terms_in_mondo

.gitignore
- Added ignored/ dir so that I can download and store some inputs here.
… xrefed?'

  - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
… xrefed?'

  - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
  - Update: Changed directory for temp/ignored files

-.gitignore: Removed ignored/ in favor of tmp/
… xrefed?'

- Added new Python script / sub-package
- Added make command

src/analysis/
- Created this and initialized w/ new feature: excluded_terms_in_mondo

.gitignore
- Added ignored/ dir so that I can download and store some inputs here.
… xrefed?'

  - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
  - Update: Changed directory for temp/ignored files

-.gitignore: Removed ignored/ in favor of tmp/
… xrefed?'

- Update: makefile: Added `python-install-dependencies` and utilized this in several make goals.
… xrefed?'

- Added new Python script / sub-package
- Added make command

src/analysis/
- Created this and initialized w/ new feature: excluded_terms_in_mondo

.gitignore
- Added ignored/ dir so that I can download and store some inputs here.
… xrefed?'

  - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
  - Update: Changed directory for temp/ignored files

-.gitignore: Removed ignored/ in favor of tmp/
… xrefed?'

- Added new Python script / sub-package
- Added make command

src/analysis/
- Created this and initialized w/ new feature: excluded_terms_in_mondo

.gitignore
- Added ignored/ dir so that I can download and store some inputs here.
… xrefed?'

  - Bugfix: Added missing term_id to 'excluded_terms_xrefed_in_mondo.tsv'
  - Update: Changed directory for temp/ignored files

-.gitignore: Removed ignored/ in favor of tmp/
… xrefed?'

- Update: makefile: (i)Added `python-install-dependencies` and utilized this in several make goals, (ii) now passing more paths explicitly, (iii) moved from .PHONY targets to real ones.
- Update: Changed from package architecture to a single file, as I am no longer storing cache/ in a diretory relative to the script, but rather in $(TMPDIR).
General
- Update: Python requirements: (i) requirements.txt: updated several to try and resolve dependency version conflicts, (ii) requirements-unlocked.txt: Added `oaklib`, `pyyaml`.
- Update: cli_validate(): Removed a validation no longer needed, and added a new one.
- Update: Outputs are now TSV instead of CSV.
- Update: Silenced some false positive pandas warnings.
… xrefed?'

- Update: Heavy rebase from main.
- Update: Design: Heavy refactor:  'all ontologies' -> 'single ontology, then merge'
… xrefed?'

- Refactor: Continued
- uri_to_curie: Refactored to a combo of my code plus OAK's after all 3 failed: my code by itself, oaklib, bioregistry.
- Added labels to output
… xrefed?'

- Add: README.md to src/ontology/reports: Describes files.
- Add: Target to combine each of these 'by ontology' reports.
- Update: The 'summary' file for each ontology generated is now part of the target.
- Delete: Anything to do with 'excluded terms in Mondo classes'. No longer in scope for this feature.
- Delete: Anything to do with 'exclusion table' input. Granularity of using the component and mirror
 signature files is necessary.
- Delete: Unused functions replaced by package 'curies'
- Update: Raise exception when SPARQL returns no results.
- Update: Utilizing package: curies
- Update: Misc: (i) Simplified things, removing parts of code that were not necessary or useful, (ii) Updated some code comments, (iii) Minor touch-ups to other parts of makefile.
- Update: doid.yml config: Added a prefix
- Add: Committed problematic exclusion xref tables to reports/.
- Update: Now filters to show only problematic exclusions, e.g. leaves only rows where `n_in1_notIn2_in3` is True.
- Bugfix: Now filters out prefixes not 'owned' by the ontology. Meant to do this earlier.
- Misc: (i) minor refactoring, (ii) Removed completed todo.
- Update: Make target for merged file now also makes a merged summary file.

Misc
- Update: .gitignore: Added exceptions reports/: (i) README.md, (ii) problematic exclusion xref tables.
- Update: requirements*.txt: Added package: curies
- Update: Signature file(s)
- Bugfix: Incorrect prefix in config/doid.yml
… xrefed?'

- Add: Makefile: Added .PHONY target alias for convenience for feature
@joeflack4 joeflack4 force-pushed the xrefed-terms-outside-relevant-scope branch from db148f0 to ba140b9 Compare August 17, 2022 18:51
@joeflack4 joeflack4 merged commit 450fabc into main Aug 17, 2022
@joeflack4 joeflack4 deleted the xrefed-terms-outside-relevant-scope branch August 17, 2022 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Which terms outside the current term scopes are xref'd?
2 participants