QC check for: Set of Mondo terms in mondo.owl & mondo.sssom.tsv differ #384
Comments
See my comment here: #363 (comment)
Copy/pasting our discussion from the aforementioned comment. Joe wrote: Nico wrote:
Title changed from "mondo.owl and mondo.sssom.tsv out of sync" to "mondo.owl & mondo.sssom.tsv differ"
@matentzn If you want we can discuss at meeting instead of here. Decided to move conversation here, as (@matentzn correct me if I'm wrong) it sounds like your answers to questions (1) and (2) are 'yes' and 'yes'. Basically, there I can add a QC check and do some breakdown by source, and that should be done in a separate PR. Some more follow-up questions:
I attached out_of_sync_mondo_ids.tsv.zip yesterday. IDK if you didn't see it, or are just saying you prefer a Google Sheet, or it is possible you noticed it was erroneous! I had it correct yesterday, then tweaked the logic and got bad results. I just fixed that now and re-uploaded. Here's also the Google Sheets version for you. I see you are saying this difference between the files may not necessarily be problematic. So I just updated the title of the issue. FYI, OAK by default, when using the |
No, I do a lot of my work on the phone and would like to be able to see things immediately, rather than having to unzip etc., to save time.
There seems to be a real issue: I checked for example "MONDO:0029465" and it absolutely should be in both! No idea what could cause it to be missing from one or the other. Is the mondo.owl that is at the bottom of mondo.db the latest release version?
Nothing in mind yet!
I would first make sure we have an issue that clearly defines the QC check. Then a separate PR.
TMD. I would like to ensure that the pipeline breaks if major issues occur, but I neither know what I mean by "major issues" exactly, nor how to recognise these; at least not without thinking. Let's develop some ideas and discuss at a call.
Understood on Google Sheets now! Ok, we'll discuss how to implement these checks at the meeting. As for MONDO:0029465 I just discovered that my Edit: Good news! MONDO:0029465 no longer appears in the report. Also great news--now there are 0 entries that are "in SSSOM but not in |
Awesome. I randomly checked 10; can you remove "obsolete classes" from this list as well? All 10 were obsolete.
No problem! I intentionally included the obsoletes to be most thorough. But especially now since the problem is no longer bidirectional, I suppose there is no longer any need. I just updated: out_of_sync_mondo_ids Google Sheet I also added a labels column. If you want the obsoletes back, now that I have added the labels, let me know. I could also add an |
Fantastic. In this case, I checked 10 randomly, and they all have no mappings associated with them, so everything seems correct! THANKS for drilling in!
Very good! I updated the issue / project tracker so that this issue is for a QC check and not a bug report.
Overview
I was working on #363 (comment) today. I had just gotten my new computer, so I was finally able to build mondo.db, so I ran the goal for that as well as mondo.sssom.tsv to make sure that both were up-to-date. I expected then that the list of terms would be the same in each (I also accounted for obsolete terms). But that is not the case.

I found 971 terms that were in mondo.db (from mondo.owl) that weren't in mondo.sssom.tsv.

I found 2,024 terms that were in mondo.sssom.tsv but not in mondo.db.

out_of_sync_mondo_ids.tsv.zip (Google Sheet)
Update:
A. Non-problematic: in ontology only
B. Problematic: in SSSOM only
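
For reference, the comparison behind categories A and B boils down to two set differences. Below is a minimal Python sketch, not the actual logic used to produce the report. It assumes that the Mondo IDs sit in the subject_id column of mondo.sssom.tsv, that SSSOM metadata header lines start with "#", and that the ontology-side IDs and the obsolete IDs have already been extracted (e.g. from mondo.db) into plain sets. The function names and the toy IDs are hypothetical.

```python
import csv
import io


def read_sssom_subject_ids(lines, column="subject_id", prefix="MONDO:"):
    """Collect the distinct IDs in one column of an SSSOM TSV.

    Assumption: metadata header lines start with '#', and the Mondo
    IDs are in `column`.
    """
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    rows = csv.DictReader(data_lines, delimiter="\t")
    return {row[column] for row in rows if row[column].startswith(prefix)}


def compare_term_sets(ontology_ids, sssom_ids, obsolete_ids=frozenset()):
    """Return the IDs found on only one side, ignoring obsolete terms."""
    ontology_ids = set(ontology_ids) - set(obsolete_ids)
    sssom_ids = set(sssom_ids) - set(obsolete_ids)
    return {
        # A. May be fine: e.g. terms that simply have no mappings.
        "in_ontology_only": sorted(ontology_ids - sssom_ids),
        # B. Suspicious: mapped IDs that the ontology does not contain.
        "in_sssom_only": sorted(sssom_ids - ontology_ids),
    }


# Toy data for illustration only; these are not real report contents.
toy_sssom = io.StringIO(
    "# curie_map:\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MONDO:0000001\tskos:exactMatch\tDOID:4\n"
    "MONDO:0000003\tskos:exactMatch\tDOID:9\n"
)
toy_sssom_ids = read_sssom_subject_ids(toy_sssom)
toy_report = compare_term_sets(
    ontology_ids={"MONDO:0000001", "MONDO:0000002", "MONDO:0000004"},
    sssom_ids=toy_sssom_ids,
    obsolete_ids={"MONDO:0000004"},
)
print(toy_report)
```

A QC check could then fail the pipeline when the "in_sssom_only" list is non-empty, while only reporting "in_ontology_only"; what exactly counts as a failure is still to be decided per the discussion above.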