boomer selects suboptimal solution in simple 3-node problem #158

Open
cmungall opened this issue Feb 5, 2021 · 6 comments
Comments

@cmungall
Contributor

cmungall commented Feb 5, 2021

for text files see #157.

Given:

  1. Pr(A properSubClassOf C) = 0.99
  2. Pr(A equiv B) = 0.95
  3. Pr(B equiv C) = 0.95

(in each case, the only other possibility is siblingOf)

Note: each class is in a separate prefix space, so there is no penalty for equivalence between any pair.

Solutions:

  • 1,2,3 : incoherent
  • 1,2 : .99 * .95 * (1-.95) = 0.047
  • 1,3 : .99 * .95 * (1-.95) = 0.047
  • 2,3 : .95 * .95 * (1-0.99) = 0.009
  • 1 : .99 * .05 * .05 = 0.002475
  • 2 : .01 * .95 * .05 = 0.000475
  • 3 : .01 * .95 * .05 = 0.000475
  • {} : .01 * .05 * .05 = 2.5e-05
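
The arithmetic in the list above can be checked with a short Python sketch. It only multiplies the stated probabilities: each selected statement contributes its probability and each unselected one contributes the complement (the siblingOf alternative). It does no reasoning; the set marked incoherent is copied from the list, and the names `P`, `joint`, `INCOHERENT` and `ranked_solutions` are made up for this illustration.

```python
from itertools import combinations
from math import prod

# Probability that each numbered statement holds, as given in the issue.
P = {1: 0.99, 2: 0.95, 3: 0.95}

# Copied from the list in the issue, not derived by a reasoner.
INCOHERENT = {frozenset({1, 2, 3})}


def joint(selected):
    """Multiply p for selected statements and (1 - p) for the others."""
    return prod(P[k] if k in selected else 1 - P[k] for k in P)


def ranked_solutions():
    """Return (statements, joint probability) pairs, most probable first."""
    subsets = [frozenset(c) for n in range(len(P) + 1) for c in combinations(P, n)]
    coherent = [s for s in subsets if s not in INCOHERENT]
    return sorted(((sorted(s), joint(s)) for s in coherent), key=lambda x: -x[1])


for members, p in ranked_solutions():
    print(members, f"{p:.6g}")
```

The value for {1} agrees with the "Most probable" figure in the boomer log further down.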

boomer generally selects {1}, depending on params, but never the optimal solution.

I am pretty sure I have not made a typo - I put each class in its own ID space, so it is not avoiding 2 or 3 (which would happen if A/B/C were in the same ID space)

boomer -p prefixes.yaml -w 100 -r 1000 -t ptable.tsv --ontology logical.omn 
...
2021.02.05 09:23:19:376 [zio-def...] [INFO ] org.monarchinitiative.boomer.Main.program:49 - Most probable: 0.0024750000000000015
...
$ more output.txt 
A:1 SiblingOf B:1               0.05
B:1 SiblingOf C:1               0.05
A:1 ProperSubClassOf C:1        (most probable) 0.99
@cmungall
Contributor Author

cmungall commented Feb 5, 2021

I can confirm it's not avoiding any collapses, since if I reduce the ptable to omit 1

i.e.

A:1	B:1	0.0	0.0	0.95	0.05
B:1	C:1	0.0	0.0	0.95	0.05
A:1	C:1	0.99	0.0	0.01	0.0

then it correctly finds

B:1 EquivalentTo C:1    (most probable) 0.95
A:1 EquivalentTo B:1    (most probable) 0.95
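
For readers unfamiliar with the ptable layout, here is a minimal Python sketch that parses the three rows above and reports the most probable relation per pair. The column order (ProperSubClassOf, ProperSuperClassOf, EquivalentTo, SiblingOf) is an assumption inferred from the outputs quoted in this thread, and `parse_ptable` and `most_probable` are hypothetical helpers, not part of boomer.

```python
# Assumed column order, inferred from the outputs quoted in this thread.
RELATIONS = ("ProperSubClassOf", "ProperSuperClassOf", "EquivalentTo", "SiblingOf")

PTABLE = (
    "A:1\tB:1\t0.0\t0.0\t0.95\t0.05\n"
    "B:1\tC:1\t0.0\t0.0\t0.95\t0.05\n"
    "A:1\tC:1\t0.99\t0.0\t0.01\t0.0\n"
)


def parse_ptable(text):
    """Turn tab-separated rows into (subject, object, {relation: probability})."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        subject, obj, *probs = line.split("\t")
        rows.append((subject, obj, dict(zip(RELATIONS, map(float, probs)))))
    return rows


def most_probable(row):
    """Pick the relation with the highest probability for one row."""
    subject, obj, probs = row
    relation = max(probs, key=probs.get)
    return subject, relation, obj, probs[relation]


for row in parse_ptable(PTABLE):
    print(*most_probable(row))
```

Under that assumed order, the A:1/C:1 row puts its remaining 0.01 on EquivalentTo rather than SiblingOf.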

@balhoff
Member

balhoff commented Feb 5, 2021

I think the issue here is the high number of "windows" requested (100). Input rows are sorted according to their best probability, then the list of rows is chunked into the given number of windows. Across each independent run, shuffling occurs within each window, but the windows stay in the same total order. So it will always first add A ProperSubClassOf C. If you use a window value of 1, the rows are completely randomized and it is able to find the best solution.

@balhoff
Member

balhoff commented Feb 5, 2021

See the logging at the beginning of a run (with 100 windows requested):

2021.02.05 14:32:54:070 [zio-def...] [INFO ] org.monarchinitiative.boomer.Boom.evaluate:30 - Bin size: 1; Most probable: 0.99
2021.02.05 14:32:54:091 [zio-def...] [INFO ] org.monarchinitiative.boomer.Boom.evaluate:30 - Bin size: 2; Most probable: 0.95
2021.02.05 14:32:54:095 [zio-def...] [INFO ] org.monarchinitiative.boomer.Boom.evaluate:33 - Max possible joint probability: -0.11263692462860261

The axioms in the first bin will always be added before proceeding to the next bin. Different runs will just shuffle the order of the two items in the second bin.
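
The binning behaviour described in these two comments can be sketched in Python. This is not boomer's code (boomer is written in Scala); it is an illustration under stated assumptions: rows are sorted by best probability, rows with equal probability stay in the same bin (assumed so that the bin sizes match the log above), and the bins are then merged down to at most the requested number of windows. `make_bins` and `proposal_order` are hypothetical names.

```python
import random
from itertools import groupby


def make_bins(rows, windows):
    """rows is a list of (axiom_label, best_probability); returns a list of bins."""
    if not rows:
        return []
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    # Assumption: rows with equal best probability are kept together.
    groups = [list(g) for _, g in groupby(ordered, key=lambda r: r[1])]
    per_bin = -(-len(groups) // windows)  # ceiling division
    return [
        [row for group in groups[i:i + per_bin] for row in group]
        for i in range(0, len(groups), per_bin)
    ]


def proposal_order(bins, rng):
    """Shuffle inside each bin; the bins themselves keep their order."""
    order = []
    for b in bins:
        shuffled = list(b)
        rng.shuffle(shuffled)
        order.extend(label for label, _ in shuffled)
    return order


ROWS = [
    ("A ProperSubClassOf C", 0.99),
    ("A EquivalentTo B", 0.95),
    ("B EquivalentTo C", 0.95),
]

for windows in (100, 1):
    bins = make_bins(ROWS, windows)
    firsts = {proposal_order(bins, random.Random(seed))[0] for seed in range(50)}
    print(windows, [len(b) for b in bins], sorted(firsts))
```

With 100 windows the sketch always proposes A ProperSubClassOf C first, whatever the seed; with 1 window all three rows share a bin, so any of them can come first.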

@cmungall
Contributor Author

cmungall commented Feb 6, 2021

my ticket is in error... more later

@balhoff
Member

balhoff commented Apr 30, 2021

I think we cleared this up. "windows" may not be as obvious as they ought to be but I think the UI will continue to evolve.

@balhoff balhoff closed this as completed Apr 30, 2021
@cmungall cmungall reopened this Jan 31, 2023
@cmungall
Contributor Author

still an issue

A:1	B:1	0.0	0.0	0.95	0.05
B:1	C:1	0.0	0.0	0.95	0.05
A:1	C:1	0.99	0.0	0.01	0.0

running
boomer -t triangle.ptable.tsv -a triangle.owl -p prefixes.yaml -r 500 -w 1 -e 200 --output-internal-axioms true

yields

## SINGLETONS
Method: singletons
Score: -0.05129329438755058
Estimated probability: 1.0
Confidence: 1.0
Subsequent scores (max 10):

- [B:1](http://purl.obolibrary.org/obo/B_1) EquivalentTo [C:1](http://purl.obolibrary.org/obo/C_1)      (most probable) 0.95

and an incoherent output.ofn
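
The negative "Score" above, like the "Max possible joint probability" in the earlier log, looks like a natural logarithm of a probability. That is a reading of the numbers rather than something stated in the thread; a quick Python check of that assumption:

```python
import math

reported_score = -0.05129329438755058      # "Score" in the singletons output
reported_max_joint = -0.11263692462860261  # "Max possible joint probability" in the earlier log

# Exponentiating should recover plain probabilities if these are natural logs.
print(math.exp(reported_score))      # close to 0.95
print(math.exp(reported_max_joint))  # close to 0.99 * 0.95 * 0.95
```

If that reading is right, the reported score corresponds to a single EquivalentTo axiom at 0.95, which matches the one axiom listed in the output.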

