Error computing M-S #200

Open
cchriste opened this issue Dec 14, 2020 · 2 comments
@cchriste
Contributor

cchriste commented Dec 14, 2020

Partially resolved with Ross, but the M-S complex still contains erroneous results.
Debug code has been added to NNMSComplex.h that prints the min/max of each sample as partitions are consolidated.
Actual errors have been identified, and we can investigate them using the nano-500 dataset.
More details are forthcoming, though we could dive into this immediately since the debug code already reveals errors.
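A minimal sketch of how that min/max printout could be turned into an automatic check, assuming the debug output can be captured as per-sample index arrays. The names `f`, `minOf`, `maxOf` and `findBadExtrema` are hypothetical, not part of NNMSComplex. It tests only one necessary condition: a sample's assigned minimum should not lie above it, and its assigned maximum should not lie below it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (not the NNMSComplex API):
//   f[i]     function value of sample i
//   minOf[i] index of the minimum sample i is assigned to at one level
//   maxOf[i] index of the maximum sample i is assigned to at one level
// Returns the samples whose assigned extrema are out of range or do not
// bracket their own value, i.e. f[minOf[i]] > f[i] or f[maxOf[i]] < f[i].
std::vector<std::size_t> findBadExtrema(const std::vector<double>& f,
                                        const std::vector<std::size_t>& minOf,
                                        const std::vector<std::size_t>& maxOf) {
  assert(minOf.size() == f.size() && maxOf.size() == f.size());
  std::vector<std::size_t> bad;
  for (std::size_t i = 0; i < f.size(); ++i) {
    // An extremum index outside the sample range is always an error.
    if (minOf[i] >= f.size() || maxOf[i] >= f.size()) {
      bad.push_back(i);
      continue;
    }
    const bool minOk = f[minOf[i]] <= f[i];
    const bool maxOk = f[maxOf[i]] >= f[i];
    if (!minOk || !maxOk) bad.push_back(i);
  }
  return bad;
}
```

Passing this check does not prove an assignment is correct (the extremum could still be in the wrong crystal), but any sample it flags is certainly wrong.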

This is the underlying reason the display of extrema (#114) is still incorrect and therefore disabled, but even non-extrema samples are incorrect.

@cchriste
Contributor Author

Screenshots showing the dreaded "Sample 28", including a 2D embedding of the Hamming distance matrix that clearly shows how far this sample is from everything else. It was explicitly removed, and other samples then showed the same issue for different reasons.
[Screenshots: five screen captures taken 2020-12-09, 8:22 PM to 11:38 PM]

@cchriste
Contributor Author

The current dataproc branch prints M-S computation debug info after each mergePersistence call. The results match what the drawer shows.

The debugging output in NNMSComplex::mergePersistence is currently enabled at line 222 of NNMSComplex.h.
Line 652 prints the same information at the end of NNMSComplex::runMS(), just before the first mergePersistence(0) call, and can be used to verify that the results of the first merge are identical to the input data.
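That before/after comparison could be automated roughly as follows, assuming both printouts (the one at the end of runMS() and the one after the first merge) can be captured as per-sample index arrays. `ExtremaSnapshot` and `diffSnapshots` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical snapshot of per-sample extrema at one point in the computation.
struct ExtremaSnapshot {
  std::vector<std::size_t> minOf;  // index of each sample's minimum
  std::vector<std::size_t> maxOf;  // index of each sample's maximum
};

// Returns the samples whose minimum or maximum differs between two snapshots.
// An empty result means the snapshots are identical, which is what we expect
// when comparing the runMS() input against the result of mergePersistence(0).
std::vector<std::size_t> diffSnapshots(const ExtremaSnapshot& before,
                                       const ExtremaSnapshot& after) {
  assert(before.minOf.size() == before.maxOf.size());
  assert(before.minOf.size() == after.minOf.size());
  assert(before.maxOf.size() == after.maxOf.size());
  std::vector<std::size_t> changed;
  for (std::size_t i = 0; i < before.minOf.size(); ++i) {
    if (before.minOf[i] != after.minOf[i] || before.maxOf[i] != after.maxOf[i])
      changed.push_back(i);
  }
  return changed;
}
```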

I'm using the dataset np500.2 (https://drive.google.com/file/d/1ljHrlaHR9C53uz35EIiN4Iz0l6dDoLOj/view?usp=sharing).
It seems that at least the extrema are incorrect. For example, when the complex is down to four crystals, sample 527 has extremum 343, but every appearance of that sample at much earlier persistence levels shows the same maximum.
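One way to catch cases like sample 527 across the whole sequence of merges, sketched under an assumption: that each merge absorbs the lower of two maxima into the higher one (as in standard persistence simplification), so the value of a sample's maximum should never decrease as crystals merge. Whether NNMSComplex merges this way should be confirmed against the code. `findMaxRegressions` and the `levels` layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical input: f[i] is the function value of sample i, and
// levels[k][i] is the maximum assigned to sample i after the k-th merge,
// ordered from finest (k = 0) to coarsest.
// ASSUMPTION: a merge always absorbs the lower maximum into the higher one,
// so f[levels[k][i]] should be non-decreasing in k for every sample i.
// Returns (level, sample) pairs where the maximum's value drops.
std::vector<std::pair<std::size_t, std::size_t>>
findMaxRegressions(const std::vector<double>& f,
                   const std::vector<std::vector<std::size_t>>& levels) {
  std::vector<std::pair<std::size_t, std::size_t>> regressions;
  for (std::size_t k = 1; k < levels.size(); ++k) {
    assert(levels[k].size() == levels[k - 1].size());
    for (std::size_t i = 0; i < levels[k].size(); ++i) {
      if (f[levels[k][i]] < f[levels[k - 1][i]])
        regressions.emplace_back(k, i);
    }
  }
  return regressions;
}
```

The same check for minima is the mirror image: compare with `>` instead of `<`, since a sample's minimum should never rise as crystals merge.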

Not sure and out of battery (managed to forget it, which I guess is good). Enjoy!
