-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare RANSAC in pyprep and autoreject #36
Comments
the ransac implementation in autoreject is mostly there because it's what was used in the paper for comparison. Thus for reproducibility purposes. I haven't tried to optimize it in any way so I think the pyprep implementation is probably better. We did however check that it gives the same result as Matlab (as long as you're willing to do the median computation incorrectly which I pointed out elsewhere) |
@sappelhoff @jasmainak I've written a script for investigating #41 that would be well-suited for comparing the two RANSAC implementations, but based on the autoreject RANSAC source it looks like it's only designed for use with epochs, making them difficult to compare. Is there an easy tweak you can think of that would make it run RANSAC on the whole file like pyprep's implementation? |
it's been 3-4 years since I looked at the code, so not sure I can suggest a quick hack. Maybe use this function: https://mne.tools/stable/generated/mne.make_fixed_length_epochs.html#mne.make_fixed_length_epochs ? |
Ok, im finally testing this. I have merged the autoreject ransac inside pyprep, Im not sure I did it completely correctly but it seems to be working. The only problem is that I can only test files that dont crash my ram. Since Im essentially running autoreject ransac like there was a single epoch then it kind of is like @sappelhoff ransac before the optimization. I wonder what would happen if we did the ransac epoch wise. That is, just chunk the continuous file with as many epochs as the user needs for it to not collapse the ram. Essentially it would be like the splitting mechanism we have on pyprep's ransac. That way I could test the heavier files (and in particular sub-007 of #41 ). If autoreject's ransac has an optimization along the epochs so that it doesn't consume too much ram at a time that might work just alright (?)
BTW looked at @jasmainak suggestion but I ended up using EDIT: Now, also testing the epoch wise way. |
Here is my proposal to implement autoreject's ransac, montage needs to be carried from prep_pipeline into Reference and then into the NoisyChannels class.: #imports
from autoreject import Ransac # noqa
from autoreject.utils import interpolate_bads # noqa
def find_bad_by_ransac(
self,
n_samples=50,
fraction_good=0.25,
corr_thresh=0.75,
fraction_bad=0.4,
corr_window_secs=5.0,
channel_wise=False,
mode = 'autoreject',
njobs = 1,
):
"""Detect channels that are not predicted well by other channels.
Here, a ransac approach (see [1]_, and a short discussion in [2]_) is
adopted to predict a "clean EEG" dataset. After identifying clean EEG
channels through the other methods, the clean EEG dataset is
constructed by repeatedly sampling a small subset of clean EEG channels
and interpolation the complete data. The median of all those
repetitions forms the clean EEG dataset. In a second step, the original
and the ransac predicted data are correlated and channels, which do not
correlate well with themselves across the two datasets are considered
`bad_by_ransac`.
Parameters
----------
n_samples : int
Number of samples used for computation of ransac.
fraction_good : float
Fraction of channels used for robust reconstruction of the signal.
This needs to be in the range [0, 1], where obviously neither 0
nor 1 would make sense.
corr_thresh : float
The minimum correlation threshold that should be attained within a
data window.
fraction_bad : float
If this percentage of all data windows in which the correlation
threshold was not surpassed is exceeded, classify a
channel as `bad_by_ransac`.
corr_window_secs : float
Size of the correlation window in seconds.
mode : str
Implementation of ransac. 'autoreject' | 'pyprep'
njobs : int
Number of jobs, only used for autoreject ransac.
References
----------
.. [1] Fischler, M.A., Bolles, R.C. (1981). Random rample consensus: A
Paradigm for Model Fitting with Applications to Image Analysis and
Automated Cartography. Communications of the ACM, 24, 381-395
.. [2] Jas, M., Engemann, D.A., Bekhti, Y., Raimondo, F., Gramfort, A.
(2017). Autoreject: Automated Artifact Rejection for MEG and EEG
Data. NeuroImage, 159, 417-429
"""
# First, identify all bad channels by other means:
bads = self.get_bads()
# Get all channel positions and the position subset of "clean channels"
good_idx = mne.pick_channels(list(self.ch_names_new), include=[], exclude=bads)
good_chn_labs = self.ch_names_new[good_idx]
n_chans_good = good_idx.shape[0]
chn_pos = self.raw_mne._get_channel_positions()
chn_pos_good = chn_pos[good_idx, :]
############AUTO REJECT####################
if mode == 'autoreject':
ransac = Ransac(picks=good_idx,n_resample=n_samples, min_channels=fraction_good, min_corr=corr_thresh, unbroken_time=fraction_bad, n_jobs=njobs, random_state=435656, verbose= 'tqdm')
info = mne.create_info(self.ch_names_new.tolist(), self.sample_rate, ch_types='eeg', verbose=None)
# Option 1: Single Epoch
#EEGData_epoch = self.EEGData[np.newaxis, :, :]
#epochs = mne.EpochsArray(EEGData_epoch,info)
# Option 2: Epochs of corr_window_secs seconds as windows
raw = mne.io.RawArray(self.EEGData,info)
epochs = mne.make_fixed_length_epochs(raw, duration=corr_window_secs, preload=True,reject_by_annotation=False, verbose=None)
# Either way...
epochs.set_montage(self.montage)
epochs_clean = ransac.fit(epochs)
#print(epochs_clean.bad_chs_)
self.bad_by_ransac = [i for i in epochs_clean.bad_chs_]
return None
############AUTO REJECT####################
else:
# Check if we have enough remaining channels
# after exclusion of bad channels
n_pred_chns = int(np.ceil(fraction_good * n_chans_good))
# Pick a subset of clean channels for reconstruction
rng = check_random_state(self.random_state)
CODE PROCEDES AS IS
``` |
nice :-) instead of weaving this into pyprep directly, I think I would recommend you package this in an example, like the existing ones here: https://github.com/sappelhoff/pyprep/tree/master/examples
that sounds interesting. How does Matlab PREP prevent these huge RAM and/or speed impacts that we observe? one more hint - When using markdown code blocks, you can often switch on syntax highlighting like so:
you can see it in your code snippet above, which I edited to add the "python" syntax "command". |
@sappelhoff Hmm guess I was inspired by comparing the overall prep output when using each method (hint, it changes, autoreject does seem closer to matlab at least for sub 007). Also I sort of thought that in that way I knew the signal was exactly the same. For the "example" comparison I guess the pipe would be:
And then get it inside directly to the find_bad_by_ransac. The problem is that since the Noisy Class has all sorts of dependences between the criteria (before ransac it uses all the other criteria) then for it to be a fair comparison we would need to overcome that. For example the self.new_chans (or something like that) is produced by the output of the other methods. This is kind of why I like to use pure functions for signal processing stuff 😄 In that sense I guess we would need to copy the functions into the example, adapt them and then we can compare fairly. |
Regarding the matlab code, well I remember reading it and being really confused. There is like this "hlp_microcache" function that seems to optimize. I would need to reread it to remember how it worked. But one thing I do know is that I have never had ram problems with it. Maybe it ain't the fastest though, I'm not sure. |
I also wish pyprep were a set of nice functions instead of a huge mess of objects, methods, and their inter-dependencies. :( I don't really know how to solve the current issue without first refactoring / rewriting large parts of pyprep. Perhaps creating a new sub-package within pyprep, where everything is function based and we add piece by piece? ... and then once it's ready, we deprecate the object based pyprep part. |
Ok so given the new PRs and so that we dont make this a huge mess I suggest:
@sappelhoff @a-hurst you guys agree? Im not sure how the whole subpackage thing works. But I guess we can "functionalize" the NoisyChannels class part by part first and just call the functions within the class. This roadmap just focus on the ransac because of the current issues. |
Sounds good to me as long as the new function based ransac then takes the best of autoreject and current pyprep and is put into a new module, or several modules |
I am pretty sure I checked that the matrices are exactly the same as in Matlab, at least for one dataset, if you use the same seed. |
Since the comparison I did was based on the whole pyprep output possibly the difference is then on the overall algorithm of the prep, that is , not specifically inside the ransac. When we arrive at the comparison of only the ransac (mainly the issue is that pyprep's ransac is not standalone) we will clarify the thing further. I wonder how did you manage to produce the same exact results as matlab given a seed... I mean I was trying that but it seems that some numpy functions will give the same result as the equivalent of matlab given the same seed (for example np.random.randn) while others wont (np.random.choice). Guess the functions you used did give the same output. I think we will probably end up using autoreject's given that the validation is exact already. PS: regarding the median calculation issue i would go with the corrected version even if matlab prep's disagrees. |
see the seed here: https://github.com/VisLab/EEG-Clean-Tools/blob/3ed337e83dfaaad2c8e3ecb671a44a21c5b288c0/PrepPipeline/utilities/findNoisyChannels.m#L333 I think I removed it in the final version after checking that it agrees if you place it in exactly the same location (so that the precise order of random numbers drawn are the same). Note that using |
another source of difference can be the interpolation function itself. Looking at the commits now, I remember that I made this pull request in MNE: https://github.com/mne-tools/mne-python/pull/2758/files to make it exactly comparable. If that function has changed by any chance. |
here's the diff of the pull request I made in a private repository: https://gist.github.com/jasmainak/d459dd128ded1e93b54050c7fab08e79. It has the script of reading the EEGLAB dataset I used to compare. Not sure if all these functions still work in MNE ... that's all I have to offer. Hope it's useful |
@jasmainak wow! Right now I cant review this deeply (university exams and stuff) but when we delve into the comparison itself and into solving this ransac issue once and for all im sure these will come in handy. Many thanks for your contribution and expertise. |
Thanks a lot @jasmainak :) |
As mentioned in #50, I decided to compare interpolation between AutoReject's I also noticed that the MNE interpolation code in @jasmainak's pull request had been changed a few months back so I tried subbing the version in the PR in for the current MNE version, but the results were identical between the two. |
I think it'd be safest to use the MNE interpolate_bads, not autoreject and not pyprep 🤔 |
Does Pyprep also work for MEG data? The reason for Autoreject copying over some of the MNE interpolation code is for efficiency. In the autoreject algorithm, you have to do a "leave one out interpolation" -- it becomes a lot more efficient with MEG data if you precompute certain things: I have been lazy to sync the code with MNE ... I believe there were some fixes/improvements since then. Any help would be very welcome :) |
Agreed, though if MNE's implementation differs from the MATLAB PREP's I guess we'll need write an equivalent interpolation function for the purposes of #50, #21, and the like. It looks like MATLAB PREP uses EEGLAB's |
not out of the box, I think - and the MATLAB PREP is also just for EEG.
I think both methods default to using the "spherical splines" approach by Perrin et al 1989, so broadly they should be equivalent. |
both the pyprep and the autoreject package have an implementation of RANSAC.
We should make a comparison to:
After the comparison we could decide to drop one implementation and only use the "better" one.
Or we keep using both implementations, but make an example that explicitly compares them, pointing out pros and cons in terms of features.
In any case, this would be an important enterprise for point 1 and 2 in my list alone.
The text was updated successfully, but these errors were encountered: