Extend PrepPipeline to allow performing PREP in stages #73

a-hurst · 2021-05-03T16:47:54Z

While chugging along on the PyPREP/MatPREP comparison unit tests, I had a thought about the PyPREP API: since there are more-or-less three distinct stages to the PREP pipeline (adaptive line noise removal, robust re-referencing, interpolation of remaining bad channels), what if the PrepPipeline object had separate methods for each of these? The fit() method would still exist, but just run all of these in sequence.

Separate methods would be handy for situations like my own, where I basically copy/pasted the guts of fit() into my own script in order to do progress messages and plotting between CleanLine and robust referencing. Also for my script the project lead wanted the option to ignore bad channels instead of interpolating them, so I had to write extra code to revert interpolation and the names of bad channels instead of having the option to not interpolate in the first place.

Here's my basic proposal for PrepPipeline methods:

fit: Performs the whole PREP pipeline, like ususal
remove_line_noise: Performs adaptive line noise removal.
robust_reference: Performs robust re-referencing without interpolation. Will default to printing a warning if remove_line_noise isn't done first.
interpolate_bads: Interpolates any remaining bad channels and adjusts the average reference accordingly.

Let me know what you think!

EDIT: With the addition of #96, a PR addressing this issue should also add MATLAB comparison tests for final bad channel interpolation.

The text was updated successfully, but these errors were encountered:

yjmantilla · 2021-05-03T17:50:56Z

I agree. Usually I dont do the remove line noise at all because the signals were already filtered.

sappelhoff · 2021-05-03T18:33:10Z

I think I forgot how the robust re referencing works ... I am surprised that it can work without interpolation. Can you explain?

Other than that, the api proposal sounds good to me. As long as we have good documentation an warnings in place to prevent users from accidentally screwing up their data, it should be fine.

yjmantilla · 2021-05-03T18:39:11Z

I understood that @a-hurst meant without the "final" interpolation. Not "without any interpolation at all". Now, I do wonder how it could be done that way:

The interpolation inside the robust referencing is always done, but it doesn't affect the real signal but a dummy-temporal one , that interpolation is only done to calculate the robust reference.

pyprep/pyprep/reference.py

Line 116 in 051edd6

dummy.interpolate_bads()

pyprep/pyprep/reference.py

Line 318 in 051edd6

raw_tmp.interpolate_bads()

What I get is that at the end of robust referencing PREP once again searches for bad channels but this time he interpolates it on the "real" signal . These channels then become the "interpolated channels":

pyprep/pyprep/reference.py

Line 153 in 051edd6

self.raw.interpolate_bads()

Then prep once again searches for bad channels and what he finds is named "still_noisy_channels".

a-hurst · 2021-05-03T19:45:50Z

I understood that @a-hurst meant without the "final" interpolation.

Yep, that's what I was referring to! As you said, during referencing NoisyChannels only ever does interpolation to make predictions during RANSAC. Then once it does its loops of NoisyChannels, it interpolates all 'bad' channels on a dummy copy of the data and uses that to calculate an average reference signal free of bad channels, and then re-references the data.

After that, NoisyChannels is run again, and any channels that are still flagged as 'bad' get interpolated. The signal is then average-referenced to account for the interpolated channels, and final run of NoisyChannels is used to detect any 'remaining bad' channels even after interpolating bads.

What I'm proposing is to have a method robust_reference that does the above up until interpolating bad channels post-rereference, and a separate interpolate_bads that does the interpolation and subsequent referencing/NoisyChannels detection, so it's possible to opt out of the extra steps and time required if it's not needed.

EDIT: Nevermind, I totally missed it! See the lines here:

pyprep/pyprep/reference.py

Lines 315 to 325 in 051edd6

    
           if self.noisy_channels["bad_all"]: 
        
               raw_tmp._data = signal * 1e-6 
        
               raw_tmp.info["bads"] = self.noisy_channels["bad_all"] 
        
               raw_tmp.interpolate_bads() 
        
               signal_tmp = raw_tmp.get_data() * 1e6 
        
           else: 
        
               signal_tmp = signal 
        
           self.reference_signal = ( 
        
               np.nanmean(raw_tmp.get_data(picks=self.reference_channels), axis=0) 
        
               * 1e6 
        
           )

I guess that means reference signal calculation is going to depend on MNE's interpolation code being identical to EEGLAB's (which, confusingly, uses a different function than interpolation does during RANSAC). Still, that doesn't change that initial re-referencing and final interpolation can be done separately (which is what I originally meant).

a-hurst · 2021-06-30T02:27:09Z

@sappelhoff @yjmantilla So I started taking a crack at this thinking it might simplify #95 a bit, but I've run into a couple snags I'd like to get your thoughts on.

For one, a lot of the pipeline attributes are named in a way that assumes interpolation will always happen:

pyprep/pyprep/prep_pipeline.py

Lines 226 to 237 in 1c9589c

    
           self.noisy_channels_before_interpolation = ( 
        
               reference.noisy_channels_before_interpolation 
        
           ) 
        
           self.noisy_channels_after_interpolation = ( 
        
               reference.noisy_channels_after_interpolation 
        
           ) 
        
           self.bad_before_interpolation = reference.bad_before_interpolation 
        
           self.EEG_before_interpolation = reference.EEG_before_interpolation 
        
           self.reference_before_interpolation = reference.reference_signal 
        
           self.reference_after_interpolation = reference.reference_signal_new 
        
           self.interpolated_channels = reference.interpolated_channels 
        
           self.still_noisy_channels = reference.still_noisy_channels

With final interpolation of bads being made optional, should all the before_interpolation attributes be renamed along the lines of after_reference or something? That would be the most API-breaking change that's been made so far since they're properties existing scripts are likely to use, but the existing naming seems confusing if interpolation doesn't necessarily happen. What do you think?

Relatedly, I'm trying to decide whether it makes sense for interpolate_bads to be its own method, instead of just having a interpolate_bads argument to PrepPipeline.robust_reference. The code would be simpler with it all in one method, but is there any use-case you can think of where you'd want to robust reference, do something else with the data, and then interpolate bad channels afterwards?

sappelhoff · 2021-06-30T06:50:50Z

I think that the upcoming release should be the one where we go wild in terms of API changes, with the hopes of having a much more stable API after 0.4.0 ... and eventually a fixed API (with deprecation cycles if necessary) starting at 1.0.0 --> so I agree with renaming attributes that don't make sense semantically (any more).

Re: Your second question I cannot think of a use case right now 🤔 if a user really wanted to do that wouldn't they be able to pass interpolate_bads=False and then later interpolate "manually" using the MNE-Python API and the information they can extract manually from the pyprep objects?

yjmantilla · 2021-07-01T00:01:22Z

With final interpolation of bads being made optional, should all the before_interpolation attributes be renamed along the lines of after_reference or something?

Regarding the renaming to "reference"... Yeap, I dont see any more proper name, after all the reference does indeed split those two "before" and "after" objects regardless the interpolation being done or not.

That would be the most API-breaking change that's been made so far since they're properties existing scripts are likely to use, but the existing naming seems confusing if interpolation doesn't necessarily happen. What do you think?

I dont know if this is too hacky or confusing but what about using a decorator to redirect EEG_X_interpolation to EEG_X_reference (the later would be the one in the code). It would be similar to what was done here:

pyprep/pyprep/prep_pipeline.py

Lines 168 to 175 in 1c9589c

    
           @property 
        
           def raw(self): 
        
               """Return a version of self.raw_eeg that includes the non-eeg channels.""" 
        
               full_raw = self.raw_eeg.copy() 
        
               if self.raw_non_eeg is None: 
        
                   return full_raw 
        
               else: 
        
                   return full_raw.add_channels([self.raw_non_eeg])

That way we wouldn't break the previous notation... Downside is that it may be confusing (but could be just clarified in the docstring).

Relatedly, I'm trying to decide whether it makes sense for interpolate_bads to be its own method, instead of just having a interpolate_bads argument to PrepPipeline.robust_reference. The code would be simpler with it all in one method, but is there any use-case you can think of where you'd want to robust reference, do something else with the data, and then interpolate bad channels afterwards?

if a user really wanted to do that wouldn't they be able to pass interpolate_bads=False and then later interpolate "manually" using the MNE-Python API and the information they can extract manually from the pyprep objects?

I agree with @sappelhoff , in that case he would pass false and then interpolate with his preferred way.

Now, I understand dummy.interpolate() in the following code is just mne stuff, so maybe we can expose the _eeglab_interpolate_bads ; I mean, this function is what diverges from MNE interpolation so in a way is the only thing I see we could provide as a "different interpolation" from what is already on MNE. Nevertheless , this would be a too technical use-case and an hypothetical user that needs it wouldn't have trouble with just using the private function. So, I think overall is not necessary to make interpolated_bads a method.

pyprep/pyprep/reference.py

Lines 151 to 154 in 1c9589c

    
           if self.matlab_strict: 
        
               _eeglab_interpolate_bads(self.raw) 
        
           else: 
        
               self.raw.interpolate_bads()

a-hurst · 2021-07-01T02:53:12Z

@sappelhoff @yjmantilla Okay, I've got a rough draft for reorganized/renamed attributes for the PrepPipeline object, am hoping to get your thoughts before I go and write docstrings or anything:

        # NOTE: 'original' refers to before initial average reference, not first
        # pass afterwards. Not necessarily comparable to later values?
        self.noisy_info = {
            "original": None, "post-reference": None, "post-interpolation": None
        }
        self.bad_channels = {
            "original": None, "post-reference": None, "post-interpolation": None
        }
        self.reference_signals = {"robust": None, "post-interpolation": None}

        self.robust_reference_signal = None
        self._interpolated_reference_signal = None

    @property
    def current_noisy_info(self):
        post_ref = self.noisy_info["post-reference"]
        post_interp = self.noisy_info["post-interpolation"]
        return post_interp if post_interp else post_ref

    @property
    def remaining_bad_channels(self):
        post_ref = self.bad_channels["post-reference"]
        post_interp = self.bad_channels["post-interpolation"]
        return post_interp if post_interp else post_ref

    @property
    def current_reference_signal(self):
        post_ref = self.robust_reference_signal
        post_interp = self._interpolated_reference_signal
        return post_interp if post_interp else post_ref

Basically I've grouped all the bad channel lists & noisy channels dicts into dictionaries with keys for each major timepoint, with the values being None if that part of the pipeline hasn't been run. I've also added some getter methods to get the info for the pipeline's current state, which will fetch the post-interpolation values if interpolation was used, or the post-reference values otherwise. Does this seem like a good approach? Or is it a big and largely unnecessary change when it's just the post-reference variable names that are a problem?

I dont know if this is too hacky or confusing but what about using a decorator to redirect EEG_X_interpolation to EEG_X_reference (the later would be the one in the code). It would be similar to what was done here:

I was thinking about that: it's definitely possible (including with the above reorganization), I'm just not sure whether the project has enough users at this stage to warrant properly deprecating methods/attributes instead of just dropping them. Do we have any PyPi download stats that would give us an idea?

Now, I understand dummy.interpolate() in the following code is just mne stuff, so maybe we can expose the _eeglab_interpolate_bads ; I mean, this function is what diverges from MNE interpolation so in a way is the only thing I see we could provide as a "different interpolation" from what is already on MNE. Nevertheless , this would be a too technical use-case and an hypothetical user that needs it wouldn't have trouble with just using the private function. So, I think overall is not necessary to make interpolated_bads a method.

In the Reference class itself I've made interpolate_bads a separate method, so I guess if someone really has a use-case where they want to re-reference and interpolate separately they can do it that way.

sappelhoff · 2021-07-01T07:54:54Z

Do we have any PyPi download stats that would give us an idea?

https://pypistats.org/packages/pyprep

But any person who reasonably uses pyprep currently is not installing it via pip, I think - but rather directly from GitHub.
Also, I have had the experience that the pypi stats are often inflated by bots, CI runs, mirrors, etc. and it's hard to get clean data on human users that actually use your package. I think that GitHub stars/forks/watchings and number of raised issues (by different people) are better indicators.

I'm just not sure whether the project has enough users at this stage to warrant properly deprecating methods/attributes instead of just dropping them.

IMHO we should not properly deprecate ... let's do what we think is best without looking back - for now. We have a prominent warning in the README that this will be the case (alpha software).

Regarding the proposed getter methods: I think as a user I'd like to have access to all data, and not "being served a part of the data based on the state my object is in", while the other data is present, but hidden from me. Or at least, I'd like to have control over what data I can get (with reasonable defaults so that I don't have to give it much thought).

a-hurst · 2021-07-02T16:42:11Z

Regarding the proposed getter methods: I think as a user I'd like to have access to all data, and not "being served a part of the data based on the state my object is in", while the other data is present, but hidden from me. Or at least, I'd like to have control over what data I can get (with reasonable defaults so that I don't have to give it much thought).

I was thinking that the self.noisy_info and self.bad_channels dicts would be proper documented attributes as well, so that users could access the NoisyChannels dicts and bad channel names at each stage directly. The "getter" methods here are just so that there would be a consistent interface for getting post-reference info regardless of whether interpolating bad channels was enabled or disabled (allows for writing code where the option is easily toggled on/off).

If you think that separate attributes for the original/post-reference/post-interpolation info makes for a better API than the dict approach though I'll change it accordingly 🙂

sappelhoff · 2021-07-02T19:10:04Z

ahh I see, makes sense - thanks for the explanation :)

sappelhoff added this to the 0.5.0 milestone May 18, 2021

a-hurst mentioned this issue Jun 6, 2021

pyprep needs to be split into smaller, testable chunks #23

Open

a-hurst mentioned this issue Jun 16, 2021

Interpolation of bad channels differs numerically between PyPREP and MatPREP #77

Closed

This was referenced Jun 28, 2021

Implement EEGLAB-style interpolation in Reference for matlab_strict #96

Merged

Adaptive line noise removal differs between PyPREP and Matlab PREP #82

Open

a-hurst linked a pull request Jul 3, 2021 that will close this issue

[WIP] Split PrepPipeline into separate methods, make final interpolation optional #99

Open

4 tasks

sappelhoff mentioned this issue Aug 2, 2021

Release 0.4 #101

Closed

sappelhoff mentioned this issue Aug 31, 2024

Make the entire Reference class private #158

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend PrepPipeline to allow performing PREP in stages #73

Extend PrepPipeline to allow performing PREP in stages #73

a-hurst commented May 3, 2021 •

edited

Loading

yjmantilla commented May 3, 2021

sappelhoff commented May 3, 2021

yjmantilla commented May 3, 2021 •

edited

Loading

a-hurst commented May 3, 2021 •

edited

Loading

a-hurst commented Jun 30, 2021 •

edited

Loading

sappelhoff commented Jun 30, 2021

yjmantilla commented Jul 1, 2021 •

edited

Loading

a-hurst commented Jul 1, 2021

sappelhoff commented Jul 1, 2021

a-hurst commented Jul 2, 2021

sappelhoff commented Jul 2, 2021

Extend PrepPipeline to allow performing PREP in stages #73

Extend PrepPipeline to allow performing PREP in stages #73

Comments

a-hurst commented May 3, 2021 • edited Loading

yjmantilla commented May 3, 2021

sappelhoff commented May 3, 2021

yjmantilla commented May 3, 2021 • edited Loading

a-hurst commented May 3, 2021 • edited Loading

a-hurst commented Jun 30, 2021 • edited Loading

sappelhoff commented Jun 30, 2021

yjmantilla commented Jul 1, 2021 • edited Loading

a-hurst commented Jul 1, 2021

sappelhoff commented Jul 1, 2021

a-hurst commented Jul 2, 2021

sappelhoff commented Jul 2, 2021

a-hurst commented May 3, 2021 •

edited

Loading

yjmantilla commented May 3, 2021 •

edited

Loading

a-hurst commented May 3, 2021 •

edited

Loading

a-hurst commented Jun 30, 2021 •

edited

Loading

yjmantilla commented Jul 1, 2021 •

edited

Loading