Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend PrepPipeline to allow performing PREP in stages #73

Open
a-hurst opened this issue May 3, 2021 · 11 comments · May be fixed by #99
Open

Extend PrepPipeline to allow performing PREP in stages #73

a-hurst opened this issue May 3, 2021 · 11 comments · May be fixed by #99
Milestone

Comments

@a-hurst
Copy link
Collaborator

a-hurst commented May 3, 2021

While chugging along on the PyPREP/MatPREP comparison unit tests, I had a thought about the PyPREP API: since there are more-or-less three distinct stages to the PREP pipeline (adaptive line noise removal, robust re-referencing, interpolation of remaining bad channels), what if the PrepPipeline object had separate methods for each of these? The fit() method would still exist, but just run all of these in sequence.

Separate methods would be handy for situations like my own, where I basically copy/pasted the guts of fit() into my own script in order to do progress messages and plotting between CleanLine and robust referencing. Also for my script the project lead wanted the option to ignore bad channels instead of interpolating them, so I had to write extra code to revert interpolation and the names of bad channels instead of having the option to not interpolate in the first place.

Here's my basic proposal for PrepPipeline methods:

  • fit: Performs the whole PREP pipeline, like ususal
  • remove_line_noise: Performs adaptive line noise removal.
  • robust_reference: Performs robust re-referencing without interpolation. Will default to printing a warning if remove_line_noise isn't done first.
  • interpolate_bads: Interpolates any remaining bad channels and adjusts the average reference accordingly.

Let me know what you think!

EDIT: With the addition of #96, a PR addressing this issue should also add MATLAB comparison tests for final bad channel interpolation.

@yjmantilla
Copy link
Collaborator

I agree. Usually I dont do the remove line noise at all because the signals were already filtered.

@sappelhoff
Copy link
Owner

I think I forgot how the robust re referencing works ... I am surprised that it can work without interpolation. Can you explain?

Other than that, the api proposal sounds good to me. As long as we have good documentation an warnings in place to prevent users from accidentally screwing up their data, it should be fine.

@yjmantilla
Copy link
Collaborator

yjmantilla commented May 3, 2021

I understood that @a-hurst meant without the "final" interpolation. Not "without any interpolation at all". Now, I do wonder how it could be done that way:

The interpolation inside the robust referencing is always done, but it doesn't affect the real signal but a dummy-temporal one , that interpolation is only done to calculate the robust reference.

dummy.interpolate_bads()

raw_tmp.interpolate_bads()

What I get is that at the end of robust referencing PREP once again searches for bad channels but this time he interpolates it on the "real" signal . These channels then become the "interpolated channels":

self.raw.interpolate_bads()

Then prep once again searches for bad channels and what he finds is named "still_noisy_channels".

@a-hurst
Copy link
Collaborator Author

a-hurst commented May 3, 2021

I understood that @a-hurst meant without the "final" interpolation.

Yep, that's what I was referring to! As you said, during referencing NoisyChannels only ever does interpolation to make predictions during RANSAC. Then once it does its loops of NoisyChannels, it interpolates all 'bad' channels on a dummy copy of the data and uses that to calculate an average reference signal free of bad channels, and then re-references the data.

After that, NoisyChannels is run again, and any channels that are still flagged as 'bad' get interpolated. The signal is then average-referenced to account for the interpolated channels, and final run of NoisyChannels is used to detect any 'remaining bad' channels even after interpolating bads.

What I'm proposing is to have a method robust_reference that does the above up until interpolating bad channels post-rereference, and a separate interpolate_bads that does the interpolation and subsequent referencing/NoisyChannels detection, so it's possible to opt out of the extra steps and time required if it's not needed.

EDIT: Nevermind, I totally missed it! See the lines here:

pyprep/pyprep/reference.py

Lines 315 to 325 in 051edd6

if self.noisy_channels["bad_all"]:
raw_tmp._data = signal * 1e-6
raw_tmp.info["bads"] = self.noisy_channels["bad_all"]
raw_tmp.interpolate_bads()
signal_tmp = raw_tmp.get_data() * 1e6
else:
signal_tmp = signal
self.reference_signal = (
np.nanmean(raw_tmp.get_data(picks=self.reference_channels), axis=0)
* 1e6
)

I guess that means reference signal calculation is going to depend on MNE's interpolation code being identical to EEGLAB's (which, confusingly, uses a different function than interpolation does during RANSAC). Still, that doesn't change that initial re-referencing and final interpolation can be done separately (which is what I originally meant).

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jun 30, 2021

@sappelhoff @yjmantilla So I started taking a crack at this thinking it might simplify #95 a bit, but I've run into a couple snags I'd like to get your thoughts on.

For one, a lot of the pipeline attributes are named in a way that assumes interpolation will always happen:

self.noisy_channels_before_interpolation = (
reference.noisy_channels_before_interpolation
)
self.noisy_channels_after_interpolation = (
reference.noisy_channels_after_interpolation
)
self.bad_before_interpolation = reference.bad_before_interpolation
self.EEG_before_interpolation = reference.EEG_before_interpolation
self.reference_before_interpolation = reference.reference_signal
self.reference_after_interpolation = reference.reference_signal_new
self.interpolated_channels = reference.interpolated_channels
self.still_noisy_channels = reference.still_noisy_channels

With final interpolation of bads being made optional, should all the before_interpolation attributes be renamed along the lines of after_reference or something? That would be the most API-breaking change that's been made so far since they're properties existing scripts are likely to use, but the existing naming seems confusing if interpolation doesn't necessarily happen. What do you think?

Relatedly, I'm trying to decide whether it makes sense for interpolate_bads to be its own method, instead of just having a interpolate_bads argument to PrepPipeline.robust_reference. The code would be simpler with it all in one method, but is there any use-case you can think of where you'd want to robust reference, do something else with the data, and then interpolate bad channels afterwards?

@sappelhoff
Copy link
Owner

I think that the upcoming release should be the one where we go wild in terms of API changes, with the hopes of having a much more stable API after 0.4.0 ... and eventually a fixed API (with deprecation cycles if necessary) starting at 1.0.0 --> so I agree with renaming attributes that don't make sense semantically (any more).

Re: Your second question I cannot think of a use case right now 🤔 if a user really wanted to do that wouldn't they be able to pass interpolate_bads=False and then later interpolate "manually" using the MNE-Python API and the information they can extract manually from the pyprep objects?

@yjmantilla
Copy link
Collaborator

yjmantilla commented Jul 1, 2021

With final interpolation of bads being made optional, should all the before_interpolation attributes be renamed along the lines of after_reference or something?

Regarding the renaming to "reference"... Yeap, I dont see any more proper name, after all the reference does indeed split those two "before" and "after" objects regardless the interpolation being done or not.

That would be the most API-breaking change that's been made so far since they're properties existing scripts are likely to use, but the existing naming seems confusing if interpolation doesn't necessarily happen. What do you think?

I dont know if this is too hacky or confusing but what about using a decorator to redirect EEG_X_interpolation to EEG_X_reference (the later would be the one in the code). It would be similar to what was done here:

@property
def raw(self):
"""Return a version of self.raw_eeg that includes the non-eeg channels."""
full_raw = self.raw_eeg.copy()
if self.raw_non_eeg is None:
return full_raw
else:
return full_raw.add_channels([self.raw_non_eeg])

That way we wouldn't break the previous notation... Downside is that it may be confusing (but could be just clarified in the docstring).

Relatedly, I'm trying to decide whether it makes sense for interpolate_bads to be its own method, instead of just having a interpolate_bads argument to PrepPipeline.robust_reference. The code would be simpler with it all in one method, but is there any use-case you can think of where you'd want to robust reference, do something else with the data, and then interpolate bad channels afterwards?

if a user really wanted to do that wouldn't they be able to pass interpolate_bads=False and then later interpolate "manually" using the MNE-Python API and the information they can extract manually from the pyprep objects?

I agree with @sappelhoff , in that case he would pass false and then interpolate with his preferred way.

Now, I understand dummy.interpolate() in the following code is just mne stuff, so maybe we can expose the _eeglab_interpolate_bads ; I mean, this function is what diverges from MNE interpolation so in a way is the only thing I see we could provide as a "different interpolation" from what is already on MNE. Nevertheless , this would be a too technical use-case and an hypothetical user that needs it wouldn't have trouble with just using the private function. So, I think overall is not necessary to make interpolated_bads a method.

pyprep/pyprep/reference.py

Lines 151 to 154 in 1c9589c

if self.matlab_strict:
_eeglab_interpolate_bads(self.raw)
else:
self.raw.interpolate_bads()

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jul 1, 2021

@sappelhoff @yjmantilla Okay, I've got a rough draft for reorganized/renamed attributes for the PrepPipeline object, am hoping to get your thoughts before I go and write docstrings or anything:

        # NOTE: 'original' refers to before initial average reference, not first
        # pass afterwards. Not necessarily comparable to later values?
        self.noisy_info = {
            "original": None, "post-reference": None, "post-interpolation": None
        }
        self.bad_channels = {
            "original": None, "post-reference": None, "post-interpolation": None
        }
        self.reference_signals = {"robust": None, "post-interpolation": None}

        self.robust_reference_signal = None
        self._interpolated_reference_signal = None

    @property
    def current_noisy_info(self):
        post_ref = self.noisy_info["post-reference"]
        post_interp = self.noisy_info["post-interpolation"]
        return post_interp if post_interp else post_ref

    @property
    def remaining_bad_channels(self):
        post_ref = self.bad_channels["post-reference"]
        post_interp = self.bad_channels["post-interpolation"]
        return post_interp if post_interp else post_ref

    @property
    def current_reference_signal(self):
        post_ref = self.robust_reference_signal
        post_interp = self._interpolated_reference_signal
        return post_interp if post_interp else post_ref

Basically I've grouped all the bad channel lists & noisy channels dicts into dictionaries with keys for each major timepoint, with the values being None if that part of the pipeline hasn't been run. I've also added some getter methods to get the info for the pipeline's current state, which will fetch the post-interpolation values if interpolation was used, or the post-reference values otherwise. Does this seem like a good approach? Or is it a big and largely unnecessary change when it's just the post-reference variable names that are a problem?

I dont know if this is too hacky or confusing but what about using a decorator to redirect EEG_X_interpolation to EEG_X_reference (the later would be the one in the code). It would be similar to what was done here:

I was thinking about that: it's definitely possible (including with the above reorganization), I'm just not sure whether the project has enough users at this stage to warrant properly deprecating methods/attributes instead of just dropping them. Do we have any PyPi download stats that would give us an idea?

Now, I understand dummy.interpolate() in the following code is just mne stuff, so maybe we can expose the _eeglab_interpolate_bads ; I mean, this function is what diverges from MNE interpolation so in a way is the only thing I see we could provide as a "different interpolation" from what is already on MNE. Nevertheless , this would be a too technical use-case and an hypothetical user that needs it wouldn't have trouble with just using the private function. So, I think overall is not necessary to make interpolated_bads a method.

In the Reference class itself I've made interpolate_bads a separate method, so I guess if someone really has a use-case where they want to re-reference and interpolate separately they can do it that way.

@sappelhoff
Copy link
Owner

Do we have any PyPi download stats that would give us an idea?

https://pypistats.org/packages/pyprep

But any person who reasonably uses pyprep currently is not installing it via pip, I think - but rather directly from GitHub.
Also, I have had the experience that the pypi stats are often inflated by bots, CI runs, mirrors, etc. and it's hard to get clean data on human users that actually use your package. I think that GitHub stars/forks/watchings and number of raised issues (by different people) are better indicators.

I'm just not sure whether the project has enough users at this stage to warrant properly deprecating methods/attributes instead of just dropping them.

IMHO we should not properly deprecate ... let's do what we think is best without looking back - for now. We have a prominent warning in the README that this will be the case (alpha software).

Regarding the proposed getter methods: I think as a user I'd like to have access to all data, and not "being served a part of the data based on the state my object is in", while the other data is present, but hidden from me. Or at least, I'd like to have control over what data I can get (with reasonable defaults so that I don't have to give it much thought).

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jul 2, 2021

Regarding the proposed getter methods: I think as a user I'd like to have access to all data, and not "being served a part of the data based on the state my object is in", while the other data is present, but hidden from me. Or at least, I'd like to have control over what data I can get (with reasonable defaults so that I don't have to give it much thought).

I was thinking that the self.noisy_info and self.bad_channels dicts would be proper documented attributes as well, so that users could access the NoisyChannels dicts and bad channel names at each stage directly. The "getter" methods here are just so that there would be a consistent interface for getting post-reference info regardless of whether interpolating bad channels was enabled or disabled (allows for writing code where the option is easily toggled on/off).

If you think that separate attributes for the original/post-reference/post-interpolation info makes for a better API than the dict approach though I'll change it accordingly 🙂

@sappelhoff
Copy link
Owner

ahh I see, makes sense - thanks for the explanation :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants