ENH: Add description filter parameter to Raw.crop_by_annotations() #13820
aman-coder03 wants to merge 3 commits into
|
Hey @aman-coder03, I tested your branch locally to see how it handles some edge cases. The base implementation looks good, but I found a few things that might be worth refining to match MNE standards. I noticed you mentioned
|
|
thanks for the review, i investigated |
|
Thanks for the update, @aman-coder03! Adding that |
|
@aman-coder03 @PragnyaKhandelwal @drammock Yeah, there's some inconsistency in whether annotation descriptions are matched exactly or not (for example, #13940). A common convention when creating annotations is to use "BAD_" as a prefix marking parts of the data that ought to be excluded, with the rest of the string indicating the reason for exclusion (e.g. "BAD_blink", "BAD_movement"). This naming convention means that a regex-style ability to find all annotations starting with "BAD_" is desirable. But spotty implementation of regex vs. exact matching might cause problems. @drammock, thoughts? |
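To make the exact-vs-regex distinction concrete, here is a minimal sketch using only the standard library. It does not call any MNE function; the list of descriptions and the variable names are made up for illustration, and plain set membership stands in for whatever exact-matching routine a given function uses.

```python
import re

# Hypothetical annotation descriptions following the "BAD_" convention.
descriptions = ["BAD_blink", "stimulus", "BAD_movement", "not_BAD_"]

# Exact matching: a description is kept only if it equals a requested string,
# so asking for "BAD_" finds nothing.
wanted = {"BAD_"}
exact = [d for d in descriptions if d in wanted]

# Regex matching with re.match, which anchors at the start of the string,
# so the pattern "BAD_" behaves as a prefix filter.
pattern = re.compile("BAD_")
prefix = [d for d in descriptions if pattern.match(d) is not None]

# re.fullmatch requires the whole string to match, which recovers
# exact-match semantics inside a regex-based API.
full = [d for d in descriptions if pattern.fullmatch(d) is not None]

print(exact)   # []
print(prefix)  # ['BAD_blink', 'BAD_movement']
print(full)    # []
```

The same string therefore selects two annotations or none depending on which matching rule a function applies, which is the kind of inconsistency raised above.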
Adds an optional `regexp` parameter to `Raw.crop_by_annotations()` so only annotations whose description matches the pattern are cropped (e.g. `regexp="^BAD_"`). Default `regexp=None` crops every annotation, preserving current behavior. Extracts the regex-matching core of `_select_annotations_based_on_description` into a shared `_match_descriptions` helper so cropping reuses the same matching as `events_from_annotations` without inheriting its `event_id` machinery. No-match emits a `RuntimeWarning`; the matcher itself stays policy-free so each caller chooses its own no-match behavior. Implements the design discussed on mne-tools#13820.
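The split between a policy-free matcher and a caller that picks its own no-match behavior could look roughly like the sketch below. This is an illustration under assumptions, not the code in the branch: `match_descriptions` mirrors the helper described above, while `select_for_cropping` is a hypothetical stand-in for the part of `crop_by_annotations` that decides to warn.

```python
import re
import warnings


def match_descriptions(descriptions, regexp=None):
    """Return indices of descriptions matching ``regexp`` (all if None).

    Policy-free: an empty result is returned silently.
    """
    pattern = re.compile(".*" if regexp is None else regexp)
    return [ii for ii, desc in enumerate(descriptions)
            if pattern.match(desc) is not None]


def select_for_cropping(descriptions, regexp=None):
    """Hypothetical caller that chooses to warn when nothing matches."""
    picks = match_descriptions(descriptions, regexp)
    if not picks:
        warnings.warn(
            f"No annotation descriptions matched regexp {regexp!r}; "
            "returning an empty list.",
            RuntimeWarning,
            stacklevel=2,
        )
    return picks


descs = ["BAD_blink", "stimulus", "BAD_movement"]
print(select_for_cropping(descs))           # [0, 1, 2]
print(select_for_cropping(descs, "^BAD_"))  # [0, 2]
```

Another caller could wrap the same matcher and raise an error, or stay silent, without the matcher changing.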
|
On the exact-vs-regex question at the end of the thread, I think there's a path that gives the regex everyone wants without the

The key observation is that that helper does two separable things: (1) a regex match over descriptions, and (2) resolving each match to an integer trigger via

import re


def _match_descriptions(descriptions, regexp):
    """Return indices of descriptions matching ``regexp`` (all if None)."""
    # re.match anchors at the start of each description.
    regexp_comp = re.compile(".*" if regexp is None else regexp)
    return [
        ii for ii, desc in enumerate(descriptions)
        if regexp_comp.match(desc) is not None
    ]
On the API, I'd suggest
I've implemented exactly this with a parametrized test, and confirmed the refactor is behavior-identical for
Testing
(mnedev) bkowshik@Coimbatore mne-python % python -c "
import numpy as np, mne
raw = mne.io.RawArray(np.zeros((1,4000)), mne.create_info(1,1000.,'eeg'))
raw.set_annotations(mne.Annotations([0,1.5,3.0],[1,.5,.5],['BAD_blink','stimulus','BAD_movement']))
print('all :', len(raw.crop_by_annotations())) # 3
print('^BAD_ :', len(raw.crop_by_annotations(regexp='^BAD_'))) # 2
print('^bad_ :', len(raw.crop_by_annotations(regexp='^bad_'))) # 0 + RuntimeWarning
print('(?i)bad :', len(raw.crop_by_annotations(regexp='(?i)^bad_'))) # 2
"
Creating RawArray with float64 data, n_channels=1, n_times=4000
Range : 0 ... 3999 = 0.000 ... 3.999 secs
Ready.
all : 3
^BAD_ : 2
<string>:7: RuntimeWarning: No annotation descriptions matched regexp '^bad_'; returning an empty list.
^bad_ : 0
(?i)bad : 2 |

Reference issue (if any)
fixes #13743
What does this implement/fix?
`Raw.crop_by_annotations()` currently crops the raw data for every annotation, with no way to filter by description. This PR adds an optional `description` parameter that lets you crop only annotations with matching descriptions.
Example:
When `description=None` (the default), the method behaves exactly as before, so there is no API breakage.
Filtering uses `np.isin` on the annotation descriptions, which follows the same pattern as `_select_annotations_based_on_description` already used internally in `mne/annotations.py`.
Additional information
`test_crop_by_annotations_description` in `mne/io/tests/test_raw.py`
`meas_date` and `first_samp` to match the existing `test_crop_by_annotations` style
`description=None` is the default
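As a rough illustration of the `np.isin` filtering described above, the sketch below builds a boolean mask over annotation descriptions. The helper name `description_mask` and the sample descriptions are hypothetical, and the actual code in the PR may differ; only `np.isin`, `np.asarray`, `np.atleast_1d`, and `np.ones` are real NumPy calls.

```python
import numpy as np


def description_mask(descriptions, description=None):
    """Boolean mask of annotations to keep (hypothetical helper).

    ``description`` may be None (keep everything), a single string,
    or a list of strings; matching is exact.
    """
    descriptions = np.asarray(descriptions, dtype=str)
    if description is None:
        return np.ones(len(descriptions), dtype=bool)
    return np.isin(descriptions, np.atleast_1d(description))


descs = ["BAD_blink", "stimulus", "BAD_movement"]
mask_all = description_mask(descs)
mask_stim = description_mask(descs, "stimulus")
mask_two = description_mask(descs, ["stimulus", "BAD_blink"])
mask_prefix = description_mask(descs, "BAD_")

print(mask_stim.tolist())    # [False, True, False]
print(mask_prefix.tolist())  # [False, False, False]
```

Because `np.isin` compares whole strings, passing `"BAD_"` selects nothing here, which is why the thread goes on to discuss regex matching.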