--- jupyter: jupytext: text_representation: extension: .qmd format_name: quarto format_version: '1.0' jupytext_version: 1.14.7 kernelspec: display_name: Python 3 language: python name: python3 --- # Example: ERP CORE data ## Loading Python modules ```{python} import json import seaborn as sns from pipeline import group_pipeline from pipeline.datasets import get_erpcore ``` ## Downloading example data The pipeline comes with a function to download example data from the ERP CORE dataset. This dataset contains data from 40 participants who completed six different EEG experiments which were specifically designed to elicit seven common ERP components. For example, in the N400 experiment, participants viewed pairs of prime and target words that were either semantically related (EEG triggers `211` and `212`) or unrelated (EEG triggers `221` and `222`). The raw data are stored in the [Open Science Framework](https://osf.io/29xpq) and more details about the study are in [Kappenman et al. (2021)](https://doi.org/10.1016/j.neuroimage.2020.117465). ```{python} #| tags: [hide-output] n400_files = get_erpcore('N400', participants=4) ``` ```{python} def print_dict(d): print(json.dumps(d, indent=4)) print_dict(n400_files) ``` To save time, we only download and process data from the first four participants. Feel free to re-run the example with more participants by increasing or removing the `n_participants` argument. The paths of the downloaded raw EEG files (`.set`) and events file (`.tsv`) can now be fed into pipeline. ## Running the pipeline We run a simple pipeline for single-trial ERP analysis with the following steps: - Downsampling from 1024 to 256 Hz - Applying standard Biosemi montage for channel locations - Re-referencing to common average (not shown) - Automatic detection and interpolation of bad channels - Ocular correction with ICA - Bandpass filtering between 0.1 and 30 Hz - Segmentation to epochs around stimulus triggers - Baseline correction (not shown) - Rejecting bad epochs based on peak-to-peak amplitudes > 200 µV (not shown) - Computing single trial N400 amplitudes by averaging across the time window and channels of interest - Creating by-participant averages for the related and unrelated conditions ```{python} #| tags: [hide-output] trials, evokeds, config = group_pipeline( # Input/output paths raw_files=n400_files['raw_files'], log_files=n400_files['log_files'], output_dir='output', # Preprocessing options downsample_sfreq=256.0, montage='biosemi64', bad_channels='auto', ica_method='fastica', highpass_freq=0.1, lowpass_freq=30.0, # Epoching options triggers=[211, 212, 221, 222], skip_log_conditions={'value': [111, 112, 121, 122, 201, 202]}, components={'name': ['N400'], 'tmin': [0.3], 'tmax': [0.5], 'roi': [['Cz', 'CPz']]}, # Averaging options average_by={'related': 'value in [211, 212]', 'unrelated': 'value in [221, 222]'}) ``` See the [Pipeline inputs](../inputs_py.rst) page for a list of all available processing options. ## Checking the results This pipeline returns three objects: A dataframe of single trial ERP amplitudes, a dataframe of by-participant condition averages, and a dictionary of pipeline metadata. ### Single trial amplitudes These are basically just the log files, concatenated for all participants, with two added columns for the two ERP components of interest. Each value in these columns reflects the single trial ERP amplitude, averaged across time points and channels of interest. Here are the first couple of lines of the dataframe: ```{python} trials.head() ``` Since we only have four participant and relatively few trials per participant, we can show all the single trial amplitudes in one plot (color-coded by condition): ```{python, tags=c("nbsphinx-thumbnail")} trials['condition'] = trials['value'].map({211: 'related', 212: 'related', 221: 'unrelated', 222: 'unrelated'}) trials['participant'] = trials['participant_id'].str.extract(r'(sub-\d+)') _ = sns.swarmplot(data=trials, x='participant', y='N400', hue='condition') ``` We could also use this dataframe for statistical analysis on the single trial level, e.g., using linear mixed-effects models with the `lme4` package in R (see [UCAP example](ucap.qmd)) or the `statsmodels` package in Python. ### By-participant averages This is one big data frame which, unlike `trials`, is averaged across trials (i.e., losing any single trial information) but *not* averaged across time points or channels (i.e., retaining the millisecond-wise ERP waveform at all electrodes). ```{python} evokeds.head() ``` We can use it to display the grand-averaged ERP waveforms for different conditions as a timecourse plot at a single channel or ROI (here for the N400 ROI): ```{python} _ = sns.lineplot(data=evokeds, x='time', y='N400', hue='label', errorbar=None) ``` Note that we're explicitly disabling error bars here because they would be invalid due to the fact that our condition effect (related vs. unrelated) is a within-participant factor. See the [UCAP example](ucap.qmd) for how to compute and plot valid within-participant error bars around the grand-averged evoked waveform. ### Pipeline metadata This is a dictionary with various metadata about the pipeline run. It contains: - The user-specified input arguments - The default values for those arguments that were not specified - Some descriptive statistics that were computed by the pipeline along the way (e.g., the number of indices of rejecected epochs based on peak-to-peak amplitude) - The software version of Python, the pipeline and its most important dependency packages ```{python} print_dict(config) ```