I am building a machine learning pipeline for seizure prediction using the CHB-MIT Scalp EEG Database. My goal is to extract features that capture both time-frequency dynamics and spatial (channel-to-channel) relationships, which will eventually be fed into a Graph Neural Network (GNN).
Preprocessing:
The raw signals are sampled at 256 Hz. I apply a high-pass filter at 0.5 Hz to remove baseline wander, and band-stop (notch) filters at 57–63 Hz and 117–123 Hz to remove the 60 Hz powerline noise and its first harmonic at 120 Hz (higher harmonics lie above the 128 Hz Nyquist frequency).
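For concreteness, this is roughly how I implement that stage (SciPy; the filter order and the zero-phase filtering are my own choices, not fixed requirements):

```python
# Minimal preprocessing sketch: 0.5 Hz high-pass plus two band-stop notches.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # CHB-MIT sampling rate (Hz)

def preprocess(x: np.ndarray) -> np.ndarray:
    """x: (n_channels, n_samples) raw EEG -> filtered EEG, same shape."""
    # 0.5 Hz high-pass to remove baseline wander (zero-phase via filtfilt).
    sos_hp = butter(4, 0.5, btype="highpass", fs=FS, output="sos")
    x = sosfiltfilt(sos_hp, x, axis=-1)
    # Band-stop notches around 60 Hz and its 120 Hz harmonic.
    for lo, hi in [(57.0, 63.0), (117.0, 123.0)]:
        sos_bs = butter(4, [lo, hi], btype="bandstop", fs=FS, output="sos")
        x = sosfiltfilt(sos_bs, x, axis=-1)
    return x
```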
I would like some feedback on whether my mathematical formulation for feature extraction is sound for this domain, and specifically how to best handle data normalization given the extreme outliers typical of EEG artifacts and seizures.
1. Feature Extraction Formulation
A. Generalized Morse Wavelets (GMW)
To capture instantaneous energy and frequency shifts, I compute Continuous Wavelet Transforms using analytic Generalized Morse Wavelets. The frequency-domain wavelet is defined as:

$$\Psi_{\beta,\gamma}(\omega) = U(\omega)\, a_{\beta,\gamma}\, \omega^{\beta} e^{-\omega^{\gamma}}$$

where $U(\omega)$ is the Heaviside step function (ensuring analyticity), $a_{\beta,\gamma}$ is a normalizing constant, $\gamma = 3.0$ (the most symmetric, "Airy" member of the family), and $\beta = 6.0$ (giving a time-bandwidth product $P^2 = \beta\gamma = 18$). Let $W_c(f_k, t)$ be the complex wavelet coefficient for channel $c$ at center frequency $f_k$ and time $t$.
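My implementation is a direct frequency-domain CWT (pure NumPy sketch; the peak-amplitude normalization and the per-frequency loop are just my own conventions):

```python
# GMW-CWT sketch: build Psi(s*omega) on the DFT grid and multiply in frequency.
import numpy as np

def gmw_cwt(x, freqs_hz, fs=256.0, gamma=3.0, beta=6.0):
    """x: (n_samples,) real signal -> complex W of shape (len(freqs_hz), n_samples)."""
    n = len(x)
    X = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n)         # DFT bin frequencies, rad/sample
    omega_peak = (beta / gamma) ** (1.0 / gamma)  # peak frequency of the GMW
    W = np.empty((len(freqs_hz), n), dtype=complex)
    for i, f in enumerate(freqs_hz):
        s = omega_peak / (2 * np.pi * f / fs)     # scale placing the peak at f Hz
        w = np.maximum(s * omega, 0.0)            # Heaviside U(w): analytic wavelet
        psi = w**beta * np.exp(-(w**gamma))
        psi *= 2.0 / psi.max()                    # peak value 2 (analytic convention)
        W[i] = np.fft.ifft(X * psi)
    return W
```

I run this per channel on a log-spaced grid, e.g. `freqs = np.geomspace(1, 80, 48)`, to obtain $W_c(f_k, t)$.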
B. Band-Specific Adjacency Matrices
To capture functional connectivity (the graph structure), I define standard EEG bands (e.g., Delta 1–4 Hz, Theta 4–8 Hz, up to Gamma 30–80 Hz). For a given band spanning a set of wavelet center frequencies $K$, I first compute the Root Mean Square (RMS) band envelope for each channel $c$:

$$E_c(t) = \sqrt{\frac{1}{|K|} \sum_{f_k \in K} \left| W_c(f_k, t) \right|^2}$$
I then compute the adjacency matrix $\mathbf{A}$ for this specific band using the Pearson correlation coefficient between the envelopes of channels $i$ and $j$ over the time window $T$:

$$A_{ij} = \frac{1}{|T|} \sum_{t \in T} \frac{\left(E_i(t) - \mu_i\right)\left(E_j(t) - \mu_j\right)}{\sigma_i \sigma_j}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of each channel's envelope within the window. The diagonal is set to zero ($A_{ii} = 0$).
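In code, this step looks roughly as follows (NumPy sketch building on `gmw_cwt` above; band edges as defined in this section). I've also sketched the PLV alternative from my second question below, for comparison:

```python
# Band RMS envelope E_c(t) and Pearson adjacency, plus a PLV variant.
import numpy as np

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 80)}

def band_adjacency(W, freqs_hz, band):
    """W: (n_channels, n_freqs, n_samples) complex coefficients -> (C, C) matrix."""
    lo, hi = BANDS[band]
    k = (freqs_hz >= lo) & (freqs_hz < hi)                   # center frequencies in K
    env = np.sqrt(np.mean(np.abs(W[:, k, :]) ** 2, axis=1))  # RMS envelope E_c(t)
    A = np.corrcoef(env)                                     # Pearson over the window
    np.fill_diagonal(A, 0.0)                                 # zero the diagonal
    return A

def plv_adjacency(W, freqs_hz, band):
    """PLV: |mean_t e^{i(phi_i - phi_j)}| per frequency, averaged over the band."""
    lo, hi = BANDS[band]
    k = (freqs_hz >= lo) & (freqs_hz < hi)
    phasors = np.exp(1j * np.angle(W[:, k, :]))              # unit phasors per channel
    plv = np.abs(np.einsum("ikt,jkt->ijk", phasors, np.conj(phasors))
                 / phasors.shape[-1])                        # per-frequency PLV
    A = plv.mean(axis=-1)                                    # band average
    np.fill_diagonal(A, 0.0)
    return A
```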
C. Teager-Kaiser Energy Operator (TKEO)
To emphasize sudden spikes in energy and high-frequency variations (often precursors to seizures), I apply the discrete Teager-Kaiser operator directly to the time-domain signal $x[n]$:

$$\Psi[n] = x[n]^2 - x[n-1]\, x[n+1]$$

Because the amplitude range of $\Psi$ is large and the operator can go negative, I apply a signed-log transformation to stabilize the variance:

$$\tilde{\Psi}[n] = \operatorname{sign}(\Psi[n]) \cdot \log\!\left(1 + |\Psi[n]|\right)$$
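A sketch of both steps (NumPy; zero-padding the two undefined endpoints is my own choice):

```python
# Discrete TKEO followed by a signed-log variance stabilization.
import numpy as np

def tkeo_signed_log(x):
    """x: (..., n_samples) -> stabilized TKEO, same shape."""
    psi = np.zeros_like(x, dtype=float)
    # Psi[n] = x[n]^2 - x[n-1] * x[n+1]  (undefined at the two endpoints)
    psi[..., 1:-1] = x[..., 1:-1] ** 2 - x[..., :-2] * x[..., 2:]
    # Signed-log: sign(Psi) * log(1 + |Psi|) compresses the dynamic range.
    return np.sign(psi) * np.log1p(np.abs(psi))
```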
2. The Normalization Problem
Currently, to prevent data leakage, I calculate the mean and standard deviation strictly on the training set, and use those to Z-score the validation and test sets.
However, EEG data—especially in the CHB-MIT dataset—contains massive amplitude spikes due to artifacts (muscle movement, eye blinks) and the seizures themselves. Computing a standard mean and standard deviation over the entire training set includes these extreme outliers, which artificially inflates the standard deviation and severely squashes the variance of the baseline (inter-ictal) signal.
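For reference, my current fit/apply split looks like this (NumPy sketch; the per-feature axis and the epsilon are my own conventions):

```python
# Leakage-safe Z-scoring: parameters fit on the training set only.
import numpy as np

def fit_zscore(train):                      # train: (n_windows, n_features)
    return train.mean(axis=0), train.std(axis=0)

def apply_scale(x, center, spread):
    return (x - center) / (spread + 1e-8)   # same train-set params for val/test
```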
My Questions:
- Feature Suitability: Is the combination of GMW band envelopes and the time-domain Teager-Kaiser operator mathematically robust for capturing pre-ictal dynamics? Since the TKEO tracks instantaneous energy, is there a strong redundancy issue with the wavelet log-power that I should worry about?
- Adjacency Extraction: Is computing functional connectivity via the Pearson correlation of the RMS wavelet envelope mathematically sound, or should I be computing Phase-Locking Value (PLV) directly from the complex phase angles for a GNN adjacency matrix?
- Normalization Strategy: Because of the extreme outliers in EEG data, standard global Z-scoring seems flawed. Which of the following is considered best practice for long-term EEG monitoring data?
- Option A: Switch to Robust Scaling (subtracting the median and dividing by the Interquartile Range / IQR).
- Option B: Stick with Z-scoring, but compute the mean and std using only the middle 80–90% of the training data (a trimmed distribution) so the artifacts/seizures are excluded from the parameter estimates. (I've sketched both options below.)
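For concreteness, here is how I would implement each option (NumPy sketch; the 10% trim corresponds to the "middle 80%" variant, and `apply_scale` is the helper from Section 2):

```python
# Option A (median/IQR) vs. Option B (trimmed mean/std), fit on training data only.
import numpy as np

def fit_robust(train):                        # Option A: median / IQR
    med = np.median(train, axis=0)
    q25, q75 = np.percentile(train, [25, 75], axis=0)
    return med, q75 - q25

def fit_trimmed_zscore(train, trim=0.10):     # Option B: trimmed mean / std
    lo, hi = np.percentile(train, [100 * trim, 100 * (1 - trim)], axis=0)
    kept = np.where((train >= lo) & (train <= hi), train, np.nan)
    return np.nanmean(kept, axis=0), np.nanstd(kept, axis=0)
```

As I understand it, Option A is essentially what `sklearn.preprocessing.RobustScaler` does with its default (25, 75) quantile range.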