Optimization of data pre-processing methods for time-series classification of electroencephalography data

ABSTRACT The performance of time-series classification of electroencephalographic data varies strongly across experimental paradigms and study participants. Reasons include task-dependent differences in neuronal processing and seemingly random variations between subjects, amongst others. The effect of data pre-processing techniques to ameliorate these challenges has received relatively little study. Here, the influence of spatial filter optimization methods and non-linear data transformation on time-series classification performance is analyzed using the example of high-frequency somatosensory evoked responses. This is a model paradigm for the analysis of high-frequency electroencephalography data at a very low signal-to-noise ratio, which emphasizes the differences between the explored methods. For the utilized data, the individual signal-to-noise ratio explained up to 74% of the performance differences between subjects. While data pre-processing was shown to increase average time-series classification performance, it could not fully compensate for the signal-to-noise ratio differences between subjects. This study proposes an algorithm to prototype and benchmark pre-processing pipelines for the paradigm and data set at hand. Extreme learning machines, random forests, and logistic regression can be used to quickly compare a set of potentially suitable pipelines. For subsequent classification, however, the machine learning models were shown to provide better accuracy than the deep learning models.


Introduction
Electroencephalography (EEG) measures electric brain activity with high temporal resolution. One main clinical application of EEG is its essential role in the diagnosis of epilepsy as a direct measure of the hypersynchronized electrocortical activity eventually causing seizures. Clinical decisions based on EEG recordings need to be accurate, as false classifications or mistakenly detected patterns during the analysis could lead to serious adverse consequences for patients, such as a false diagnosis of epilepsy or false localization of epileptogenic brain areas prior to respective epilepsy surgery. Clinical EEG, which may last up to a week in the case of long-term EEG measurements, is interpreted based on the subjective judgement of highly trained and sufficiently qualified professionals. Years of EEG experience help speed up the analysis, whereas factors such as fatigue of the professional or the occurrence of complex EEG patterns slow the process down. To accelerate and optimize the general process of analysing EEG time series, recent advancements in computational neurosciences and computer sciences have brought several automated algorithms for objective and quantitative EEG analysis (Golmohammadi et al. 2019), as well as for the detection of epileptiform patterns (Lodder and van Putten 2014; Fedele et al. 2017).
The task of automatic EEG analysis can be generalized as the classification of time series. In the field of computer sciences, time series are considered as arbitrary features, possibly after several pre-processing steps and embedding into higher-dimensional feature spaces (Knutsson et al. 1998; Huang et al. 2004, 2006; Hatami et al. 2017). Performed on EEG data, time-series classification (TSC) deals not only with the challenging aspects common to all TSC tasks, like trends in data and correlations between variables over time, but with additional hurdles (Craik et al. 2019; Rashid et al. 2020). These additional hurdles encompass differences in experimental and recording setups, intra- and inter-subject variability of the signal of interest, and several noise sources of technical and biological origin. Technical sources of noise encompass impedance-dependent thermal electrode noise, 50/60 Hz alternating current (AC) noise, electrode artefacts, artefacts due to cable movements, and far-field potentials of electric devices like elevators or, in a clinical context, respirators and medical pumps (Blankertz et al. 2001; Buzsáki 2006; Bagnall et al. 2017, 2018), amongst others. The main sources of "biological noise" are surface electromyogram (EMG) artefacts from tense muscles, movements, ocular and blink artefacts, as well as electrocardiogram (ECG) artefacts (Waterstraat et al. 2015).
Noise components constitute a substantial part of the recorded EEG data and limit the performance of applied TSC algorithms. The reduction of noise can be realized in two distinct steps. Most effectively, noise contribution is already minimized in the first step by the experimental design and recording setup. This includes low technical noise levels of the recording equipment, adequate filter and digitization settings, as well as muscle relaxation by the subjects from whom EEG is recorded (Waterstraat et al. 2015). As the second step, the recorded data can be processed post hoc in an attempt to "clean" it from noise components. Popular techniques for this purpose include normalization, outlier rejection, band-pass filtering, re-referencing to different standard EEG montages (Michel and Brunet 2019), and independent component analysis (ICA) (Winkler et al. 2011).
Research in the field of computational neurosciences has led to a variety of additional algorithms to extract and localize brain oscillations with certain desirable properties defined a priori. As a prominent example, ICA separates sources by maximizing their mutual independence (Hyvärinen and Oja 2000). Other algorithms, concerned with the analysis of event-related or evoked potentials, include common spatial pattern (CSP) analysis, which maximizes EEG power differences between two experimental conditions (Blankertz et al. 2008), and canonical correlation average regression (CCAr), which maximizes the correlation between single-trial and trial-averaged activity (Fedele et al. 2013; Waterstraat et al. 2015). By construction, these algorithms lead to physiologically interpretable results that can provide insights into the analysed brain processes (Koles 1991; Nikulin et al. 2011; Yu et al. 2012; Waterstraat et al. 2015). Altogether, these algorithms can be used for pre-processing of (EEG) time series before TSC model building.
In the computer-science fields of machine learning (ML) and its subfield deep learning (DL), models are built from pre-defined algorithms by fitting parameters to the input data. ML approaches have long dominated the area of TSC (Bagnall et al. 2017, 2018). However, DL models have recently become increasingly popular (Hatami et al. 2017; Fawaz et al. 2018; Susto et al. 2018; Craik et al. 2019; Dubreuil-Vall et al. 2020) and could even provide deeper insights into brain physiology and pathology. The questions arise: Why are DL models becoming increasingly popular, and is this justified? Could simpler models, such as ML models, achieve comparable results? For these reasons, both ML and DL models were chosen in this work.
Several studies have aimed at describing and quantifying the impact of data quality and data pre-processing techniques on the performance of automatic EEG analysis (Kotsiantis et al. 2007; Fawaz et al. 2018; Craik et al. 2019). However, few publications consider the effect of pre-processing when using DL and ML (Wang et al. 2018; Wei et al. 2018). Additionally, these studies comprise considerable differences in the characteristics of the analysed data, input formulations, and pre-processing techniques (Xu and Plataniotis 2016; Hatami et al. 2017; Qiao et al. 2017; Kuanar et al. 2018; Vrbancic and Podgorelec 2018; Wei et al. 2018; Dubreuil-Vall et al. 2020; Vahid et al. 2020). Further, popular methods from the field of computational neurosciences are not covered. Consequently, no general advice for the choice of a pre-processing pipeline and TSC model can be derived.
Here, non-invasive low-noise EEG recordings of high-frequency somatosensory evoked responses (hfSERs) from 10 healthy subjects, originally presented in (Waterstraat et al. 2016), are re-analysed. This data shows a particularly low signal-to-noise ratio (SNR) due to the inherently low signal strength of the hfSERs, despite special research-grade recording devices. Therefore, algorithms obtained here are expected to perform well also for EEG data with higher SNR (e.g., at lower frequencies). hfSERs, elicited by peripheral sensory or mixed-nerve stimulation, are macroscopic ~600 Hz wavelet bursts and were shown to represent cortical population spiking activity otherwise only measurable through invasive microelectrode recordings. Despite a very low SNR, it has recently been shown that non-invasive ultralow-noise magnetoencephalography (MEG) recordings enable single-trial analysis of hfSERs (Waterstraat et al. 2021). This uncovered a natural variability in the processing of somatosensory stimuli in the brain, and refined TSC of hfSERs would provide a method for the non-invasive analysis of real-time interactions between population spiking activity and low-frequency brain oscillations.
TSC of hfSERs can be regarded as a model paradigm to study similar signals observable in EEG recordings. As an example, the role of high-frequency EEG activity is actively investigated in epilepsy and Parkinson's disease, but is often restricted to invasive recordings due to the unfavourable SNR of surface recordings. Hence, optimal pre-processing and selection of TSC models for high-frequency (f ≥ 500 Hz) EEG signals (Waterstraat et al. 2016) could deepen the insights into pathological neuronal activity and the interrelations between different brain oscillations at a high temporal resolution. Further, the increased sensitivity of optimized TSC models could improve on-line analysis of neuronal activity in the aforementioned diseases and support the control of closed-loop stimulation devices such as those for deep brain stimulation.

Data characteristics
For this analysis, data from (Waterstraat et al. 2016) were re-analysed. The study was approved by the local ethics committee (EA4/119/13) and contains low-noise EEG recordings from three female and seven male subjects with a mean age of 35 years, ranging from 22 to 56 years. The recordings were performed with a custom-built low-noise EEG amplifier specifically designed for increased sensitivity in the high-frequency range (Waterstraat et al. 2015). Measurements consist of 8-channel recordings at a sampling frequency of 10 kHz. The impedance at the Ag/AgCl electrodes, placed at the positions Fz, F3, FC5, Cz, C5, CP5, T7 and CP1 according to the international 10/20 system, was carefully prepared to, and kept, below 2 kΩ. Repeated controls of the electrodes and impedance assured mean impedance levels around 1 kΩ, which minimized the impact of thermal Johnson-Nyquist noise on the measurements. The right median nerve of study participants was electrically stimulated at the wrist with a monophasic rectangular stimulus of 0.02 ms duration at a frequency of 4 Hz and an intensity of 1.5× motor threshold, such that a twitch of the thumb was visible for every stimulus in every subject. To minimize the effect of exhaustion and fatigue on study participants, measurements were conducted in three blocks of approximately 10 min each, during which the participants were instructed to avoid any movements and keep their eyes open whilst minimizing blinking frequency. The blocks were interjected with breaks of participant-chosen duration. The study protocol included a thirty-minute EEG at rest, which was not re-analysed in this evaluation. Data characteristics and the influence of CSP, CCAr, as well as Hilbert transformation on part of the analysed data are shown by time-domain depictions of SERs for the subject with the highest SNR (S3) in Figure 1.

Data pre-processing methods
Stimulus artefacts were interpolated using monotone cubic Hermite spline interpolation in the window of [−8 ms to 2 ms] relative to the electric stimulus (Waterstraat et al. 2015, 2016). The time series was spectrally filtered between 500 Hz and 900 Hz using a seventh-order Butterworth filter with transition bands from 450 Hz to 500 Hz and from 900 Hz to 1000 Hz, and maximal attenuation in the passband and minimal attenuation in the stopband, respectively, of 3 dB, realized as a zero-phase filter by filtering in both directions with a total filter order of 14. SNR was quantified as the "signal-plus-noise"-to-noise ratio (SNNR) by dividing the average single-trial EEG power during hfSERs (10-35 ms) by that of a period devoid of evoked high-frequency activity (25 ms window, e.g., 40-65 ms). The greater the SNNR, the easier the distinction between signal and noise. An overview of SNNR values in the analysed data is depicted in Table 1.
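The spectral filtering and SNNR computation described above can be sketched in Python with SciPy. Window placements follow the text, while the array layout (trials × samples, with t = 0 at stimulus onset) and the synthetic usage are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 10_000  # sampling frequency in Hz

def bandpass_500_900(x, fs=FS, order=7):
    """Zero-phase 500-900 Hz band-pass: a seventh-order Butterworth
    applied forward and backward (sosfiltfilt), doubling the effective
    filter order to 14 as described in the text."""
    sos = butter(order, [500, 900], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def snnr(trials, fs=FS, signal_win=(0.010, 0.035), noise_win=(0.040, 0.065)):
    """'Signal-plus-noise'-to-noise ratio: mean single-trial power in the
    hfSER window divided by mean power in a window devoid of evoked
    activity. `trials` is assumed to have shape (n_trials, n_samples)
    with t = 0 at stimulus onset."""
    s = trials[:, int(signal_win[0] * fs):int(signal_win[1] * fs)]
    n = trials[:, int(noise_win[0] * fs):int(noise_win[1] * fs)]
    return np.mean(s ** 2) / np.mean(n ** 2)
```

On synthetic trials containing a 600 Hz burst in the hfSER window, the SNNR computed this way exceeds 1, while pure-noise trials yield values near 1.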
For training and evaluation of TSC algorithms, the recorded data was split into periods containing either hfSERs or only noise, essentially as training input to hfSER detectors. The hfSER class was defined as the [10 ms to 35 ms] window of every trial (relative to stimulus onset), whereas the noise class was sampled as continuous 25 ms windows randomly selected from periods devoid of evoked activity (within the windows [−55 ms to −10 ms] and [45 ms to 95 ms]). The number of noise realizations was chosen to result in a class ratio of 1:1 between noise and hfSERs for each subject. The outlier-rejection procedure applied in the original publication (Waterstraat et al. 2016) proved negligible for the performance of the algorithms evaluated in this work and was consequently not utilized here.
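The class construction above can be sketched as follows; the `pre_samples` argument (the number of pre-stimulus samples per epoch) is an assumption about the epoch layout, while the window definitions follow the text:

```python
import numpy as np

def make_classes(epochs, pre_samples, fs=10_000, seed=0):
    """Build a balanced hfSER-vs-noise data set from stimulus-locked
    epochs of shape (n_trials, n_samples). hfSER class: the [10 ms, 35 ms]
    window; noise class: 25 ms windows drawn at random from
    [-55 ms, -10 ms] and [45 ms, 95 ms] relative to stimulus onset."""
    rng = np.random.default_rng(seed)
    win = int(0.025 * fs)                      # 25 ms -> 250 samples
    idx = lambda t: pre_samples + int(t * fs)  # seconds -> sample index
    hfser = epochs[:, idx(0.010):idx(0.035)]
    # candidate start samples for 25 ms noise windows devoid of evoked activity
    starts = np.concatenate([np.arange(idx(-0.055), idx(-0.010) - win),
                             np.arange(idx(0.045), idx(0.095) - win)])
    picks = rng.choice(starts, size=len(epochs), replace=True)  # 1:1 class ratio
    noise = np.stack([epochs[i, s:s + win] for i, s in enumerate(picks)])
    X = np.concatenate([hfser, noise])
    y = np.concatenate([np.ones(len(hfser)), np.zeros(len(noise))])
    return X, y
```

Drawing one noise window per trial yields the 1:1 class ratio used in this study.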
This study compared the effectiveness of common spatial pattern (CSP) analysis (Koles 1991) and canonical correlation average regression (CCAr) (Knutsson et al. 1998) against the standard bipolar electrode montage Fz-CP5 (in this work referred to as "baseline data"). CSP, a signal decomposition that maximizes the power differences between two conditions, was applied to hfSER and noise periods as positive and negative conditions, respectively. CCAr, a decomposition that maximizes the correlation between single trials and their grand average, was trained to find a source with the most reproducible single-trial hfSERs. Subsequently, though trained only on the hfSER period [10 ms to 35 ms] in every subject, the obtained spatial filters were applied to the entire data of that subject (i.e., to both the hfSER and the noise class). hfSER and noise activity from the three CSP or CCAr patterns with the highest eigenvalues served as input features. As an optional final step, this input was enhanced by the phase, amplitude and imaginary part of the analytic EEG signal, obtained by Hilbert transformation of the CSP- or CCAr-filtered data (Hilbert 1912), depicted in Figures 2 and 3. An overview of the pre-processing pipeline is given in Figure 4.
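The optional Hilbert-transform step can be sketched with SciPy's `hilbert`; the concatenation of the four feature blocks along the time axis follows the description of Figure 3, while the exact block order is an assumption:

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_features(x):
    """Expand (spatially filtered) single-trial segments of shape
    (n_trials, n_samples) into the four Hilbert-based feature blocks:
    real part, imaginary part, instantaneous amplitude, and instantaneous
    phase of the analytic signal, concatenated along the time axis
    (quadrupling the segment length)."""
    a = hilbert(x, axis=-1)  # analytic signal x + i * H{x}
    return np.concatenate([a.real, a.imag, np.abs(a), np.angle(a)], axis=-1)
```

Since the real part of the analytic signal equals the input, the first block reproduces the original trace, while the remaining blocks add the complementary phase and amplitude information shown in Figure 1B.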

Models and training
To answer the questions "Why are DL models becoming increasingly popular, and is this justified? Could simpler models such as ML models achieve comparable results?", the traditional ML models support vector machine (SVM), random forest (RF), and logistic regression (LR) were compared to the DL architectures convolutional neural networks (CNNs) AlexNet (Krizhevsky et al. 2012) and multichannel CNN (MC-CNN) (Zheng et al. 2016), and the artificial neural network "regularized extreme learning machine" (ELM) (Huang et al. 2004; Deng et al. 2009). This section complies with a recommended checklist of items to include in the description of DL-EEG studies (Roy et al. 2019). It was unfeasible to define universal hyperparameters for all models a priori. Instead, a grid search for hyperparameter optimization was employed on a random subset of the training data. Early stopping after 25 epochs was utilized in the training of the DL networks to avoid overfitting. Ultimately, using the implementations of the toolbox, the following hyperparameters were used for the experiments with the CNNs: learning rate lr = 0.001, batch size b = 32, optimizer SGD (Robbins and Monro 1951; Bottou et al. 2018), with the application of clip value c = 5.0 and momentum m = 0.9 (Dubreuil-Vall et al. 2020). Optimal hyperparameters for ELM were found individually for every subject, justified by the nature of the algorithm. Similarly, hyperparameters were tuned subject-dependently for each of the ML models. Detailed information on model initialization, the parameter space of the grid search, the source code for the networks, data pre-processing steps, and figures can be found in the code repository (https://github.com/RealCAnders/MAPP) and in the supplementary material. All networks' input shape was inputlen × 1, with inputlen being a multiple of 250, the number of data points in the 25 ms-long single-trial data segments. Networks were trained for intra-subject classification on shuffled data with a train-validation-test split of 45%-22%-33%, where predictions were made on the held-out test data set from the same study participant. The performance of TSC for any combination of subject, pre-processing pipeline, and DL TSC model was evaluated in terms of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. ML and DL models were compared and evaluated by means of classification accuracy.
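A minimal sketch of the intra-subject evaluation scheme, assuming a per-subject feature matrix `X` and labels `y`; logistic regression stands in for any of the evaluated models, the 45/22/33 split follows the text, and the validation part (used for hyperparameter tuning in the study) is carved out but left unused here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def evaluate_subject(X, y, seed=0):
    """Shuffled 45%-22%-33% train/validation/test split; returns the
    ROC AUC on the held-out test set of the same subject."""
    # first split off 45% for training ...
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.55, shuffle=True, random_state=seed, stratify=y)
    # ... then split the remaining 55% into 22% validation and 33% test
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.6, random_state=seed, stratify=y_rest)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

On clearly separable synthetic classes this yields an AUC close to 1; chance level for the balanced binary task is 0.5.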

Results
Across the utilized data, study participants show a mean SNNR of 1.354 (SD = 0.251; range 1.07 to 1.97) for data in the pass-band 500 Hz to 900 Hz and after subject-dependent application of CCAr, respectively. Figure 3 contrasts the TSC models' input features for two exemplary subjects with low and high SNNR, respectively, utilizing the pre-processing pipeline of CSP with successive Hilbert transformation. To assess the influence of data pre-processing on the performance of optimally prepared TSC models, the efficacy of the binary classifiers was compared between "baseline data" (pass-band 500 Hz to 900 Hz, bipolar montage Fz-CP5) and spatially filtered data with or without Hilbert transformation. The derived results for ML models are depicted in Table 2. The best data pre-processing method for ML was CCAr, which improved the classification accuracy by 6.87% on average, while CSP led to an average performance loss of 2.97%, which was, however, mitigated to a loss of only 0.23% when using Hilbert transformation subsequently. The derived results for DL models are depicted in Figure 5. DL TSC performance varied between chance-level predictions (0.56 mean AUC across models) for the subjects with the lowest SNNR values (S1 and S4) and 0.87 mean AUC for the subjects with the highest SNNR (S2 and S3), with an average AUC across models and subjects of 0.67. The analysis proved the impact of data pre-processing to be substantial. The highest performance increase was achieved for CSP with subsequent Hilbert transformation, which showed a performance increment of 0.13 AUC compared to the "baseline data". CCAr in combination with subsequent Hilbert transformation showed an average performance increment of 0.11 AUC over the "baseline data". CSP and CCAr without successive Hilbert transformation, however, showed average performance increments of only 0.03 and 0.04 AUC, respectively.
This shows that the Hilbert transform substantially supports solving the TSC task, as shown by an average performance increase of 0.08 AUC over utilizing CSP or CCAr alone. However, while data pre-processing was shown to increase average TSC performance, it could not compensate for the SNNR differences between the subjects. For participants S1 and S4, with the lowest SNNR values of 1.15 and 1.07, the mean AUC across all pre-processing pipelines and networks was 0.28 worse than for participants S2 and S3 with the highest SNNR values of 1.54 and 1.97, respectively. A statistical analysis is presented in Table 3.
Table 2. Classification accuracy in percent for the ML models SVM, RF, and LR compared with the DL models ELM, AlexNet, and MC-CNN per subject and on average (AVG) for (I) baseline, (II) CSP, (III) CCAr, (IV) CSP + Hilbert, and (V) CCAr + Hilbert data. Results are colour-coded to depict better (red) or worse (blue) results. The impact of SNNR (comp. Table 1) is clearly visible. Overall, CCAr worked best, and LR outperformed the other models in nearly every task.
The SNNR was statistically significantly correlated with the TSC performance (p < 0.02 for all pre-processing pipelines and TSC models), with R² values of 0.74 ("baseline data"), 0.57 (CSP), 0.67 (CCAr), 0.63 (CSP + Hilbert transformation), and 0.74 (CCAr + Hilbert transformation). The performance of the three DL classification models for each data pre-processing pipeline was compared through a Wilcoxon signed-rank test. For all pre-processing pipelines, MC-CNN performs significantly better (p < 0.05) than ELM. MC-CNN also performs better than AlexNet for all pre-processing pipelines with the exception of CSP + successive Hilbert transformation. AlexNet performs significantly better than ELM in the case of CSP, CSP + Hilbert transformation, and CCAr + Hilbert transformation; however, it does not perform better for "baseline data" (p = 0.333) and CCAr without Hilbert transformation (p = 0.114). The performance of the different classifiers for data pre-processed with the CSP + successive Hilbert transformation pipeline is shown in Figure 6. Statistical analyses of TSC performance for the different pre-processing pipelines and models are presented in Table 3.
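The statistical comparisons can be reproduced in outline with SciPy: a paired Wilcoxon signed-rank test over per-subject AUC values and an R² from the Pearson correlation between SNNR and performance. The arrays in the usage note below are illustrative placeholders, not the study's values:

```python
import numpy as np
from scipy.stats import pearsonr, wilcoxon

def compare_pipelines(auc_a, auc_b):
    """Paired Wilcoxon signed-rank test of per-subject AUCs for two
    pipelines (or models); returns the statistic and two-sided p-value."""
    return wilcoxon(auc_a, auc_b)

def variance_explained(snnr, auc):
    """Share of between-subject performance variance explained by SNNR
    (R^2 of the Pearson correlation), plus the correlation's p-value."""
    r, p = pearsonr(snnr, auc)
    return r ** 2, p
```

For example, synthetic per-subject AUCs that rise linearly with SNNR give an R² close to 1, and a pipeline that consistently adds a fixed AUC increment across all ten subjects comes out significantly better in the paired test.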
Finally, the differences in training times were compared. SVM, AlexNet, and MC-CNN took multiple hours for training and hyperparameter optimization, whereas ELM, RF, and LR took only a couple of seconds.

Discussion
In this study, the impact of data pre-processing on TSC performance was analysed with the example of low-noise hfSER data. Three ML and three DL TSC models were evaluated, including a comparison of different pre-processing pipelines which utilized spatial filter optimization algorithms with or without successive non-linear transformations of the input data. Due to the inherently low SNR of the hfSER paradigm, the analysis amplified performance differences between the TSC models and favoured models with increased robustness to noise. Accordingly, the results will, at least in part, be meaningful for other types of data as well. Seven major results were obtained: (1) For all ML classifiers, spatial filter optimization by CCAr, and often also subsequent Hilbert transformation, increased the TSC performance.
(2) For all DL classifiers, spatial filter optimization algorithms increased the performance of the TSC models. Optimal spatial filtering "cleans" the data (Blankertz et al. 2001; Buzsáki 2006; Waterstraat et al. 2015; Bagnall et al. 2017, 2018) by separating signal and noise sources due to their different electric field patterns on the surface of the head. By choosing adequate projections, the impact of technical and biological noise sources on the filtered data is reduced. Focusing on the signal of interest provides insights into the analysed brain processes (Koles 1991; Nikulin et al. 2011; Yu et al. 2012; Waterstraat et al. 2015).
(3) The SNR of the data is a main determinant of TSC performance. In this study, the SNNR of the subjects explained up to 74% of the variance between AUC values for the "baseline data". Adequate and accurate data pre-processing methods can increase TSC performance substantially, but performance will always significantly depend on the specific paradigm, recording technique, and the participating subjects. (4) Embedding the input data into a higher-dimensional feature space (Knutsson et al. 1998; Huang et al. 2004, 2006; Hatami et al. 2017) using the non-linear Hilbert transform (Hilbert 1912) greatly increased TSC performance, especially in the case of DL models. In contrast, while transformation of the data with 2D recurrence plots (Eckmann et al. 1987) has been shown to increase TSC performance in some studies (Eckmann et al. 1987; Skinner 1994; Hatami et al. 2017), for this data it could not support the task of TSC (analysis not shown). Accordingly, no general recommendation for a specific data transformation can be given a priori; the choice will depend on the specific paradigm, recording technique, and the participating subjects. This emphasizes the need to evaluate different pre-processing approaches for the specific data set at hand before final TSC model building. (5) Between the two assessed spatial filtering algorithms (CCAr and CSP), CCAr proved more effective in increasing TSC performance. This finding has to be interpreted in the context of the studied hfSER paradigm. As an evoked response, hfSERs are expected to show similar waveforms across successive stimuli. As CCAr projects single trials to be more similar to the grand average, it is tailored for the analysis of evoked potentials. On the other hand, CSP maximizes power differences and should be considered if such differences are expected to discriminate between the conditions of the paradigm. Contributing to the choice between these methods, for this data, CSP proved to be more stable in case of low SNNR values. In
general, however, both algorithms should be included in a thorough evaluation. (6) Apart from TSC performance, training times were shown to vary strongly between TSC models, constituting another factor to consider, especially for on-line use cases. While AlexNet needed about three times the training time of MC-CNN and ELM, none of the TSC pipelines in this comparison was sufficiently swift for on-line applications. AlexNet and MC-CNN benefited from subject-independent, a priori-defined hyperparameters. However, only the ELM training included subject-specific five-fold cross-validation of 51 possible hyperparameter values. If ELM were trained with a priori knowledge about the optimal hyperparameters for a specific subject (as done for AlexNet and MC-CNN), training time would be reduced to significantly less than one minute, rendering it suitable for on-line learning in many applications, just like RF and LR. Accordingly, ELM, RF, and LR can be chosen to quickly build and compare prototypes of data pre-processing pipelines. (7) Comparing ML models and DL models, this study did not find a reasonable justification for the use of computationally more intensive DL models over similarly (or even better) performing ML models. Each of the ML models outperformed all of the DL models.
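The prototyping recommendation in points (6) and (7) can be sketched as a small benchmarking loop; the pipeline names, the choice of RF and LR as the fast models (ELM would fit the same loop), and the cross-validation settings are illustrative assumptions:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def prototype_pipelines(pipelines, y, cv=5):
    """Rank candidate pre-processing pipelines with fast classifiers.
    `pipelines` maps a pipeline name to its pre-processed feature matrix;
    returns {(pipeline, model): (mean ROC AUC, fitting time in s)}."""
    results = {}
    for name, X in pipelines.items():
        for model in (LogisticRegression(max_iter=1000),
                      RandomForestClassifier(n_estimators=100, random_state=0)):
            t0 = time.perf_counter()
            auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
            results[(name, type(model).__name__)] = (auc, time.perf_counter() - t0)
    return results
```

Running such a loop over candidate pipelines takes seconds per subject with these models and directly exposes which pre-processing variant is worth carrying forward to the more expensive TSC models.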

Conclusion
Time-series classification of electrophysiological data remains challenging due to different types and powers of noise sources as well as differences in experimental paradigms and signal properties. Altogether, TSC performance heavily depends on the SNR. In this study, it has been shown that data pre-processing significantly influences the capability of TSC models to detect the signals of interest.
Here, optimized spatial filtering of the data and non-linear Hilbert transformation proved essential to maximize TSC performance. Spatial filtering maximizes the SNR by isolating the signal of interest from noise sources due to their different electromagnetic field patterns on the head. In particular, CSP and CCAr were tested, but other algorithms (e.g., independent component analysis or principal component analysis) might be viable options depending on the type of data. Additionally, extending the data by non-linear transformations is likely to increase TSC performance. DL methods inherently apply non-linear feature transformations as well, but the analyses show that these might not be chosen optimally in an automatic fashion. Here, Hilbert transformation resulted in physically meaningful and interpretable variables (real and imaginary parts of the signal, and the amplitude and phase content of the signal), a property that is typically not present in DL networks' transformation layers. Due to this complexity, no one-size-fits-all solution was derived, but a recommendation of an algorithm to prototype and evaluate combinations of pre-processing pipelines and TSC models can be given.
Depending on the experimental paradigm and the associated differences in data properties between conditions, suitable algorithms to increase the SNR, and combinations thereof, should be identified. These include, amongst others, spectral filtering and spatial filter optimization algorithms.

Figure 1 .
Figure 1. Time-domain depiction of somatosensory evoked responses (SERs); exemplary data of S3. Colors correspond to the definitions of the hfSER window [10 ms to 35 ms] (orange) and the noise-only window [45 ms to 95 ms; depicted only until 60 ms] (purple). Data outside these windows are shaded in gray. (A): the standard wide-band [0.5 Hz to 5000 Hz] average SER (n = 5940) in the bipolar montage (Fz-CP5) (top row) shows small high-frequency deflections which can be isolated by band-pass filtering [500 Hz to 900 Hz] (2nd row). In the band-pass filtered bipolar montage, single-trial analysis of hfSERs is unfeasible since the single-trial standard deviation (shaded area) almost encloses the peak amplitude range of the averaged hfSER. Spatial filtering techniques such as CSP (3rd row) and CCAr (4th row) increase the hfSER amplitude in comparison to single-trial amplitude fluctuations of concurrent noise. (B): 45 consecutive single trials after band-pass filtering [500 Hz to 900 Hz] and spatial filtering (CCAr filter with the highest canonical correlation coefficient); real part, imaginary part, amplitude, and phase obtained by Hilbert transformation reveal complementary data features as input to TSC models.

Figure 2 .
Figure 2. Demonstration of phase coherence between single-trial hfSERs and coherence to a 600 Hz sinusoid in the hfSER window [10 ms to 35 ms] (orange), in contrast to a noise-only window [45 ms to 95 ms] (purple), after band-pass filtering [500 Hz to 900 Hz] and spatial filtering (CCAr); exemplary data from subject S3. (Top): the averaged hfSER is depicted to mark the placement of the single-trial hfSER and noise-only windows. (Middle): the polar plots show the instantaneous phase and amplitude of single-trial activity in the respective windows, obtained by Hilbert transformation and represented as angle (in rad) and distance from the origin of the axes (in a.u.). Each point in the plot represents an individual single-trial time point. (Bottom): color in the polar plots refers to the phase of a 600 Hz sine wave at every point's latency relative to the onset of the respective window. Hence, the color-coding in the hfSER polar plot shows that the hfSER is phase-locked to a 600 Hz oscillation, observable as a rainbow-like color distribution, whereas the signal from the noise-only window appears to be random. Single-trial amplitudes are increased during the hfSER burst, as shown by larger distances from the origin of the polar plots.

Figure 3 .
Figure 3. Visualization of TSC input features from S03 (top) and S04 (bottom), the two study participants with the highest and lowest SNNR, for the CSP + Hilbert transform pre-processing pipeline. Labels below the x-axis indicate the signal type: real and imaginary part, and amplitude and phase content of the CSP-filtered and Hilbert-transformed data. These four Hilbert-based feature blocks [real, imaginary, amplitude, and phase], calculated independently for every CSP filter, encompass the very same segment of data (10 ms to 35 ms relative to electrical stimulation) and were concatenated along the time axis in a first step. In a second step, data obtained from filtering with the three best CSP filters (shown in brown/cyan/olive) were concatenated. The obtained traces finally served as input to the subject-specific classifiers. This procedure resulted in a total length of 3000 data points as input features, as opposed to 250 points in the baseline data set (250 data points × 4 Hilbert-based features × 3 CSP filters). The SNNR difference between the two subjects is reflected by the differential observability of oscillatory patterns in the real, imaginary, amplitude, and phase content.

Figure 4 .
Figure 4. Overview of this work's pre-processing pipeline.

Figure 5 .
Figure 5. Comparison of AUC values for ELM, AlexNet, and MC-CNN; the shown ROC curves were obtained by averaging subject-specific ROC curves across all 10 subjects. The utilized pre-processing methods are shown in the per-subplot legends together with the respective AUC values and standard errors. Spatial filtering with CSP and CCAr alone leads to only marginal AUC improvements over "baseline data" (500 Hz to 900 Hz, Fz-CP5). However, the binary classifiers benefit from additional Hilbert transformation, as indicated by the increases in AUC values. A statistical analysis is presented in Table 3.

Figure 6 .
Figure 6. Subject-specific AUC values for CSP + Hilbert-transformed input features. Due to SNNR differences (comp. Figure 3), the best performance was achieved in participant S3 and the worst in participant S4. Across all participants, except for S2 and S3, the obtained AUC substantially depended on the choice of TSC model. On average, AlexNet performed best (comp. Table 3). Wilcoxon signed-rank tests for equality of median AUCs, however, show that the difference between AlexNet and MC-CNN is not statistically significant: a) p = 0.093, b) p = 0.005, c) p = 0.013.

Table 1 .
"Signal-plus-noise"-to-noise ratios (SNNR) for the spectrally filtered data per participant.The highest SNNR values are depicted in bold face.

Table 3 .
Benchmarking of pre-processing pipelines and TSC models. Values in grey-shaded cells express AUC values for the combination of pre-processing pipeline and TSC model. Pre-processing pipelines were tested against each other using multiple Wilcoxon signed-rank tests. p-values < 0.05 are printed in bold font; the respective cells are colour-coded to depict significantly better (red) or significantly worse (blue) results. For example, the combination of AlexNet and CCAr + Hilbert resulted in a mean AUC value of 0.75, which was significantly worse than CSP + Hilbert (AUC = 0.84; p = 0.017), the best result obtained. Minor differences to the AUC values in Figure 5 exist, as in Fig. 5 the AUC values were estimated from the depicted average ROC curve, while the AUC values stated in this table express the average of the individual AUC values.