Semiparametric Latent ANOVA Model for Event-Related Potentials

Event-related potentials (ERPs) extracted from electroencephalography (EEG) data in response to stimuli are widely used in psychological and neuroscience experiments. A major goal is to link ERP characteristic components to subject-level covariates. Existing methods typically follow two-step approaches, first identifying ERP components using peak detection methods and then relating them to the covariates. This approach, however, can lead to loss of efficiency due to inaccurate estimates in the initial step, especially considering the low signal-to-noise ratio of EEG data. To address this challenge, we propose a semiparametric latent ANOVA model (SLAM) that unifies inference on ERP components and their association to covariates. SLAM models ERP waveforms via a structured Gaussian process prior that encodes ERP latency in its derivative and links the subject-level latencies to covariates using a latent ANOVA. This unified Bayesian framework provides estimation at both population- and subject- levels, improving the efficiency of the inference by leveraging information across subjects. We automate posterior inference and hyperparameter tuning using a Monte Carlo expectation-maximization algorithm. We demonstrate the advantages of SLAM over competing methods via simulations. Our method allows us to examine how factors or covariates affect the magnitude and/or latency of ERP components, which in turn reflect cognitive, psychological or neural processes. We exemplify this via an application to data from an ERP experiment on speech recognition, where we assess the effect of age on two components of interest. Our results verify the scientific findings that older people take a longer reaction time to respond to external stimuli because of the delay in perception and brain processes.


Introduction
Event-related potentials (ERPs) refer to the electrical potentials measured by extracting voltages from electroencephalography (EEG) data in response to internal or external stimuli, typically during visual, auditory, or gustatory experiments.Raw EEG signals summarize all neural activity in the brain, making it difficult to distinguish specific neural processes related to perception, cognition, and emotion.In contrast, ERPs can identify multiple neurocognitive and affective processes contributing to behavior [Gasser and Molinari, 1996].With other advantages such as noninvasiveness, continuous high temporal resolution, and relatively low cost compared to other neuroimaging modalities such as functional magnetic resonance imaging, ERPs have become a widely used tool in psychological and neuroscience research [Luck, 2022, Vogel andMachizawa, 2004].
Clinical studies usually focus on specific characteristic components of the ERP waveforms, defined as voltage deflections produced when specific neural processes occur in specific brain regions [Luck, 2012].A main task of any ERP analysis is to estimate the amplitude, i.e. magnitude (in microvolts, µV) and the latency, i.e. position in time (in milliseconds, ms) of specific ERP components.In current ERP studies, most often researchers identify such components by visually inspecting the grand ERP waveform averaged across trials and subjects, often relying on previous scientific research findings.When the statistical analysis is done at the subject level, two-step approaches are typically used, first extracting the ERP components of interest and then performing an analysis of variance (ANOVA) to relate the extracted components to the subject-level covariates.In such analyses, in order to obtain a smooth ERP curve, the whole EEG experiment needs to be averaged across many trials and/or participants.In addition, filtering is necessary to remove the signal trend or drifts, together with noise that comes from biological artifacts or the electrical activity recording environment.Once a smoothed ERP curve has been obtained, a time window needs to be specified as the range of the component of interest.This is a critical choice in many of the analyses.For example, one of the common ways to quantify the amplitude of an ERP component of interest is as the mean amplitude integrated over the chosen window.This may lead to false discoveries when comparing the amplitudes of ERP components across control and experimental groups.
In statistics, there is limited contribution to methods for the identification and estimation of ERP components.Jeste et al. [2014] and Hasenstab et al. [2015] use LOESSbased methods that first smooth the noisy waveforms and then apply a peak detection algorithm on the smoothed curve to identify optima within a specified time window.Hasenstab et al. [2015], in particular, propose MAP-ERP, a meta-preprocessing step based on a moving average of the ERP functions over trials in a sliding window, to preserve the longitudinal information in ERP data, and then employ peak detection.With peak detection algorithms, latency is calculated as the location of the stationary point identified within a pre-set window and the amplitude is calculated as the value of the smoothed curve at the location of the stationary point or by integrating the area under the curve.Other approaches for detecting ERP components, in single subjects, use continuous wavelet transformation techniques, see for example Kallionpää et al. [2019].
Recently, various extensions of Gaussian Processes (GPs) have been applied to neuroscience problems.Kang et al. [2018] propose a soft-thresholded GP prior for feature selection in scalar-on-image regression utilized to represent sparse, continuous, and piecewise-smooth functions.Yu et al. [2023] employ Gaussian processes with derivative constraints, defining a peak or dip location as the stationary point such that the first derivative of the underlying smooth ERP waveform at the point is zero.While this method takes the data averaged across trials for analysis, as traditional studies do, it does not need separate smoothing steps of the ERP waveforms.Ma et al. [2023] develop the split-and-merge GP prior for selecting time windows in which the P300 ERP components in response to the target and non-target stimuli may be different.While the model considers multiple channel EEG signals and spatial correlation, it focuses on the amplitude difference and uncertainty about ERP functions.
In this paper we propose a novel semiparametric latent ANOVA model (SLAM) as a data-driven model-based approach that provides a unified framework for the estimation of ERP characteristics components and their association to subject-level covariates.Following Yu et al. [2023], we employ a Bayesian model with derivative-constrained Gaussian process priors, to produce smooth estimates of the ERP waveform together with amplitude and latency of ERP components, and incorporate into the model a flexible ANOVA setting, to examine how multiple factors or covariates, such as gender and age, affect the ERP components' latency and amplitude.Unlike the model of Yu et al. [2023], which considers one single ERP waveform at a time, either as an averaged single-subject ERP or a grand mean across subjects, our hierarchical framework allows estimates of latency and amplitude at the group level as well as at the individual subject level.To produce such inference, SLAM starts with an initial partition of the experimental time interval into sub-intervals, one for any potential ERP component location, and then employs derivative-constrained GPs to identify the latency of the components.The posterior distribution of latency parameters not only provides a more precise identification of the ERP component location, but allows the calculation of credible intervals that can be used as a refined search window where to compute amplitude estimates or, indirectly, as a reference in follow-up studies, to corroborate previous scientific findings or experts' knowledge.We perform posterior inference and hyperparameter tuning via a Monte Carlo expectation-maximization algorithm.Through simulations, we demonstrate how our proposed method produces smaller root mean square error for both latency and amplitude estimates, compared to the LOESS-based method.Unlike these methods, our approach provides a direct estimate of the latency parameter, without the need to apply curve smoothing.Moreover, SLAM provides automatic group-level estimates, while an additional ANOVA is needed when the LOESS-based methods are used.
In neuroscience and psychological sciences, ERP components are typically used to examine perception or cognitive differences between control and experimental groups or under different levels of an influential factor.In fact, the components are usually named by whether it is a positive wave (peak) or a negative wave (dip), and when it occurs.For example, the peaks occurring around 100 msec, N100, or around 200 msec, P200, after the start of the stimulus, have been implicated in speech recognition [Noe and Fischer-Baum, 2020], the N170 component in object recognition [Tanaka and Curran, 2001] and the N400 in language processing [Kutas and Hillyard, 1980].
Here we consider ERP data collected during an experiment designed to determine whether early stages of speech perception are independent of top-down influences.Our hierarchical framework allows us to estimate latency at both subject-and grouplevel.Our results show that the older group people tend to have longer latency of both the N100 and P200 ERP components.We also find that the older group people tend to have larger N100 amplitude.These results verify the scientific findings that older people take a longer reaction time to respond to external stimuli because of the delay in perception and brain processes [Noe and Fischer-Baum, 2020, Tremblay et al., 2003, Yi and Friedman, 2014].
The rest of the paper is organized as follows: Section 2 describes our proposed Bayesian hierarchical model and its associated algorithm for learning ERP waveforms and com-ponents.Section 3 provides a simulation study, showing the estimation performance of the proposed method against LOESS-based approaches, and Section 4 illustrates the application of our methodology to an ERP data set.Section 5 presents our conclusions.

Methods
In this section, we describe the proposed general SLAM framework, for the analysis of multi-subject-multi-group ERP data.To ease exposition, we consider a one-way ANOVA setting, and will use age ("old" and "young") as demonstrative example to illustrate our approach.We note, however, that the model construction can be easily generalized to incorporate any given number of factors and continuous covariates.

Derivative-Constrained Gaussian Process
Let y igs be the observation at time x i ∈ X , for i = 1, . . ., n, with g = 1, 2, . . ., G indicating the levels of a factor, and with s = 1, . . ., S g indexing the subject within the gth group.We assume the following model: (1) In ERP applications, f gs (•) represents the underlying true ERP waveform at level g of the factor, e.g., the ERP waveform for the older adult group.This waveform is usually believed to be a smooth curve.
We adopt the Bayesian modeling approach of Yu et al. [2023] and impose a derivativeconstrained Gaussian Process prior on the function f gs (•) [Ghosal andvan der Vaart, 2017, Rasmussen andWilliams, 2006].We assume that the function has M stationary points, t gs = {t m gs } M m=1 , each corresponding to the latency of an ERP component.The parameters t gs and their association with subject-level covariates are of interest here.We encode the stationary points t gs in the modeling of the ERP waveforms via a derivative Gaussian process (DGP) as where DGP(0, k(•, •; θ), t gs ) is a GP with mean 0 and covariance kernel k(•, •; θ) conditioned upon f ′ gs (t m gs ) = 0 for m = 1, . . ., M , which is also a GP with t gs -dependent mean and covariance functions.Here θ indicates the hyperparameters in the kernel.Gaussian process priors are widely used for modeling unknown functions, and the covariance kernel is instrumental in determining sample path properties.Yu et al. [2023] and Li et al. [2023] show that a general DGP forms a n-dimensional Gaussian distribution N µ gs , Σ gs (θ, t gs ) with mean vector where k 00 is the n×n covariance matrix of x, k 10 = k ′ 01 is the M ×n covariance matrix of x and t gs , and k 11 is the M × M covariance matrix of t gs .Following these authors, we use the squared exponential kernel, that is as the waveform in ERP studies is typically believed to be smooth.
In the absence of prior information on M , the number of stationary points, the method developed in Yu et al. [2023] and Li et al. [2023] Note that a m and b m can be specified in a factor-dependent manner.

Latent ANOVA Model
Unlike Yu et al. [2023], which focused on employing DGP to infer the unknown M components on a single ERP waveform, our goal is to model ERP components across multiple subjects and examine how multiple factors or covariates, such as gender and age, affect the ERP components' latency and amplitude.For this, we incorporate a flexible ANOVA construction into our model, that allows estimation of the ERP components at both the population and subject-specific levels.
For each level g, the subject-level parameter t m gs is governed by the general beta prior supported on [a m , b m ] The general beta distribution supported on [a, b] is distributed as (b − a)X + a for the standard beta variable X supported on [0, 1].The first two parameters in the general beta distribution are parametrized such that r m g ∈ (0, 1) is the location parameter and η m g is the scale parameter.The prior mean of Note that all subjects in the same level g have the m-th stationary point from the common level general beta distribution centered at r m g in its time window.Such hierarchy is a starting point of the ANOVA structure as the mean r m g is then further decomposed into the overall mean and the main effect from the factor.
The group-level parameter r m g indicates the relative location of the mean in the interval [a m , b m ], or the weight between the two end points of the search window.The larger r m g is, the closer the prior mean is to b m .The weight r m g can be easily transformed to have the same scale as t m gs and serves as the group-level latency that is common across subjects at the same factor levels.Given r m g , the value of η m g > 0 controls the prior variance of t m gs , which is Var . One can specify η m g using fixed values based on domain knowledge or preliminary estimates, or apply a common prior for within-group variability.
For inference on r m g , we introduce an ANOVA structure to model how the two factors affect the location of the ERP component.Since r m g is bounded between zero and one, we use a link function ϕ (•) ∈ (−∞, ∞) followed by an ANOVA: (5) Here we write the one-way ANOVA as a linear model using dummy variables {z a } G−1 a=1 where z a ∈ {0, 1} is a binary variable denoting the level of factor.Coefficient β m 0 denotes the grand mean at the baseline level of factor, and β m a quantify the effect of level a comparing to the baseline.This latent structure is flexible and inclusive in the following sense.First, it can accommodate any linear model with categorical and numerical variables.For example, subject-level numerical covariates (eg, blood pressure) can be included in the latent regression.Second, the latent structure is able to accommodate interactions between factors.Moreover, any link function for parameter values restricted in (0, 1) can be used in the modeling setup, such as logit, probit, and complementary log-log links.

Complete Model
We complete the model by placing standard normal priors on the beta coefficients in (5), a gamma prior on the positive variable η m g and standard inverse gamma priors on σ 2 .In summary, the full Bayesian SLAM is summarized as follows.For time point i = 1, . . ., n, subject s = 1, . . ., S g , factor level g = 1, . . ., G, and stationary point index m = 1, . . ., M , With no prior knowledge, we suggest setting hyperparameters µ m j , j = 0, 1 at zero, and (σ m j ) 2 , j = 0, 1 at one, i.e., standard Gaussian distribution.In our experience, the parameter learning is insensitive to the variance size.Furthermore, to circumvent overfitting, we choose hyperparameters in the inverse gamma prior on σ 2 so that its mean is around the empirical mean of the data.As for the hyperparameters τ 2 and h in the covariance function, we follow Yu et al. [2023] and estimate then based on the marginal maximum likelihood (MML) by embedding an EM algorithm into the posterior sampling, as explained in the section below, therefore avoiding a separate tuning of these hyperparameters.

Posterior Inference
We adopt the Monte Carlo expectation-maximization (MCEM) algorithm for joint parameter tuning and posterior sampling of Yu et al. [2023] to our SLAM framework.
More specifically, in the E-step, we sample the subject-level latency parameters, t m gs , the factor main effect for group-level latencies, β m 0 , β m a , the general beta scale parameters, η m g , and the noise variance, σ 2 , and optimize the hyperparameters τ 2 and h in the M-step.
Given the posterior samples of t and σ 2 drawn from the E-step, the hyperparameters τ 2 and h are updated in the M-step by maximizing the log marginal conditional likelihood: where t (j) l and (σ 2 l ) (j) are the l-th posterior sample of t and σ 2 at the j-th iteration in the algorithm respectively.Therefore, θ(j+1) is the maximum marginal likelihood (MML) estimates at (j + 1)-th iteration in the MCEM algorithm.The optimization step can be solved numerically using standard numerical optimization approaches.The detailed sampling scheme in the MC E-step is discussed in the Appendix, and the entire algorithm is summarized in Algorithm 1 below.
The MCEM reaches convergence when ∥ θ(j+1) − θ(j) ∥ < ϵ at the (j + 1)-th iteration [Bai and Ghosh, 2021].The threshold ϵ is set to be 10 −5 in the simulation and data analysis.With the sample size of 100, our R code takes about 70 seconds for J = 500 to finish one M-step, and 30 to 35 seconds for drawing 1000 samples in the MC E-step or final simulation.
Algorithm 1: Monte Carlo EM Algorithm for SLAM For each g and s, draw t (j) gs in blocks using Metropolis-Hasting steps on its conditional density p t gs | y gs , r g , σ 2 , θ(j) For each m, draw (β m 0 ) (j) using Metropolis-Hasting steps on its conditional density p (β m 0 | t, η) For each a and m, draw (β m a ) (j) using Metropolis-Hasting steps on its conditional density p (β m a | t, η) Obtain ϕ (β m 0 ) (j) , (β m a ) (j) and use the inverse link to transform ϕ to (r m g ) (j) For each g and m, draw (η m g ) (j) using Metropolis-Hasting steps on its conditional density p η m g | t m g , r m g Draw (σ 2 ) (j) from its inverse gamma conditional distribution Given samples of t and σ 2 at the j-th iteration, {(t m gs,l ) (j) } L l=1 and {(σ 2 l ) (j) } L l=1 , update θ to θ(j+1) by optimizing θ(j+1) according to eq. ( 6).Result: MML estimate of θ and posterior samples of t, β, η and σ 2 For multi-way ANOVA, for example with two factors, A with levels g = 1, . . ., G and B with levels h = 1, . . ., H, the latent ANOVA structure can be directly extended into where z 1,.∈ {0, 1} is a binary variable for factor A, and z 2,.∈ {0, 1} for factor B.
The posterior inference and learning algorithm are similar to the one-way ANOVA demonstration.

Simulation Study
In this section, we examine the performance of our proposed method through a simulation study, where we also compare results with the LOESS-based method of Jeste et al. [2014].For latency inference, we also discuss the difference between SLAM and the DGP-MCEM method of Yu et al. [2023].

Data Generation
For a fair comparison of the proposed method with other competing methods, in this section we consider a simulation setting where the data are not generated directly from the proposed model.We show that our model estimates the latency location better than the other methods.Later in Section 3.5, we examine how well the model learns the parameters in a scenario where the data are generated from our model.We consider the case of one factor with two levels or groups, each having ten subjects and ERP waveforms.Depending on how smooth or noisy the data set is, the ten ERP signals could be treated as either waveforms of ten trials of a single subject or waveforms averaged across trials for ten subjects.The two groups have significantly different ERP patterns, while the individuals within groups behave similarly with little latency and/or amplitude shifts.
Both groups have two components in their waveforms.Note that in group 1 where data are generated from the sine functions, all the curves have the same amplitude size and only latency changes, accounting for individual differences.If zero is the baseline, the amplitude is two and negative two for the peak and dip for all the subjects.The latency for the dip or the first ERP component ranges from 0.19 to 0.29, and the latency for the peak or the second ERP component ranges from 0.69 to 0.79.The curves in the second group, on the other hand, have their own latency and amplitude.For the dip, its latency ranges from 0.23 to 0.37, and its amplitude changes from -1.57 to -2.For the peak, its latency ranges from 0.57 to 0.71, and its amplitude changes from -0.83 to -1.26.For each curve, 100 observed data points are generated by adding mean zero Gaussian errors with a standard deviation 0.25.The simulation is set to mimic real ERP data.

Parameter Settings
We fit our novel SLAM with one-factor ANOVA to the entire data containing 20 ERP trajectories by setting M = 2 components in both groups, and assuming that one stationary point is in (0, 0.5), and the other in (0.5, 1).The initial values of the MCEM algorithm are specified as follows.The subject or individual level latency is randomly drawn from the uniform distribution, all the beta coefficients are set to zero and the parameters η and σ are set to one.In each E-step, the first 100 draws are considered burnin, and then further 2000 MCMC iterations, with no thinning, are saved for estimation, with 500 subsampled draws used for the subsequent M-step, to optimize the hyperparameters of the GP kernel.The iterative algorithm stops when the current updated parameters and the ones of the previous iteration have a difference less than 10 −5 .The final 20,000 MCMC samples are saved, and the posterior distribution of the stationary points is summarized based on the MCMC samples.To do the inference for group-level latency, the logit link function is used for transformation.When comparing with the LOESS-based method we the loess.wrapper()function in the R package bisoreg and the function PeakDetection() provided in the online supporting information of Hasenstab et al. [2015] at https://onlinelibrary.wiley.com/doi/10.1111/biom.12347 to each individual curve to estimate the underlying ERP waveform, latency and amplitude.The LOESS smoothing parameter was determined by 5-fold cross-validation from a sequence of values from 0.25 to 1 with increments 0.05.The dip searching window is set to be (0.01, 0.5), and the peak time window is (0.5, 0.99).The method provides point estimates with no uncertainty quantification.

Results on Latency Estimation
The latency estimates from SLAM, for all subjects, calculated as posterior means, are shown in Figure 2, together with the 95% credible intervals.For comparison, the LOESS-based estimates are also shown.Results show that, even though some of the LOESS-based estimates are closer to their corresponding true stationary points than the posterior mean estimates derived from our algorithm, there is more variability in the LOESS-based estimates, with more of the estimates away from the true values.This unstable phenomenon is more significant for group 2, with the cosine functions.This is confirmed by the root mean square error (RMSE), calculated as , as shown in Table 1, which reports RMSEs averaged over 100 replicates, with standard deviations in parentheses.Indeed, the proposed method outperforms the LOESS-based method in terms of estimation for individual data sets, as well as consistency from sample to sample.We note that the LOESS-based fit relies completelyon the sample data and requires a large sample to produce reliable estimates of the latency.In our experiment, when the simulation sample size is doubled to n = 200 the RMSE measures, averaged over 100 replicates, are 0.009 (0.0028) for Using the posterior median for point estimation does not change the values of the RMSEs much, due to the fact that the posterior distribution of stationary point is basically symmetric and unimodal with small variation, as shown, for one subject, in the top panel of Figure 3.
The middle panel of Figure 3 shows the posterior distribution of the group-level parameters r 1 and r 2 for the sine and cosine regression function settings.The 95% credible interval for r 1 and r 2 are specified by the red segments.The original LOESS-based methods do not account for group effects and, consequently, cannot provide inference on group-level parameters.To address this limitation, we employ a two-step adaptation as in Hasenstab et al. [2015], involving LOESS fitting followed by one-way ANOVA.The resulting 95% confidence intervals are represented by black segments in Figure 3.The confidence intervals appear to be narrower than the credible intervals.We also note that, while the sampling distribution of r 1 and r 2 is assumed to be Gaussian, the posterior distributions from SLAM are not necessarily Gaussian.

Method
Group 1: Sine Group 2: Cosine SLAM-posterior mean 0.0037 (0.0000) 0.0068 (0.0002) SLAM-median 0.0036 (0.0001) 0.0068 (0.0002) LOESS 0.0119 (0.0021) 0.0251 (0.0036) Amplitude Method Group 1: Sine Group 2: Cosine SLAM-posterior mean 0.0421 (0.0004) 0.0582 (0.0016) SLAM-median 0.0421 (0.0005) 0.0581 (0.0016) LOESS 0.0661 (0.0007) 0.0654 (0.0066)As for the comparison with the DGP-MCEM method of Yu et al. [2023], we found that the latency posterior means were almost identical to the SLAM estimates (results not shown).Also, on average, the SLAM and DGP-MCEM methods produced 95% uncertainty bands of similar length.However, the SLAM intervals exhibited less variation than the DGP-MCEM intervals, with standard deviations 0.002 and 0.007, for the sine group, and 0.005 and 0.006 for the cosine group, respectively.More importantly, the DGP-MCEM method does not provide group-level latency estimation and has no information about the group-level latency distribution.Indeed, in Yu et al. [2023] the DGP-MCEM model is fitted separately to each group of data.With this method, however, population-level estimates can be obtained, by fitting a Gaussian mixture and calculating the normal interval t ± 2sd( t), where t is the mean of the Gaussian mixture and sd( t) is the standard deviation of those means.We found that SLAM has a more precise estimation with a shorter interval than DGP-MCEM.Specifically, for the sine group, SLAM produces interval (0.19, 0.29) for r 1 and (0.69, 0.79) for r 2 , compared to the normal intervals (0.17, 0.31) and (0.67, 0.80).For the cosine group, SLAM produces (0.24, 0.34) for r 1 and (0.61, 0.71) for r 2 .The corresponding normal intervals are (0.21, 0.40) and (0.54, 0.74) for r 1 and r 2 , respectively.

Results on Amplitude
As for amplitude, we can derive estimates from the SLAM output as follows.First, for each sampled latency, given the data, a posterior path is drawn from the posterior derivative Gaussian process.Then, the fitted regression curve is obtained as the mean of those realizations.Examples, for one of the subjects, are shown in Figure 1.With traditional approaches for ERP data analysis, different heuristic methods are used to estimate amplitudes, for example as the amplitude value of the peak of the component or as the mean/area amplitude over a selected time window or search window, such as in the LOESS-based methods [Luck, 2005].Since our method is able to quantify possible locations of the ERP components, or latencies, we can essentially use this information to narrow down the search window by measuring amplitudes over the range of the posterior distribution of latency.Alternatively, an even shorter search interval could be used, by considering for example any (1 − α)100% credible interval of the distribution of latency.Since the search window is usually set wider so as to include all individual peaks of the same component, our latency intervals provide a more precise location of those peaks.Here, to compare with the LOESS-based method, we define a peak as the largest local extremum within the search window, that is, in our case, the range of the posterior samples of latency.The method is summarized in Algorithm 2. This method cares only about the largest extremum, and the shape of components does not matter.
Algorithm 2: Max Peak For stationary point samples {t (d) } D d=1 , and the curve realizations { f (d) (x)} D d=1 , 1. Decide the range of latency (a, b) of , i.e., a = min(t (1) , . . ., t (D) ) and b = max(t (1) , . . ., t (D) ) 2. For d = 1, 2, . . ., D: ( , where A m gs is the true amplitude value and Âm gs is its estimate, obtained as the posterior mean of our method.Table 1 compares SLAM and LOESS-based methods on the amplitude estimation.The SLAM method on average has smaller RMSEs as well as smaller standard deviations.One distinct advantage of our method is that it provides uncertainty quantification.The bottom panel of Figure 3 shows the amplitude distribution of the sine and cosine function, for one subject, in one simulated data using the Max Peak method in

ERP Data Analysis
We now perform an analysis of real ERP data on speech recognition, where we assess the effect of age on two ERP components of interest.We provide subject-and group-level estimates of latencies and amplitudes of these characteristic components, with uncertainty quantification, and discuss comparisons with the inference provided by the method of Yu et al. [2023].

Data
ERP data were collected from an experiment on speech recognition conducted at Rice University [Noe and Fischer-Baum, 2020] and analyzed in Yu et al. [2023].The experiment involved 18 college-age students and 11 older controls, as aging is known to cause difficulties in perceiving speech, especially in noisy environments [Peelle and Wingfield, 2016].The older subjects have ages ranging from 47 to 91 years old with a mean age of 67.78 years [Noe, 2022].EEG signals were recorded continuously during the speech task and standard preprocessing [Luck, 2005] was done using the ERPLAB toolbox [Lopez-Calderon and Luck, 2014] in EEGlab [Delorme and Makeig, 2004] before the data analysis.The experiment resulted in a total of 2304 trials.The top panel of Figure 5 displays the ERP waveforms for all subjects, averaged across all trials and six electrodes.
Common methods of analyzing ERP data involve averaging ERP waveforms over a particular condition and window of interest, to obtain the magnitudes and latencies of specific components.Specific ERP components are expected to be associated with speech perception (Tremblay et al., 2003).In particular, the N100 component, which captures phonological (syllable) representation, is of interest.This component is identified by the latency of the negative deflection in the time window [60,140] ms after the stimulus onset.The specific starting point of the N100 window is typically chosen by visually inspecting the grand average data, to obtain a window where the ERP curve has a negative value.In addition to the N100, we also considered the P200 component, in the window [140,220] msec, an auditory feature representing higher-order perceptual processing, modulated by attention.Recent studies have identified the P200 time window as potentially critical for processing higher order speech perception.
For posterior inference, we generated 21000 MCMC draws with 1000 burn-ins and thinned the chain by keeping every 10th draw, obtaining 2000 posterior samples.The final MCMC simulation in the MCEM algorithm was monitored and stopped when the potential scale reduction factor of every parameter was below 1.1, reaching approximate convergence.The full MCMC diagnostics are reported in the supplementary materials.

Inference on Latency
The bottom panel of Figure 5 illustrates the posterior density of the subject-level latency parameters in both young and older groups, based on 2000 posterior samples produced by the MC procedure in the E-step.We observe higher variability in the latency distribution for the older group, especially in the period after N100.Plots show substantial subject-level variability, as also noticed by Yu et al. [2023].Overall, most of the subjects show latencies nicely concentrated around the two components of interest, with a few showing larger uncertainty, and one (subject 11 in the older group) having a third smaller latency, suggesting a possible outlier.
Furthermore, we can obtain a probabilistic statement on whether the ERP peaks of the older group occur later than the peaks of the young group by calculating the posterior distribution of the difference in latency between groups.This is shown in Figure 6.In general, the peaks of N100 and P200 occur on average with a delay of 11.02 and 21.72 msec, respectively, in the older group.Also, the probability that the latency difference of N100 is greater than zero is 95.9% and that of P200 is 98.5%, suggesting significant latency differences between the two groups.These insights align with established results in aging theory, as these components rely on inhibitory connections to resolve perceptual objects and older adults are thought to have generally less inhibition and slower resolution of objects [Tremblay et al., 2003, Yi andFriedman, 2014].
Further probabilistic evidence on the latency shifting effect is provided by the distribution of r 2 − r 1 , see Figure 6.For example, for the young group, the probability that r 2 young -r 1 young is between 70 and 80 msec is about 56.4%, and that of the older group is about 41.9%.Notice that the time lag between N100 and P200 in the young group is shorter than the lag in the older group which has a higher chance of a time lag longer than 90 msec.

Inference on Magnitude
As with latency estimation, magnitude estimation can be performed at both group level and subject level.In particular, the posterior samples of the ERP curves can be used to quantify the uncertainty about amplitudes of N100 and P200 by integrating the estimated curve over the range of latency, say (a, b), and finding the time point or latency t such that the integral is at half of its full value.The 50% area latency method or the half-integral method shown in Algorithm 3 is more robust to noise, compared with the basic point estimate peak amplitude [Luck, 2012].In addition to Max Peak and Half-integral Peak, other methods such as measuring mean voltage over a given time window can be used to compute the amplitude using the posterior samples within our modeling framework.We note that it is common to use zero micro voltage as the reference, or baseline, to compute the amplitude.However, the positive peak at P200 of the older group is all below zero micro voltage.Therefore, to make the amplitude comparable, we use the peak of N100 as the baseline for the calculation of the amplitude of P200.Therefore, the amplitude shows the increase in units of micro voltage after N100, in response to the stimulus of the experiment.
Figure 6 shows that the amplitude size of N100 for the older group is significantly larger than the amplitude for the young group.We preserve the negative sign to indi-cate that the peak is below zero.For P200, the peak measurements are greater than zero since the baseline is at the peak of N100.The 95% credible interval for P200 Old -P200 Young includes zero, showing a weak significance of the P200 amplitude difference.Nevertheless, with about 80% probability, the P200 amplitude for the young group is larger than that of the older group.
We conclude by noting that, with respect to the method proposed in Yu et al. [2023], the hierarchical structure of SLAM allows learning the latency parameters at the subject and group level in a single model.Yu et al. [2023] estimate subject-level latencies only, and the group-level latencies are estimated separately by fitting a Gaussian mixture on the posterior samples of t, of all participants.With their approach, the 95% confidence intervals of the N100 group-level latencies are [85.38, 114.76] and [90.38, 126.67], for the young and older group, respectively.With SLAM, the 95% credible intervals for r 1 are [86.73, 104.45] and [98.08, 117.82] for the young and older group, respectively.The SLAM intervals are generally shorter and provide more precise interval estimation.

Discussion
In this paper, we have proposed SLAM, a novel Bayesian approach for the estimation of the amplitude and latency of ERP components.The novel SLAM is a unified framework and integrative approach that enhances the DGP model proposed in Yu et al. [2023] and offers comprehensive statistical inference about the parameters of interest in ERP studies.As for the method of Yu et al. [2023], SLAM estimates the uncertainty about latency and hence provides a data-driven model-based time window for measuring the magnitude of ERP components.While traditional methods provide a point estimate of magnitude, we further exploit our approach to obtain posterior samples of magnitude.Moreover, SLAM uses the fitted smooth ERP curve and therefore washes out noise that tends to affect the point estimate of a peak, especially when the Max Peak method is used.
In addition, SLAM incorporates a latent ANOVA structure that allows to examine how factors or covariates affect the magnitude and/or latency of ERP components, which is the main interest in ERP research as magnitude and latency reflect cognitive, psychological or neural processes.While traditional methods require a two-step approach or several methods to complete the statistical analysis, SLAM wraps up everything in one single model, producing inference by posterior samples of the group latency/magnitude difference.It examines not only the subjects' individual differences but also group-level differences that facilitate comparing different characteristics or factors, such as age in our illustration.Our results have shown that ERP components N100 and P200 are delayed in older adults, and that the N100 amplitude is generally larger for adults than for the younger group (with a negative sign).
Our model lays the foundation for more sophisticated statistical modeling for ERP analysis.As with generalized linear models and neural networks, SLAM can accommodate various valid link functions or activation functions with input domain (0, 1).Besides the logit function, which we have used in our analyses, other popular link functions for a variable between zero and one include the probit and complementary log-log (cloglog) functions.Both logit and probit are symmetric functions, and cloglog is asymmetrical.In our explorations on simulated data, we find that SLAM is robust to the choice of link functions, as different link functions produced nearly the same posterior distribution of the parameters, leading to similar RMSE of latency and amplitude, as well as to similar credible intervals.This robustness adds to the flexibility of our model.When the interest is on how latency changes with factor levels, that is the β coefficients, we recommend using the logit function for ease of interpretation, as the coefficients are related to the change in log odds.Furthermore, although ERP studies focus on how categorical factors affect amplitude and latency, numerical covariates such as blood pressure and weight can be included in our latent regression hierarchy.Other extensions of the proposed method could include treatments of other noise distributions accounting, for example, for autoregressive correlation or trial variability.Spatial modeling and mapping on multiple electrodes could be other possible extensions.

Data Availability Statement
Behavioral and Event-Related Potentials data that support the findings in this paper are available on PsyArxiv [Noe and Fischer-Baum, 2020] at https://osf.io/c7k4s/(DOI 10.17605/OSF.IO/C7K4S).

Statements and Declarations
The authors have no relevant financial or non-financial interests to disclose.The authors have no competing interests to declare that are relevant to the content of this article.All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.The authors have no financial or proprietary interests in any material discussed in this article.
Appendix: Monte Carlo sampling in the E-step of the MCEM algorithm In the Monte Carlo E-step of the proposed MCEM Algorithm 1 for SLAM, we simulate t gs , β m 0 , β m a , η m g and σ 2 using the following conditional distributions and Metropolis-Hasting steps.
• With the property that y gs are conditionally independent given t gs and θ(j) , the conditional density of t gs is p t gs | y gs , r g (β), σ 2 , θ(j) = p y gs | t gs , σ 2 , θ(j)

Figure 1 .
Figure 1.Simulation study.(Top) The ten simulated regression functions and corresponding curve fitting, for the first subject only, for group 1 with sine functions, and (Bottom) for group 2 with cosine functions.In the curve fitting plots, the light blue shaded area indicates the 95% credible interval for f (x), estimated from the sample paths { f (d) (x)} D d=1 drawn from the posterior derivative Gaussian process, and corresponding to the sampled latencies {t (d) } D d=1 .The mean of those realizations is shown as the red dashed line, the green dashed fitted curve is obtained from the LOESS method, and the true f (x) is shown in blue color.

Figure 2 .
Figure 2. Simulation study: (Top) Latency estimates for group 1 with sine functions and (Bottom) for group 2 with cosine functions.

Figure 3 .
Figure 3. Simulation study: (Top) Posterior histograms of t of the sine regression function for the 8th subject.(Middle) Posterior distribution of group-level parameters r 1 and r 2 for the sine and cosine regression function settings.The blue vertical dashed lines indicate true subject-level latencies.The red segments are the 95% credible intervals from the posterior distributions by SLAM, and the black segments are the 95% confidence intervals using the two-step approach by fitting LOESS followed by one-way ANOVA.(Bottom) Amplitude distributions using the Max Peak method in Algorithm 2. Vertical red lines indicate the true amplitudes, and the green lines indicate the LOESS estimates.
for a peak or z(d) := min f (d) (x) for a dip. 3. Collection {z (d) } Dd=1 forms posterior samples of amplitude.Result: Posterior samples of amplitude.Finally, to examine the two methods' performance, we use the same measure,

Figure 4 .
Figure 4. Simulation study: (Top) Examples of simulated regression functions and corresponding curve fitting, for the first subject only, for group 1 with sine functions, and (Bottom) for group 2 with cosine functions.There is a group-level stationary point in the region (0, 0.5) and in (0.5, 1).The black dashed line separates the two regions, and the red vertical lines indicate the true group-level latencies.The true f (x) is shown in blue color.Mean (SD) of posterior mean Mean (SD) of RMSE Parameter True value 10 subjects 20 subjects 10 subjects 20 subjects r 1 1 0.57 0.54 (0.033) 0.56 (0.027) 0.077 (0.016) 0.050 (0.012) r 2

Figure 5 .
Figure 5. Case study: (Top) Subject-level ERP curves and grand average ERP curves.Solid: Young group.Dashed: Older group.The 0 ms time, which corresponds to time point 100, is the start of the onset of sound.The N100 component of interest is the dip characterizing the signal in the time window [60-140] msec colored in green.The P200 component characterizes the time window [140-220] msec, colored in yellow.(Bottom) Posterior distribution of subject-level latencies.

Figure 6 .
Figure 6.Case study: (Top) Latency difference between groups and latency shift effect on the right.(Bottom) Amplitude difference between groups.

AAN
gs , Σ gs + σ 2 I M m=1 gbeta(t m gs | r m g (β 0 , β m 1 ), η m g , a m , b m ) and a Metropolis-Hastings (M-H) sampling is performed.The simple indepen-dent symmetric uniform proposal (t m gs ) * ind ∼ Unif(a m , b m) could be used.For more efficient sampling, we can also use the truncated normal distribution as the proposal distribution (t m gs ) * ∼ T N (t m gs ) (t−1) , (C m gs ) 2 , a m , b m with support (a m , b m ), where (t m gs ) (t−1) is the (t−1)th draw of t m gs in the E-step, and (C m gs ) 2 is the tuning variance that keeps the user-defined acceptance rate, 35% for example.•The conditional distribution of β m 0 Metropolis step is done with Gaussian proposal (β m 0 ) * ∼ N ((β m 0 ) (t−1) , C m 0 ), where (β m 0 ) (t−1)is the (t − 1)th draw of β m 0 in the E-step, and C m 0 is the tuning variance that keeps the acceptance rate at around 30%.•The conditional distribution of β m a is given byp (β m a | t, η) Metropolis step is done with the Gaussian proposal (β m a ) * ∼ N ((β m a ) (t−1) , C m 1 ), where (β m a ) (t−1) is the (t − 1)th draw of β m a in the E-step, and C m 1 is the tuning variance that keeps the acceptance rate at around 30%.•The conditional distribution of η m g is Ga η m g | α η , β η ,and a regular random walk proposal on the logarithm of η m gl is used for sampling.•A Gibbs step is done on σ 2 by drawing from its conditional densityp σ 2 | y, t, θ(jµ gs , Σ gs + σ 2 I   IG(σ 2 | α σ , β σ ),which is an inverse gamma distribution with shape parameter n gs A −1 gs y gs + β σ , where A gs = τ 2 0 k 00 (x, t gs ) − k 01 (x, t gs )k −1 11 (t gs , t gs )k 10 (t gs , x) + I and τ 2 = τ 2 0 σ 2 .

Table 1 .
Simulation study: Latency and amplitude comparison.Average and the standard deviations (in parentheses) of the RMSEs from the 100 replicated data sets.

Table 2 .
Simulation study: Parameter estimation.The mean and standard deviation of posterior mean and RMSE are computed from 50 replicates of data with size 100.