Leveraging Observation Timing Variability to Understand Intervention Effects in Panel Studies: An Empirical Illustration and Simulation Study

Abstract To examine developmental processes, intervention effects, or both, longitudinal studies often aim to include measurement intervals that are equally spaced for all participants. In reality, however, this goal is hardly ever met. Although different approaches have been proposed to deal with this issue, few studies have investigated the potential benefits of individual variation in time intervals. In the present paper, we examine how continuous time dynamic models can be used to study nonexperimental intervention effects in longitudinal studies where measurement intervals vary between and within participants. We empirically illustrate this method by using panel data (N = 2,877) to study the effect of the transition from primary to secondary school on students’ motivation. Results of a simulation study also show that the precision and recovery of the estimate of the effect improve with individual variation in time intervals.


Introduction
Longitudinal large-scale assessments and panel studies often aim to include equally spaced intervals between measurement occasions. Particularly in the experimental and assessment literature, unequally spaced measurement intervals between observations or measurement waves are often viewed as a potential source of unwanted variance. As a result, standardization or balancing of temporal processes of measurement is frequently attempted (Liu, 2016; Verbeke et al., 2014). In practice, however, time intervals can and do vary between measurement waves, between individuals, and even within individuals, which is often a result of pragmatic constraints (e.g., the available number of test administrators or testing devices) during the process of data collection. If observation intervals in panel studies vary both within and between individuals, they are often referred to as "individually varying" (Voelkle & Oud, 2013). In the present study, we aim to show how these naturally occurring individually varying time intervals (IVTs) can actually be beneficial, by providing additional information about intervention effects from observational data.

The Alignment of Measurement Intervals in Panel Studies of Intervention Effects
Panel studies provide rich data for examining longitudinal target processes or intervention effects that occur at a certain time. Given the manifold research questions these studies may simultaneously address, observation timing is usually not designed around certain expected events; instead, observations occur at prespecified points in time (Cooksey, 2018; Hernán et al., 2009). Thus, if an intervention or event of any kind takes place, observations in panel studies may occur close to the onset of the event, or much further away in time. Consider the following example: Throughout schooling, children around the world face certain transitions, one of which is the transition from primary to secondary school. This transition has repeatedly been shown to have social, emotional, and academic consequences for children (Evans et al., 2018). Among other things, it can result in a long-lasting decrease in academic motivation and self-concept (Archambault et al., 2010; Chouinard et al., 2017). According to the stage-environment fit theory (Eccles et al., 1993), poor academic performance, mental health, or academic motivation after the transition may be caused by a mismatch between children's developmental demands at the time of the transition and the social context of secondary schools. When transitioning, children must adjust to larger classrooms and schools, more academic freedom, new peer and teacher interactions, and a greater focus on grades and performance. Due to these differences, students need to adapt to new academic expectations, norms, and evaluation criteria. Subsequently, this can negatively affect young adolescents' academic engagement and motivation, academic self-concept or competence, attitude toward school and learning, and their intrinsic interest in school (Chouinard et al., 2017; Dotterer et al., 2009; Eccles et al., 1993; Evans et al., 2018; Zeedyk et al., 2003). Dotterer et al. (2009) found that the steady decline in academic interest after the transition persisted until the age of 16. From there, it slowly reversed, but never fully recovered to its previous levels. A similar pattern was found for academic motivation (Gottfried et al., 2001).
From a statistical perspective, the transition from primary to secondary school can be considered an intervention that has an effect on all children of a certain cohort at the same time. For example, in Germany, this transition happens as early as after the 4th grade. The German National Educational Panel Study (NEPS; Blossfeld et al., 2011) is the most important German educational large-scale panel study that can be used to examine the effect of the transition on students' motivation. In most cohorts, the NEPS samples students repeatedly once a year. The measurements in primary and secondary school usually take place from November to January, but measurements sometimes occur as late as March or April of a given grade (LIfBi, 2022), leading to between-student variability in measurement timing. Since students are not normally measured on the same day or in the same month in every grade, the students' individual measurement intervals across the yearly measurements will vary as well: sometimes an interval may be less than a year, sometimes more. Figure 1 presents this idea conceptually. In the upper panel, all students are surveyed at the same time in two consecutive waves, resulting in the same time intervals between their repeated measurement occasions. In the lower panel, students are measured in different months in the two consecutive waves, resulting in IVTs between the repeated observations and with respect to the onset of the intervention. Supplemental Figure A1 (OSM) represents the corresponding empirical example from the NEPS data.

Rationale behind the Potential Benefits of IVTs for Parameter Estimation
IVTs can be beneficial for understanding the course of intervention effects because they result in broader coverage over time. To illustrate this point, Figure 2 presents different distributions of measurements. It is assumed that students' academic motivation is measured once a year for 5 years (i.e., for a total of five measurements per student). After 21 months ($t_u = 21$), an intervention (e.g., the transition from primary to secondary school) occurs, causing academic motivation to shift to a new, lower level over time.
The upper panel represents a scenario in which the measurement occasions from such a study design are equally spaced (no IVTs are present), and every individual is measured at the same point in time. This practice results in a large number of individuals being measured in the same month each year but none of them being measured in any of the other 11 months of the year. Thus, if we model the trajectory of academic motivation across these measurement occasions, it will exclusively contain information about the level of academic motivation in, for example, October of Year 1, compared with October of Year 2, compared with October of Year 3, and so on. Although we might hypothesize what happened to academic motivation in all the other months of the year, we cannot know from this pattern of measurements. The middle panel shows a scenario in which IVTs are introduced. Here, IVTs between the five measurements of each student resulted in distributions (instead of single bars as in the upper panel) of measurements across the years. The majority of students are still being measured around the same months of a year, but there are also observations in the other months. Finally, the lower panel presents a scenario in which each student is still only measured once a year for 5 years (i.e., each student has five measurement occasions), but the IVTs are distributed in such a way that each month of each year is covered by the same number of measurement occasions.
What can be seen from Figure 2 is that IVTs allow us to cover a longer period of time with observations than if students were sampled with equally spaced intervals at the same time each year. Such a setup with IVTs might thus yield several advantages if we are interested in the trajectory of effects, especially nonexperimental intervention effects from panel data. First, by covering more months of the year with measurements due to IVTs, we might gain a better understanding of the (average) trajectories of academic motivation over time and across each year because measurements are taken in every month of the year as opposed to depicting the values from only a single month, for example. Second, IVTs can enable us to learn about the evolution of an effect across many different time spans (e.g., 6, 10, 12, or 14 months). Instead of setting up a new study with each of these intervals, we can gain this knowledge all at once, given the appropriate statistical modeling opportunities. Third, when applied to intervention effects, IVTs might result in measurements that are closer to the onset of the event, as well as farther away than in a setting in which everyone is measured in the same month every year. Thus, IVTs and the resulting broader distributions of measurements across the year(s) can provide us with more information about the immediate, midterm, and long-term effects of an intervention. However, how can we ensure that this heterogeneity can benefit estimation and not result in biased parameter estimates?
Figure 1. Repeated measurement occasions $t_1$ and $t_2$ of three student exemplars: A, B, and C. The dashed line at $t_u = 12$ represents the time point of an intervention (i.e., the month of the transition from primary to secondary school). In the upper panel, all students are surveyed at $t_1 = 2$ and again at $t_2 = 14$. In the lower panel, students A, B, and C are measured at different time points $t$ at both measurement occasions. Therefore, the time intervals ($\Delta t$) between the two measurement occasions vary between the individuals and cover different periods of time before the onset of the transition as well as afterwards.

Modeling Approaches to IVTs in Panel Studies
Many statistical models can handle variable timing in measurement and the resulting IVTs well. Latent growth curve models based on structural equation models (LGCM; Bollen & Curran, 2006; McArdle et al., 2009; Sterba, 2014), for example, include time as an exogenous predictor for an outcome; that is, trajectories of outcome variables are modeled as a function of time. Because time is modeled as an exogenous variable, variable timing of measurement is not a problem. Importantly, however, while so-called "static" longitudinal models like the LGCM allow us to describe change over time, a causal interpretation of such models is not possible (Voelkle et al., 2018; Voelkle & Oud, 2015). Thus, when the goal is not only to describe change, but to understand the mechanisms that generate change, so-called "dynamic" models are better suited (Ryan et al., 2018; Voelkle et al., 2018; Zyphur et al., 2020). Typical dynamic models are autoregressive or change score models, where current states of a system (i.e., the dependent variable) depend on past states and external forces (see e.g., Boker & Nesselroade, 2002; Hasl et al., 2022; McArdle, 2009; Ryan et al., 2018; Voelkle & Oud, 2015). For example, a student's current level of academic motivation is assumed to depend on his or her past level of academic motivation, but is also influenced by other factors. From a dynamic systems perspective, an intervention such as the transition from primary to secondary school is an event that perturbs the academic motivation system from its normal equilibrium state (Bisconti et al., 2004; Boker & Nesselroade, 2002).
Figure 2. Exemplary distributions of measurement occasions across 5 years plotted along with the same average trajectory of academic motivation over time. Time $t$ is given in months; the dashed line at $t_u = 21$ marks an intervention, for example, the transition from primary to secondary school. In the upper panel, students are measured once a year at the same time (no IVTs). In the middle and lower panels, time intervals vary between students (IVTs present), resulting in different distributions of measurement occasions over time. From a visual inspection, it becomes apparent that the latter two scenarios result in a broader coverage of the time period before and after the transition.
Importantly, discrete-time dynamic models only consider time implicitly by taking into account the order of the measurement, but not the exact time points or time intervals between them (Lohmann et al., 2022; Ryan et al., 2018; Voelkle et al., 2012). A first problem that arises from this is that it is not possible to compare results of studies to each other if they applied different time intervals to investigate the same substantive process (Hecht & Voelkle, 2021; Voelkle & Oud, 2013). If, for example, a study investigated the effect of the transition from primary to secondary school on academic motivation in intervals of 6 months, and found different effects than a study investigating the same effect with a 12-month interval, which study's estimate is the "correct" one? In such a setting, it is impossible to differentiate between the effect of the measurement intervals and the substantive process itself (Oud & Delsing, 2010; see also the work on optimal interval lengths and sampling rates in AR-type models: e.g., Adolf et al., 2021; Dormann & Griffin, 2015; Timmons & Preacher, 2015; von Oertzen & Boker, 2010). Second, similarly, if panel studies feature IVTs, and these are not accounted for in statistical analyses, it is difficult to interpret the target effect because it represents an average across the individual effects rather than the effect for a certain time interval. Fortunately, the literature identifies a solution to these problems: The application of continuous time (CT) models by means of stochastic differential equations (e.g., Adolf et al., 2021; Driver & Voelkle, 2018a, 2018b; Hecht et al., 2019; Hecht & Voelkle, 2021; Lohmann et al., 2022; Oravecz et al., 2018; Oud & Delsing, 2010; Ryan et al., 2018; Ryan & Hamaker, 2022; Voelkle et al., 2012). Among other things, CT models allow us to separate the measurement process from the process of interest by removing any potential confounding between the two when estimating parameters.

Research Objectives
Considering IVTs a nuisance may represent common practice or intuition rather than statistical necessity (Collins, 2006; Voelkle & Oud, 2013). When panel studies include IVTs, we can cover a larger space in time than we could with equally spaced measurement intervals. This may be helpful when examining longitudinal processes because we can learn more about the temporal course of a process or event. In this paper, we apply this line of thought to the estimation of intervention effects. First, whereas previous research focused on how to use IVTs to study continuous processes, in the present paper, we add to this literature by demonstrating how CT models can leverage IVTs in panel studies for investigating intervention effects. Second, whereas earlier studies often drew on abstract examples, this study is guided by an empirical example, namely the transition from primary to secondary school and the NEPS data. Thus, although we will focus on simulations to evaluate whether and under which conditions individual variation in time intervals may improve the estimation and recovery of average intervention effects over time, their design and parameters are inspired by real data.
The article will proceed as follows: We will first introduce intervention effects from a dynamic systems perspective, and consider the handling of IVTs in a CT structural equation modeling framework. We will use the NEPS data as an empirical example to illustrate how CT models can be used to capture intervention effects. Specifically, we study how students' academic motivation is affected by the transition from primary to secondary school. The parameters from this empirical model will serve as true effects in a subsequent simulation study, where IVTs will be distributed according to different normal and uniform distributions that are linked to the NEPS measurement schedule. Results will be presented and discussed with respect to the existing literature. Lastly, we will show possible limitations of our approach.

Modeling Input Effects in a Continuous Time Framework
Although developmental processes usually happen in continuous time $t$, their measurement occasions $u$ are necessarily discrete. CT models by means of stochastic differential equations depict the rate of change of a process over infinitesimally small increments of time. This puts the generating mechanism on a continuous time scale and allows us to distinguish the underlying dynamics clearly from the discrete time measurement occasions $u$ (Oud & Delsing, 2010). In the following, we will present CT models in terms of stochastic differential equations as provided in Driver and Voelkle (2018a). We will also define all parameters that are later used in the simulation study. The CT dynamic model comprises a latent dynamic model and a measurement model. CT parameters are obtained via structural equation modeling. A detailed step-by-step tutorial explaining each part of the model formulations of CT models can be found in Voelkle et al. (2012).

Latent Dynamic Model
The dynamic system is described by the linear stochastic differential equation:
$$\mathrm{d}\eta(t) = \big(A\eta(t) + b + M\chi(t)\big)\,\mathrm{d}t + G\,\mathrm{d}W(t) \tag{1}$$
Vector $\eta(t) \in \mathbb{R}^{v}$ represents the state of the latent processes at time $t$. In our example, $\eta(t)$ contains the level of the outcome variable (academic motivation) at time point $t$ and the value of the time-varying intervention variable at the same time point. The matrix $A \in \mathbb{R}^{v \times v}$ denotes the drift matrix, with auto effects on the diagonal and possible cross effects on the off-diagonals characterizing the temporal dynamics of the process. Negative values for the auto effects are typical of nonexplosive processes and imply that as the latent state becomes more positive, a stronger negative influence on the expected change in the process occurs. In the absence of other influences, the process therefore tends to revert to a baseline. For example, this indicates that, in the absence of an intervention, academic motivation would return to the same "baseline" level over and over again when randomly perturbed. The continuous time intercept vector $b \in \mathbb{R}^{v}$ provides a constant fixed input to the latent processes $\eta$. In combination with $A$, this determines the long-term level around which the processes fluctuate. In our example, the continuous time intercept $b$ represents the average level of academic motivation over time (before the intervention). Without the continuous time intercept, the process would simply fluctuate around zero.
Time-dependent predictors $\chi(t)$ represent exogenous inputs to the system (e.g., interventions, in our case the transition from primary to secondary school) that may vary over time and are independent of earlier fluctuations in the system. Equation 2 shows a generalized form for time-dependent predictors that could be treated in a variety of ways depending on the predictors' assumed time course or shape. We use a basic impulse form (Driver & Voelkle, 2018a, p. 82), in which the predictors are treated as impacting the processes only at a single moment (observation occasion $u$). The virtue of this form is that many alternative shapes are made possible via augmentation of the system state matrices.
$$\chi(t) = \sum_{u \in U} x_u\,\delta(t - t_u) \tag{2}$$
Here, time-dependent predictors $x_u \in \mathbb{R}^{l}$ are observed at measurement occasions $u \in U$, where $U$ is the set of measurement occasions from 1 to the number of measurement occasions, with $u = 1$ treated as occurring at $t = 0$. The Dirac delta function $\delta(t - t_u)$ is a generalized function that is $\infty$ at 0 and 0 elsewhere, yet has an integral of 1 when 0 is in the range of integration. It is useful to model an impulse to a system, and here is scaled by the vector of time-dependent predictors $x_u$. The effect of these impulses on processes $\eta(t)$ is then $M \in \mathbb{R}^{v \times l}$. Put simply, the equation means that when a time-dependent predictor is observed at occasion $u$, the system processes spike upwards or downwards by $M x_u$. In our example, if $M x_u$ is negative, it means that the transition from primary to secondary school has a negative effect on students' academic motivation.
$W(t) \in \mathbb{R}^{v}$ represents $v$ independent Wiener processes, with a Wiener process being a random walk in continuous time. $\mathrm{d}W(t)$ represents the stochastic error term, an infinitesimally small increment of the Wiener process. Lower triangular matrix $G \in \mathbb{R}^{v \times v}$ represents the effect of this noise on the change in $\eta(t)$. $Q$, where $Q = GG^{\top} \in \mathbb{R}^{v \times v}$, depicts the variance-covariance matrix of this diffusion process in continuous time. Intuitively, one may think of $\mathrm{d}W(t)$ as random fluctuations and $G$ as the effect of these fluctuations on the process of academic motivation. $G\,\mathrm{d}W(t)$ then simply represents unknown changes in the direction of $\eta$ (i.e., the process of academic motivation), which are distributed according to a multivariate normal distribution with a continuous time covariance matrix $Q$. The matrix forms of the model equations are presented in Supplemental Figure A2 in the OSM.
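To make the roles of drift, intercept, impulse, and noise more tangible, the following is a minimal sketch of a single latent process simulated with the Euler-Maruyama scheme. It is not the estimation approach used in this article; the function name `simulate_process`, the step size, and all numerical values are assumptions chosen purely for illustration.

```python
import math
import random


def simulate_process(a, b, g, m, t_impulse, t_end, dt=0.01, eta0=0.0, seed=1):
    """Euler-Maruyama path of d(eta) = (a*eta + b) dt + g dW, plus one impulse.

    a: auto effect (negative for a mean-reverting process)
    b: continuous time intercept
    g: effect of the Wiener increment on the process
    m: size of the instantaneous jump applied at time t_impulse
    """
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    impulse_step = int(round(t_impulse / dt))
    eta = eta0
    path = [eta]
    for step in range(1, n_steps + 1):
        # Deterministic drift plus a scaled Wiener increment (variance dt).
        eta += (a * eta + b) * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if step == impulse_step:
            eta += m  # the impulse shifts the state instantaneously
        path.append(eta)
    return path


# Hypothetical values: baseline -b/a = 10, noise switched off so that
# the reversion to baseline after the impulse is easy to see.
path = simulate_process(a=-0.5, b=5.0, g=0.0, m=-2.0,
                        t_impulse=10.0, t_end=20.0, eta0=10.0)
for t in (9.99, 10.0, 12.0, 20.0):
    print(t, round(path[int(round(t / 0.01))], 3))
```

With a negative auto effect, the state drops by the impulse size and then drifts back toward the baseline $-b/a$; setting `g` above zero adds random fluctuations around that trajectory.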

Discrete Time Solution of a Latent Dynamic Model
To derive expectations for discretely sampled data, Equation 1 may be solved and translated to a discrete time representation, for any observation $u \in U$:
$$\eta_u = A^{*}_{\Delta t_u}\eta_{u-1} + b^{*}_{\Delta t_u} + M x_u + \xi_u, \qquad \xi_u \sim N\big(0_v, Q^{*}_{\Delta t_u}\big) \tag{3}$$
The $*$ notation is used to indicate a term that is the discrete time equivalent of the original for the time interval $\Delta t_u$ (which is the time at $u$ minus the time at $u - 1$). $A^{*}_{\Delta t_u}$ contains the appropriate auto and cross regressions for the effect of latent processes $\eta$ at measurement occasion $u - 1$ on $\eta$ at measurement occasion $u$. $b^{*}_{\Delta t_u}$ represents the discrete-time intercept for measurement occasion $u$. Because $M$ is conceptualized as the effect of instantaneous impulses $x$, its discrete time form matches the general continuous time formulation in Equation 1. $\xi_u$ is the zero mean random error term for the processes at occasion $u$, which is distributed according to a multivariate normal distribution with covariance $Q^{*}_{\Delta t_u}$. The recursive nature of the solution means that at the first measurement occasion $u = 1$, the system must be initialized in some way, with $A^{*}_{\Delta t_u}\eta_{u-1}$ replaced by $\eta_{t_0}$ and $Q^{*}_{\Delta t_u}$ replaced by $Q^{*}_{t_0}$.
Unlike in a purely discrete-time model, where the various discrete-time effect matrices described above would be unchanging, in a continuous time model, the discrete-time matrices all depend on some function of the continuous time parameters and the time interval $\Delta t_u$ between observations $u$ and $u - 1$; these functions are depicted as follows:
$$A^{*}_{\Delta t_u} = e^{A\,\Delta t_u} \tag{4}$$
$$b^{*}_{\Delta t_u} = A^{-1}\big(A^{*}_{\Delta t_u} - I\big)\,b \tag{5}$$
$$Q^{*}_{\Delta t_u} = Q_{\infty} - A^{*}_{\Delta t_u}\,Q_{\infty}\,\big(A^{*}_{\Delta t_u}\big)^{\top} \tag{6}$$
where the asymptotic diffusion $Q_{\infty} = \operatorname{irow}\big(-A_{\#}^{-1}\operatorname{row}(Q)\big)$ represents the latent process variance as $t$ approaches infinity, $A_{\#} = A \otimes I + I \otimes A$, with $\otimes$ denoting the Kronecker product, $\operatorname{row}$ is an operation that takes elements of a matrix row-wise and puts them in a column vector, and $\operatorname{irow}$ is the inverse of the $\operatorname{row}$ operation.
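For a single process, the dependence of the discrete-time quantities on the interval length can be sketched in a few lines. The example below assumes the standard solution of a linear stochastic differential equation in its scalar form (exponential of the auto effect times the interval, with the Kronecker sum reducing to twice the auto effect); the function name `discrete_params` and the parameter values are hypothetical.

```python
import math


def discrete_params(a, b, q, dt):
    """Scalar discrete-time autoregression, intercept, and error variance.

    a: continuous time auto effect (must be nonzero; negative if stable)
    b: continuous time intercept
    q: continuous time diffusion variance
    dt: time interval between two measurement occasions
    """
    a_star = math.exp(a * dt)             # autoregression for interval dt
    b_star = (a_star - 1.0) / a * b       # discrete-time intercept
    q_inf = -q / (2.0 * a)                # asymptotic (long-run) variance
    q_star = q_inf - a_star * q_inf * a_star
    return a_star, b_star, q_star


# The same continuous time parameters imply different discrete-time
# coefficients for different intervals (hypothetical values).
for dt in (6.0, 12.0, 14.0):
    a_star, b_star, q_star = discrete_params(a=-0.1, b=1.0, q=0.5, dt=dt)
    print(dt, round(a_star, 4), round(b_star, 4), round(q_star, 4))
```

One design consequence worth noting is that the autoregression for a 12-unit interval equals the square of the autoregression for a 6-unit interval, which is why studies with different intervals report different discrete-time coefficients for the same underlying process.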

Measurement Model
While non-Gaussian generalizations are possible, in the present work, the latent process vector $\eta(t)$ has the linear measurement model:
$$y(t) = \Lambda\eta(t) + \tau + \epsilon(t) \tag{7}$$
where $y(t) \in \mathbb{R}^{c}$ is the vector of manifest variables, $\Lambda \in \mathbb{R}^{c \times v}$ represents the factor loadings, and $\tau \in \mathbb{R}^{c}$ represents the manifest intercepts. The residual vector $\epsilon \in \mathbb{R}^{c}$ has covariance matrix $\Theta \in \mathbb{R}^{c \times c}$. The intercept and error variance of the observations are estimated because otherwise their expectation would be zero. Estimating the error variances also allows for some noise (error) that is unrelated to the process.

Translation into a Structural Equation Modeling (SEM) Framework
Importantly, the coefficients of the presented CT dynamic model can be estimated with Structural Equation Models (SEM). The structural model for obtaining continuous time parameters via SEM is as follows:
$$\eta = B\eta + \xi \tag{8}$$
Here, elements of the latent process vector $\eta$ are related to each other via matrix $B$. The residual vector $\xi$ has covariance matrix $\Psi$. Given Equations 4 and 5, the model-implied covariance matrix $\Sigma$ (e.g., Bollen, 1989, p. 325) can be derived for parameter estimation.

Sample
The data for the empirical example stemmed from the German National Educational Panel Study, Starting Cohort 2 (Blossfeld et al., 2011). We used data from Waves 5 to 9 (i.e., Grades 3-7; years 2014-2019), yielding observations for N = 2,971 students over a period of 5 years and five measurements per student (one per grade). Because the transition from primary to secondary school in Berlin and Brandenburg takes place one year later than in all other German federal states (Berlin/Brandenburg: Grades 5-6, all other federal states: Grades 4-5), students from these two states were excluded from the analysis (N = 94). Thus, the final sample was N = 2,877 students. Most students are observed in November and December of a school year; the minimum realized spread of measurement occasions was November to January in Wave 5; the maximum realized spread was October to April in Waves 6 and 9. Students' motivation was assessed with the same four questionnaire items in each wave (e.g., "I try hard, even when tasks are difficult") with answers ranging from 1 (completely disagree) to 4 (completely agree). A sum score was calculated by adding up the answers. Reliabilities of the scale across waves ranged from Cronbach's α = .46 (Wave 5) to α = .66 (Wave 8; α = .56 in Wave 6, α = .64 in Wave 7, and α = .63 in Wave 9). The individual (i.e., person) means for motivation across measurement waves before the transition ranged from 4 points to 16 points (M = 13.518, SD = 1.850), and the within-person standard deviations ranged from 0 points to 10 points (M = 1.589, SD = 1.427).

Model Specification
Drawing on previous substantive findings, we expected (a) a drop in academic motivation after the transition from primary to secondary school that (b) persisted after the transition (up to Grade 7, where children were 12 years old) without returning to its previous levels before the transition. In methodological terms, we thus specified a level change form for the intervention process in our example. Of course, other scenarios (e.g., a fade-out effect in the input) could be plausible as well, but for reasons of simplicity, we chose the level-change scenario in our present analyses. We did so by setting the intervention effect's drift parameter to zero ($A_{22} = 0$). To identify the input process's scale, $A_{12}$ was further fixed to 1 during estimation (see Driver & Voelkle, 2018a). All other parameters of interest, that is, $A_{11}$, $M_{21}$, $\tau$, $\Theta$, and $Q_{11}$, were estimated freely. Thereby, $A_{11}$ represents the drift coefficient for the process of academic motivation; $M_{21}$ represents the effect that the intervention process (i.e., the transition from primary to secondary school) has on academic motivation; $\tau$ represents the manifest intercept (i.e., the average manifest level of academic motivation at $t_0$); $\Theta$ represents the corresponding error variance at $t_0$; and $Q_{11}$ refers to the variance of the diffusion process for academic motivation (see also Supplemental Figure A2 in the OSM). The specific time point of the transition was set to August 2016 ($t_{22}$), which marked the end of Grade 4.
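The logic of the level-change form can be sketched for the noise-free case. If the input process has no auto effect and feeds into the outcome with a coefficient of 1, a jump of size $m$ in the input process stays in place, and the outcome's expected deviation from its pre-intervention baseline after an elapsed time $s$ is $m\,(e^{A_{11}s} - 1)/A_{11}$. The snippet below is a minimal illustration under these assumptions; the function name `level_change_effect` and the values of `a11` and `m` are hypothetical and are not the estimates reported in this article.

```python
import math


def level_change_effect(a11, m, elapsed):
    """Expected shift of the outcome `elapsed` time units after the input jumps by m.

    Assumes a two-process system in which the input process has no auto
    effect (it stays at m after the jump) and enters the outcome's rate of
    change with coefficient 1; random fluctuations are ignored.
    """
    if elapsed <= 0:
        return 0.0
    return m * (math.exp(a11 * elapsed) - 1.0) / a11


a11, m = -0.5, -1.0  # hypothetical auto effect and jump size
for s in (0, 1, 3, 6, 12):
    print(s, round(level_change_effect(a11, m, s), 3))
print("long-run shift:", -m / a11)
```

The shift builds up gradually and levels off at $-m/A_{11}$ rather than fading out, which is the defining feature of a level change as opposed to a transient impulse on the outcome itself.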

Parameter Estimates
We used Maximum Likelihood estimation to estimate the model parameters. Empirical estimates revealed a negative auto effect in the drift matrix for the process of academic motivation ($A_{11} = -0.300$). This implies that in the absence of other influences, academic motivation would always revert to a baseline. The average manifest level of motivation at $t_0$, that is, the manifest intercept $\tau$, is $\tau = 13.508$. There were, however, substantial other influences, modeled by the additional latent intervention process of the transition from primary to secondary school, as well as by the system noise. The corresponding input effect $M_{21} = -0.290$ shows that the transition indeed had a negative (and, because of its level change form defined by $A_{22}$, long-term) effect on students' academic motivation. The diffusion term $Q_{11}$ was estimated to be zero, which implied that in this special case, the model did not assume random fluctuations of academic motivation across the repeated measures. The CT parameters can be translated into a discrete time solution for any time interval $\Delta t_u$ of interest via exponential transformations (Equations 4-6). Figure 3 displays the model-implied average level of academic motivation across time as well as a random selection of students' empirical individual trajectories.
In the OSM, in Appendix A, we insert the empirical parameters in Equations 3, 4, and 7, and present step by step how to calculate a manifest academic motivation time series from the CT solution for an exemplary student. For discrete time intervals $\Delta t_u = 12$ months between measurements, the CT solution would translate into the discrete time autoregressive coefficient of AR(1) = 0.027 of academic motivation, and a decrease in motivation due to the transition from primary to secondary school of -1.088 scale score points 1 year (12 months) after the transition. In comparison to the average between-person SD of 1.850 scale points for motivation across measurement waves prior to the transition and the average within-person SD prior to the transition of 1.589 scale points, this can be regarded as a considerable effect. Figure 4 depicts the CT function for the AR(1) parameters of academic motivation over time, that is, how autoregressive coefficients vary as a function of time intervals $\Delta t_u$. Next, this empirical solution will serve as the true parameters for a simulation set-up.
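The reported autoregressive coefficient can be checked with a short calculation. The sketch below assumes that, for the academic motivation process, the discrete-time autoregression is the exponential of the auto effect times the interval, with time measured in months; the function name `autoregression` is hypothetical. Only the AR(1) coefficient is addressed here, not the size of the intervention effect.

```python
import math

A11 = -0.300  # auto effect reported in the text (time unit assumed: months)


def autoregression(dt_months, a=A11):
    """Discrete-time AR(1) coefficient implied by auto effect `a` for one interval."""
    return math.exp(a * dt_months)


# AR(1) coefficients for several hypothetical measurement intervals.
for dt in (1, 3, 6, 12, 14):
    print(dt, round(autoregression(dt), 3))
```

For an interval of 12 months this yields a value that rounds to 0.027, in line with the coefficient stated above, and the loop shows how quickly the autoregression decays as intervals grow.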
The simulation study's objective is to evaluate whether and how individual variation in time intervals may benefit the estimation of average intervention effects over time.

A Simulation with Conditions Based on the Empirical Example
First, we simulated a CT dynamic model based on our empirical model. Second, we sampled discrete time observations from the CT model using seven different interval conditions. All interval conditions were chosen with potential real-life sampling interval decisions of panel studies, such as the NEPS, in mind. Third, we fit a CT model to 1,000 generated data sets under each condition. To obtain a more thorough picture, we generated data under two different sample sizes: one that corresponded to the NEPS data (N = 2,877) and a small one (N = 200) that represented a typical sample size for psychological studies.

Method
A continuous time model as defined in Equation 3 was simulated with drift matrix $A$ with auto effect $A_{11} = -0.300$, initial mean vector $\tau = 13.508$, and initial error variance $\Theta = 3.486$ (Table 1). Data were generated for N = 2,877 and N = 200 students and five time points per student (every student was measured once in every grade, resulting in a total of five measurement occasions per student; equivalent to the NEPS data example). The interval conditions were chosen such that each measurement point of an individual represented a measurement occasion in a specific grade. Thus, although the individual discrete time intervals sometimes vary dramatically, the compatibility with grade-wise assessment cycles is ensured. Just as in the original NEPS data, $t_0$ is set to November of Grade 3, when the first empirical observation of the time series took place. Each grade is considered ending in August of a given year, with the new grade starting in September.
Figure 5 represents the distributions of measurement occasions over time that resulted from each interval condition for N = 2,877. Condition 1 serves as the "ideal" situation in which each student was measured every month across the five grades (i.e., corresponding to 54 months and measurement occasions per student). In all other conditions, each student had a total of five measurement occasions. Condition 2 represents the other "extreme" ideal standardization case in which everyone was measured once in each grade in the exact same month. In all remaining conditions, intervals were chosen to be individually varying, resulting in different distributions of the sample's measurement occasions across grades. Conditions 3, 4, and 5 were drawn from normal distributions centered at 12 months, with different standard deviations: 0.5, 0.75, and 2 months, respectively. In Conditions 6 and 7, intervals were sampled from uniform distributions, with Condition 6 possessing a lower limit of 10 months and an upper limit of 14 months between measurements ($\Delta t_{i,j} \sim U(10, 14)$), and Condition 7 sampled so that students were randomly assigned a month for their measurement occasion in each of the five grades, $U_{\text{grade}}(1, 12)$. Furthermore, in Conditions 2-4, $t_0$ was the same for every student ($t_0 = 1$); in Conditions 5-7, $t_0$ varied between students. In Conditions 5 and 6, a normal distribution of $t_0$ around $t_0 = 1$ (values varying between $t_0 = 0$ and $t_0 = 2$) was assumed in order to introduce higher initial levels of variation in time intervals in comparison with Conditions 2-4.
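The way such interval conditions generate individual measurement schedules can be sketched as follows. This is a simplified illustration rather than the code used in the simulation study: the function name `sample_times` is hypothetical, $t_0$ is fixed at 1 for Conditions 2 to 6 (the study varied it in Conditions 5 to 7), and the mapping of grades to consecutive 12-month windows in Condition 7 is an assumption.

```python
import random


def sample_times(condition, n_waves=5, rng=random):
    """Measurement times in months for one hypothetical student.

    Conditions follow the text: 2 = fixed 12-month intervals, 3-5 = normal
    intervals (mean 12; SD 0.5, 0.75, or 2), 6 = uniform intervals between
    10 and 14 months, 7 = one random month within each grade.
    """
    normal_sds = {3: 0.5, 4: 0.75, 5: 2.0}
    if condition == 7:
        # Assumption: grade k spans months 12*k + 1 to 12*k + 12.
        return [12 * wave + rng.randint(1, 12) for wave in range(n_waves)]
    t = 1.0  # simplification: same first occasion for every student
    times = [t]
    for _ in range(n_waves - 1):
        if condition == 2:
            interval = 12.0
        elif condition in normal_sds:
            interval = rng.gauss(12.0, normal_sds[condition])
        elif condition == 6:
            interval = rng.uniform(10.0, 14.0)
        else:
            raise ValueError("condition must be an integer from 2 to 7")
        t += interval
        times.append(t)
    return times


rng = random.Random(2024)
for condition in range(2, 8):
    print(condition, [round(t, 1) for t in sample_times(condition, rng=rng)])
```

Repeating the draw for many students and pooling the resulting times gives the kind of distribution of measurement occasions that the figure describes for each condition.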
Ultimately, Conditions 1-7 allowed us to compare how well different degrees of variation in time intervals recover the true underlying (i.e., generating) CT process. R version 4.1.1 (R Core Team, 2021) was used for data generation and the R package "ctsem" version 3.5.5 (Driver et al., 2017) for model estimation.

Evaluation Measures
We followed Schultzberg and Muthén's (2018) and McNeish's (2019) recommendations for choosing the evaluation measures for the simulated data sets.
The relative bias is obtained by dividing the average simulated estimate (mean or median) by the true value. The relative bias thus refers to the relative bias of the mean or median (as estimators). A value of 1 implies that the point estimate is unbiased (i.e., the estimated value is equal to the population value). Values between 0.90 and 1.10 reflect negligibly biased and acceptable estimates, values below 0.90 are considered biased downward, and values above 1.10 are considered biased upward (McNeish, 2019).
The mean squared error (MSE) is a function of the bias and the standard error. For each parameter, the squared differences between the true value and the simulated estimates were summed and divided by the total number of replications. An MSE close to zero indicates that the bias and the standard error are small. The closer to zero, the higher the overall precision of the estimate (Schultzberg & Muthén, 2018).
The SE/SD ratio compares the empirical standard deviation (SD), that is, the standard deviation of point estimates across all replications, with the average of the standard error estimates over the replications. If the SE/SD ratio is close to 1, the SD and SE estimates are similar, and the SE estimate captures the true variability of the estimates. Values between 0.85 and 1.15 are considered acceptable (Schultzberg & Muthén, 2018).
To check the 95% coverage, we estimated the 95% credible interval (CI) for each replication for each parameter. The measure represents the proportion of CIs across all 1,000 replications in which the population value was included in the interval. It indicates how well the variability of the parameter estimate was captured. A coverage of 95% is ideal, and values between 92.5% and 97.5% are considered acceptable (McNeish, 2019; Schultzberg & Muthén, 2018).
The power or the non-null detection rate provides the proportion of replications in which a non-null population effect is detected as being non-null, that is, the proportion of 95% credible intervals that do not include zero. No specific cut-off applies to acceptable non-null detection rates, but higher values are better (McNeish, 2019; Schultzberg & Muthén, 2018).
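The evaluation measures described above can be computed from the per-replication results as in the following sketch. The function name `evaluate` and its argument names are hypothetical, and the use of the sample standard deviation (divisor n − 1) for the empirical SD is an assumption, because the text does not state which divisor was used.

```python
from statistics import mean, median, stdev


def evaluate(estimates, std_errors, ci_lower, ci_upper, true_value):
    """Summarize one parameter across replications of a simulation."""
    n = len(estimates)
    # Replications whose 95% interval contains the population value.
    covered = sum(lo <= true_value <= hi for lo, hi in zip(ci_lower, ci_upper))
    # Replications whose 95% interval excludes zero (non-null detection).
    non_null = sum(not (lo <= 0.0 <= hi) for lo, hi in zip(ci_lower, ci_upper))
    return {
        "rel_bias_mean": mean(estimates) / true_value,
        "rel_bias_median": median(estimates) / true_value,
        "mse": sum((est - true_value) ** 2 for est in estimates) / n,
        # Average estimated SE relative to the empirical SD (assumed n - 1 divisor).
        "se_sd_ratio": mean(std_errors) / stdev(estimates),
        "coverage": covered / n,
        "power": non_null / n,
    }
```

Each returned value can then be checked against the acceptable ranges given above (e.g., 0.90 to 1.10 for the relative bias, 0.85 to 1.15 for the SE/SD ratio, and 0.925 to 0.975 for the coverage).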

Results
How does the (individual) variation in time intervals affect the recovery and precision of CT parameters? Tables 1 and 2 show the evaluation measures for auto effect A_11 and input effect M_21 for Conditions 1-7 and for N = 2,877 and N = 200 individuals, respectively. Results on the evaluation measures for all CT parameters can be found in the OSM, Supplemental Tables A1 and A2. Most importantly, on the basis of our results from Conditions 1 to 7, we can conclude that a certain amount of variation helps the estimation and recovery of average intervention effects over time. This conclusion held in both the larger (panel) sample of N = 2,877 and the smaller sample of N = 200 individuals. In our study, those conditions with higher individual variation in time intervals (Conditions 5-7) offered better recovery of the true auto effect and input parameters than those with lower variation (Conditions 2-4).
As expected, the control condition (Condition 1), in which every student was measured every month of the sampling period, yielded essentially perfect results for both sample sizes. Condition 2, with fixed intervals and only one measurement per year (Δt_j = 12), produced the most biased estimates for input and auto effects. In the N = 2,877 condition, all evaluation measures lay far outside of the acceptable ranges, for both the precision of the point estimates (relative bias, MSE) and the recovery of the population parameter's variability (95% Cov., SE/SD). Similarly, although the power was relatively high at 0.868, it was the lowest across all conditions. The N = 200 condition exhibited even more pronounced biases in point estimate precision. Whereas larger standard errors led to acceptable 95% coverage, the SE/SD ratio was unsatisfactory, as it was in the N = 2,877 condition. Importantly, the SE/SD ratio for N = 200 also remained unsatisfactory across all subsequent IVT conditions. As anticipated for smaller sample sizes, the statistical power was also low (around 0.4) and remained consistently low across all subsequent IVT conditions.
Introducing only a little variation (Δt_{i,j} ∼ N(12, 0.5); i.e., 2 weeks in Condition 3) was still problematic with respect to MSE (N = 2,877: A_11: MSE = 4.
Estimate (Mean) and Rel. Bias (Mean) refer to the mean of the point estimates across 1,000 datasets; Estimate (Median) and Rel. Bias (Median) refer to the median of the point estimates across 1,000 datasets. Measurement intervals are either constant between individuals (C1, C2), or individually varying (C3-C7). Bold font indicates a deviation from the acceptable ranges of the evaluation measures. Please find a complete list of results with diffusion parameters, manifest intercepts and measurement errors in the OSM (Table A1).
(A_11: 95% Coverage = 0.959, SE/SD = 0.965; M_21: 95% Coverage = 0.954, SE/SD = 0.968). For N = 200, Condition 5, which presented significantly more variation in time intervals than the previous conditions, was also the first condition in which a substantial drop in relative biases and the MSE occurred (e.g., A_11: Rel. Bias (Median) = 0.986, M_21: Rel. Bias (Median) = 0.993).
Both subsequent conditions (Conditions 6-7) with higher degrees of IVTs also performed well. Conditions 5 and 6 showed fairly similar results, likely due to their similar realized spread of measurement occasions across the whole sampling period (Figure 5). The last condition (Condition 7) took a different sampling approach, with students randomly assigned to one of twelve months per grade, resulting in an equal number of measurement occasions per month. The average input and auto effect were recovered as well as in the previous two conditions. For N = 2,877, all evaluation measures were in the acceptable ranges. For N = 200, most evaluation measures (except the relative bias of the median) continued to be out of the acceptable ranges but showed the best results achieved across all IVT conditions. For N = 2,877, Condition 7 had the highest values for power across all conditions (A_11: Power = 0.994, M_21: Power = 0.994).
Figure 2, which was given in the Introduction, depicts the true average model-implied CT intervention effect and shows how well measurement occasions resulting from Conditions 2 (upper panel), 6 (middle panel), and 7 (lower panel) of the panel sample (N = 2,877) might cover its evolution over time. It becomes apparent that Condition 2, which has no IVTs, can only capture fractions of the process over time, because it only takes one "snapshot" at exactly the same time each grade. On the other hand, Conditions 6 and 7 have considerably more individual variation in time intervals, and thus cover a broader range of time points across the measurement waves (grades). This might be one of the main drivers contributing to the improved accuracy in recovering the true process parameters. With a smaller N, the range of time points covered due to IVTs was still broader than with equally spaced time intervals. However, the range of time points and the number of individuals observed at each time point were much smaller due to the smaller sample size. As a result, the effect of IVTs might have been more pronounced with the larger than with the smaller sample.
Lastly, Table 3 presents the median of the point estimates (across 1,000 generated samples) of the CT parameters A_11, M_21, s, H, Q_11, and their respective standard errors under each condition. It also presents discrete time estimates A* for time intervals of Δt_u = 12 months, and discrete time estimates for M at Δt_u = 12 months after the intervention.
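The link between a continuous time auto effect and its discrete time counterpart for a given interval can be illustrated for the simplest case. In a first-order CT model with a scalar drift coefficient, the autoregressive coefficient for an interval Δt is exp(A · Δt); with a drift matrix, the matrix exponential takes its place. The sketch below covers only this scalar case, and the drift value is hypothetical rather than an estimate from the study.

```python
import math


def discrete_autoregression(drift, dt):
    """Autoregressive coefficient implied by a scalar CT drift for interval dt."""
    return math.exp(drift * dt)


drift = -0.10  # hypothetical auto effect per month, not a value from the study
for dt in (1, 12):
    print(dt, round(discrete_autoregression(drift, dt), 3))
```

Because the 12-month coefficient equals the 1-month coefficient raised to the 12th power, both describe the same underlying process, which is why estimates obtained under different interval lengths can be compared on a common time scale.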

Discussion
It was the goal of our study to show how IVTs can contribute to the estimation of average intervention effects over time. For a long time, longitudinal studies have aimed for equally spaced measurement intervals; however, especially in complex samples with many individuals, IVTs are the norm rather than the exception. Although IVTs are often perceived as harmful or unnecessary noise, we were able to demonstrate that we may benefit from IVTs in longitudinal research. To summarize, the results show that some amount of individual variation within and between time intervals can improve the estimation of the average intervention effect, both with respect to point estimates and their sampling variability. Two tentative conclusions are as follows: First, the more individual variation within and between individuals across measurement waves, the more we can learn about the evolution of an intervention process across different time intervals (e.g., intervals of 12 months vs. 9-15 months). Second, naturally occurring (or planned) individual variation in time intervals can result in a greater realized spread of measurement occasions across the whole sampling period. Inferring from our results, especially in panel studies with a large N, such as the NEPS, the exact choice of IVT sampling distributions might be based on practical considerations. Parameter recovery was not solely dependent on a special IVT distribution (e.g., normal vs. uniform). One possibility may thus be that it is not a certain IVT distribution that matters, but rather that the whole sampling period is sufficiently covered with observations. This claim, however, needs to be investigated systematically in future studies.
… the population effect; Estimate (Mean) and Rel. Bias (Mean) refer to the mean of the point estimates across 1,000 datasets; Estimate (Median) and Rel. Bias (Median) refer to the median of the point estimates across 1,000 datasets. Measurement intervals are either constant between individuals (C1, C2), or individually varying (C3-C7). Bold font indicates a deviation from the acceptable ranges of the evaluation measures. Please find a complete list of results with diffusion parameters, manifest intercepts and measurement errors in the OSM (Supplemental Table A1).

When Do IVTs Provide the Greatest Impact?
We picked a persistent level change form for our input effect in this example, assuming that academic motivation diminishes following the transition from primary to secondary school and remains low thereafter. Of course, depending on how an intervention effect is projected to evolve over time, other shapes of input effects are feasible. For example, if a level change is expected after an intervention but no assumptions are made about whether the effect will last, an initial level change followed by a fade-out shape may be more appropriate. Abenavoli (2019) described such an effect when investigating early childhood education programs, which have been shown to produce immediate positive impacts on children's cognitive and social-emotional skills. Although participating children began kindergarten with more skills on average than their peers, their skills converged as children progressed through school. Other scenarios are also possible: Bisconti et al. (2004), for example, showed that damped linear oscillator processes could accurately characterize the grief process following the death of a spouse in elderly women. The widows' well-being was subject to frequent ups and downs, with an overall positive trend over time. Conceptually, the more complicated such input processes are, the more crucial it may be to incorporate IVTs when analyzing data. Figure 6 conceptually depicts a fade-out and an oscillating model for the measurement occasions of Condition 2 (no IVTs).
With a level change process like the one we studied, the effect of IVTs on average parameter estimates appeared to be rather small in absolute terms. Yet, even in our "simple" scenario, with no complex evolution of an intervention effect over time, the applied evaluation measures (relative bias, MSE, 95% coverage, SE/SD ratio, power) revealed that the amount of individual variation in time intervals made a difference. Thus, for more advanced cases of input effects, the simulation conditions in our study without (or with little) IVTs would likely not be able to capture the true input effect function, which would lead to even larger differences in parameter estimates (see, e.g., Boker & Nesselroade, 2002; Voelkle & Oud, 2013). We thus expect our findings to be more of a "lower bound" for the benefit of integrating IVTs when modeling input effects using (panel) data.
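Why fixed intervals can miss the shape of a more complex input effect is easy to see in a toy example. The sketch below is purely illustrative and not part of the simulation study: it assumes a hypothetical damped oscillation with a period of exactly 12 months (a deliberately unfavorable case) and an arbitrary damping rate, and compares occasions fixed to the same month of each grade with occasions spread across the months of the year.

```python
import math


def damped_oscillation(t, period=12.0, damping=0.02):
    """Hypothetical input-effect shape: damped cosine, t in months after the event."""
    return math.exp(-damping * t) * math.cos(2.0 * math.pi * t / period)


# Everyone measured in the same month of each grade (as in Condition 2).
fixed_times = [12.0 * k for k in range(5)]
# Occasions spread over all months of the year (loosely in the spirit of Condition 7).
spread_times = [12.0 * k + month for k in range(5) for month in range(12)]

fixed_values = [damped_oscillation(t) for t in fixed_times]
spread_values = [damped_oscillation(t) for t in spread_times]

print(min(fixed_values) > 0)   # fixed occasions only ever hit the decaying peaks
print(min(spread_values) < 0)  # spread occasions also observe the troughs
```

Under these assumptions, the fixed schedule traces only a smooth decay and would be compatible with a simple fade-out, whereas the spread schedule exposes the oscillation.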

IVTs and Estimation of Random Intervention Effects
Importantly, for reasons of simplicity, we assumed that an average intervention effect of the transition from primary to secondary school describes the population of students well. In many cases, however, individual differences, or individual reactions to the same intervention, might be the focus of interest. For instance, it might be the goal to identify students who are at risk of suffering a long-term loss in motivation due to a transition from school A to B, and at which point in time a well-placed intervention might help these children. Other examples could be that an intervention, such as a sudden lockdown, might affect students from different socio-economic backgrounds differently. Indeed, individual difference factors such as being female, having parents with higher academic interests, and SES were found to "buffer" the negative effect of transitioning from primary to secondary school (Dotterer et al., 2009; Evans et al., 2018).
To account for such differences, Driver and Voelkle (2018b) extended the CT structural equation modeling approach to accommodate random effects, which allow individuals to respond differently to a certain input process. It is now possible to estimate random effects distributions for all CT parameters, permitting individual variation in strength, persistence, and form of the process in question (e.g., academic motivation) and related intervention processes. With respect to the empirical example, this would imply that the transition from primary to secondary school could result in different input functions for the students, which is likely a more realistic assumption than the "average-fits-all" case. The same benefits of IVTs likely also apply to individual difference estimation, but future work should explicitly test this and examine the extent to which individual difference recovery may also be improved by leveraging IVTs.

Varying IVT Distributions, N, and Intervention Onsets in Future Simulation Studies
Comparing results across all simulation conditions, it became evident that the distributions of IVTs that resulted in a wide spread of measurement occasions across the year performed best with respect to evaluation measures. Although IVT distributions in Conditions 5, 6, and 7 differed in their original set-up (varying from a normal distribution to different kinds of uniform distributions), all of them performed equally well in recovering true parameter estimates and in their confidence intervals, as observed in the panel sample size condition (N = 2,877). This is an important insight for the following reason: Every large-scale assessment or panel study has its own characteristics and sampling challenges. For example, in educational research, sampling often has to happen in pre-defined school hours. Also, it might not be an easy task to sample as many students during their summer vacation as it is during the school year. Luckily, based on our results, we can see that the exact IVT sampling distribution may be based on practical considerations, as long as the time period in question (e.g., a school year, that is, every month of a time period of 12 months from September in a given year to August of the next year) is sufficiently covered with observations.
A sampling approach that covers broader periods of time and allows for individual variation in time intervals across waves also makes sense if we do not exactly know when an event might occur. Importantly, our study found that IVTs might yield positive effects regardless of sample size. Whereas the effect was more pronounced in the large sample (which corresponded to the NEPS panel study size of N = 2,877), it was still clearly visible in the small sample of N = 200. Future studies interested in optimal design decisions could investigate how different IVT configurations interact with sample size, a system's complexity (e.g., coupling, mediation), or the shape of input effects (e.g., oscillation, different time scales).

IVTs and Patterns of Missingness
Importantly, IVTs in panel studies may result from different processes. Some individuals (e.g., adults in a panel survey) may select themselves to participate at an earlier or later time, while it may also be possible that the time of participation is set by some authority for all individuals (e.g., students in schools). Thus, in some cases, the IVTs may be considered to be dependent on some variables that are associated with the self-selection process (i.e., missing-not-at-random [MNAR] or missing-at-random [MAR]; Grund et al., 2021), whereas some IVTs can be considered to result from a purely random process (missing-completely-at-random [MCAR]). One core assumption of the presented approach on IVTs is that the sampling of measurement occasions is "exogenous," that is, that any observations that are missing from an equidistant sampling scheme are MCAR or MAR. In our example, we assumed that variability in measurement occasions was due to practical sampling limitations, and therefore unrelated to aspects of the students. In extensive panel studies like the NEPS, it is usually not possible to measure all participants at the same time, which results in natural individual variation of time intervals across measurement waves. Importantly, however, if the M(C)AR assumptions do not hold, inferences from the presented IVT approach should be made with caution and take into account how self-selection processes may affect the results.

Nested (Intervention) Effects
In complex surveys and large-scale assessments, data are usually clustered within different levels. For example, in the NEPS study, students are nested within classes, which are nested within schools. There are two important points resulting from this hierarchical structure: First, students within schools or classes are usually more similar to each other than students between schools or classes with respect to cognitive, socioemotional, and socioeconomic characteristics (Brunner et al., 2018; Dalane & Marcotte, 2022). If this similarity is not statistically accounted for, standard errors of the corresponding parameter estimates are likely underestimated (Hox, 2010; Raudenbush & Bryk, 2002). One possible way to address this is to adjust standard errors by means of a cluster-bootstrap procedure or robust estimates. In the present empirical example, for reasons of simplicity, we did not account for clustering. Thus, its standard errors should be interpreted with caution. Importantly, however, this only affects the substantive interpretation of the empirical example, but does not affect any of the results of the simulation study on IVTs (where data have not been simulated to be clustered). Second, it might be of interest to decompose the effects on different hierarchical levels, in order to differentiate between effects on the school or student level. This is usually done by multilevel modeling. Thus, introducing the multilevel context to modeling input effects in continuous time could be a promising future research endeavor. Importantly, however, there are also different approaches for addressing this issue, especially when effects at the higher level are of interest. For example, Lohmann et al. (2022) ran a CT model on aggregated data at the country level.

Low Scale Reliabilities
Some scales from the NEPS data have been known to reveal low to medium reliabilities (e.g., Hawrot & Loos, 2021; Lockl et al., 2020; Ömeroğulları & Gläser-Zikuda, 2021), which was also the case in our empirical example on academic motivation. Thus, the substantive results on the effect sizes of our CT model need to be considered with caution. Although the general direction of, for example, the intervention effect of the transition from primary to secondary school is in line with prior studies (e.g., Evans et al., 2018), the estimated effect size and its confidence intervals might not accurately represent the "true," substantive effect of the event. Solutions for dealing with such a problem in future substantive applications could be to account for the measurement error, for example, by specifying a measurement model with multiple indicators for academic motivation, or by applying a single-indicator approach with the measurement error fixed to a specific value. Alternatively, Lohmann et al. (2022) applied a plausible-value approach to address measurement error in the PISA data. They generated multiple plausible values for their construct of interest, which they then analyzed in a stepwise manner using ctsem. Because, in the current study, the empirical parameters were used only for illustrative reasons, their possible substantive bias did not affect the results of our study on IVTs. Nevertheless, it might be an interesting future research endeavor to evaluate how reliability issues are best tackled in CT models and in the context of IVTs.

Conclusion
Often, longitudinal studies aim for equal measurement intervals between observations. In practice, however, this is rarely achieved because of the reality of the measurement process, especially in the case of large-scale panel studies. In this article, it was our goal to examine how individually varying time intervals (IVTs) might benefit the estimation of intervention effects over time. We did so by introducing an empirical example of the German NEPS data, with the transition from primary to secondary school serving as a quasi-experimental intervention. In a subsequent simulation study on the basis of the empirical parameters, we drew on continuous time dynamic models to compare different degrees of individual variation in time intervals between measurement occasions. We found that some amount of individual variation within and between time intervals can improve estimation of the average intervention effect. Importantly, parameter recovery did not depend on a special distribution of IVTs (e.g., normal vs. uniform) but rather on whether the time period in question was sufficiently covered with observations as a result of the individual variation. In short, we encourage learning from IVTs for the analyses of intervention effects.

Figure 3. Empirical trajectories of academic motivation of five exemplary students from the German National Educational Panel Study, Starting Cohort 2. The dashed line at t_u = 21 (corresponding to August 2016) represents the time point of the transition from primary to secondary school. The bold line represents the model-implied average expected values of academic motivation as a function of time. Academic motivation drops considerably after the transition and stays at a new lower level afterwards.

Figure 4. Autoregressive parameters of academic motivation as a function of the time interval between observations. The points on the CT function represent discrete time parameters for time intervals of Δt = 1 month and Δt = 12 months. They differ in absolute values due to different time intervals between measurements, but stem from the same underlying continuous time function.

M_21: 95% Cov = 0.920), and SE/SD (N = 2,877: A_11: SE/SD = 0.020, M_21: SE/SD = 0.022; N = 200: A_11: SE/SD = 0.083, M_21: SE/SD = 0.083), but showed an improvement with respect to the median of the point estimates across 1,000 data sets (N = 2,877: A_11: Rel. Bias (Median) = 1.035; M_21: Rel. Bias (Median) = 1.034; N = 200: A_11: Rel. Bias (Median) = 1.415; M_21: Rel. Bias (Median) = 1.455). An increase of variation by only one additional week (SD = 0.75; i.e., 3 weeks in Condition 4) led to an improvement in nearly all evaluation measures for both N = 2,877 and N = 200. In contrast to Conditions 2 and 3, Condition 4 was the first condition in which the MSE was substantially closer to zero for the large sample condition (N = 2,877: A_11: MSE = 0.574, M_21: MSE = 0.437), indicating that the bias and the standard error were small. Interestingly, although the precision of the point estimates (relative bias, MSE) for drift and input parameters in the N = 2,877 condition improved relatively quickly with the introduction of a little more variation in IVTs, it was not until Condition 5 (Δt_{i,j} ∼ N(12, 2)) that the variability of the true auto and input parameter estimates was recovered satisfactorily

Figure 5. Distributions of measurement occasions resulting from simulation conditions 1-7. Time t is given in months. t_u = 0 represents November 2014, t_u = 60 represents November 2019. The dashed line at t_u = 21 represents the time point of the transition from primary to secondary school. Conditions 1 and 2 show fixed measurement intervals j between individuals; Conditions 3-7 display varying degrees of variation in time intervals for each individual i and each interval j across Grades 3-7. In Conditions 2-7, each student is measured once in each grade, resulting in five repeated measurement occasions (t_u = t_1 to t_5). The total N of 2,877 students per measurement occasion stays the same in Conditions 2-7.

Figure 6. Exemplary fade-out and oscillating intervention processes under measurement occasions of Condition 2 without individual variation in time intervals. The more complex such input processes become, the more helpful it may be to incorporate IVTs while analyzing data in order to recover the actual shape of the input effects.

Table 3. Median and SE for discrete and continuous time parameters of the Fixed Effect Level Change Model under Conditions 1-7. Results based on median of 1,000 generated datasets each. A*_{Δt_u=12} = Discrete-time first level autoregressive coefficient with Δt_u = 12 months between measurement occasions; M_{Δt_u=12} = Discrete-time intervention effect in scale points of academic motivation Δt_u = 12 months after the transition from primary to secondary school; Est: Estimate; SE: Standard error. Simulated measurement intervals are either constant between individuals (C1, C2), or individually varying (C3-C7).