Bayes Factors for Testing Order Constraints on Variances of Dependent Outcomes

Abstract In statistical practice, researchers commonly focus on patterns in the means of multiple dependent outcomes while treating variances as nuisance parameters. However, in fact, there are often substantive reasons to expect certain patterns in the variances of dependent outcomes as well. For example, in a repeated measures study, one may expect the variance of the outcome to increase over time if the difference between subjects becomes more pronounced over time because the subjects respond differently to a given treatment. Such expectations can be formulated as order constrained hypotheses on the variances of the dependent outcomes. Currently, however, no methods exist for testing such hypotheses in a direct manner. To fill this gap, we develop a Bayes factor for this challenging testing problem. Our Bayes factor is based on the multivariate normal distribution with an unstructured covariance matrix, which is often used to model dependent outcomes. Order constrained hypotheses can then be formulated on the variances on the diagonal of the covariance matrix. To compute Bayes factors between multiple order constrained hypotheses, a prior distribution needs to be specified under every hypothesis to be tested. Here, we use the encompassing prior approach in which priors under order constrained hypotheses are truncations of the prior under the unconstrained hypothesis. The resulting Bayes factor is fully automatic in the sense that no subjective priors need to be specified by the user.


Introduction
In analyzing their data, researchers commonly focus on measures of central tendency such as means and regression coefficients. Measures such as variances that capture the heterogeneity of observations are usually regarded as nuisance parameters. However, to fully understand the patterns that are present in the data it is of vital importance to carefully model and interpret the variability of the observations (e.g., Carroll 2003). In fact, the heterogeneity of the observations can be a core aspect of a study (e.g., Hultsch, MacDonald, and Dixon 2002;Aunola et al. 2004;Lehre et al. 2009;Kaptein and Eckles 2012;Mulder and Fox 2013;Mulder 2016, 2018;Mulder and Fox 2019).
There are often reasons to expect certain patterns in the heterogeneity of multiple dependent outcomes. For example, in a repeated measures study that investigates the effect of a certain treatment, one may expect the variability of the outcome to increase over time if the difference between subjects becomes more pronounced over time because the subjects respond differently to the treatment (e.g., Aunola et al. 2004;Hedeker and Gibbons 2006;Böing-Messing et al. 2017). Such an expectation can be formulated as an order constrained hypothesis of the form H 1 : σ 2 1 < · · · < σ 2 p subject to the condition that the covariance matrix is positive definite, where p is the number of measurement occasions and σ 2 j is the variance of the jth measurement, for j = 1, . . . , p. of H 1 given by H 2 : ¬ σ 2 1 < · · · < σ 2 p , which may also be written as H 2 : ¬ H 1 in short. The complement entails all possible hypotheses on the p variances except H 1 under the condition that the covariance matrix is positive definite. In another example, Aunola et al. (2004) hypothesized that the variability of math performances either increases or decreases across grades. An increase might occur because children with high mathematical potential improve their performances faster than children with low mathematical potential. A possible reason for a decrease is that systematic instruction at school helps children with low mathematical potential catch up. These two competing expectations can be expressed as order constrained hypotheses H 1 : σ 2 1 < · · · < σ 2 p and H 2 : σ 2 p < · · · < σ 2 1 , where p is the number of grades and σ 2 j is the variance in grade j, for j = 1, . . . , p. Another conceivable competitor is the complement of H 1 and H 2 given by H 3 : ¬ σ 2 1 < · · · < σ 2 p ∨ σ 2 p < · · · < σ 2 1 . In this article, we use the multivariate normal distribution with unstructured p × p covariance matrix to model the dependent outcomes. We use an unstructured covariance matrix to safeguard against misfitting covariance structures that might distort inferences about the variances. As we will argue later, an unstructured covariance matrix is more suitable for testing specific patters of the variances in comparison to the implied, highly structured covariance matrix of a mixed effects model. We concern ourselves with testing T ≥ 2 order constrained hypotheses on the variances σ 2 = σ 2 1 , . . . , σ 2 p of the dependent outcomes located on the main diagonal of . In general, the hypotheses we test are of the form where t is an order constrained subspace of the unconstrained parameter space u = { : > 0} and > 0 means that is positive definite. In this article, we focus on hypotheses for which the subspace can be written as where R t is a q t × p matrix containing the coefficients for the q t order constraints on the variances under H t and 0 = (0, . . . , 0) is a q t -dimensional vector of zeroes. We test hypotheses for which each row of R t is a permutation of {−1, 1, 0, . . . , 0}. That is, we consider hypotheses with equal coefficients on the variances (e.g., H 1 : Another type of hypothesis we will consider in the context of repeated measures analysis is a formulation that states an order of the variances as well as an order of the ratios of adjacent variances (see, e.g., Mulder et al. 2009, for testing such patterns on repeated measures means). An example of this is the hypothesis H 2 : which states that the variance as well as the effect size σ 2 j /σ 2 j−1 increases over time. We will present an application of a test for such types of heteroscedasticity later on in the article when we apply our Bayes factor to an empirical dataset of repeated reading recognition scores in children.
Testing of variances has received much attention in the statistical literature. Numerous null hypothesis significance testing (NHST) procedures have been developed for different testing problems. Two prominent methods for testing equality of variances of independent populations are the classical likelihood ratio test based on the normal distribution and Bartlett's (1937) modification thereof. Robust alternatives to these tests were developed by Levene (1960), Brown and Forsythe (1974), and more recently by Shoemaker (2003). Noting that variances of independent populations often follow an increasing or decreasing order, Gastwirth, Gel, and Miao (2009) developed a NHST procedure for testing the null hypothesis of equal variances against an order-constrained alternative hypothesis. The problem of testing equality of variances of dependent outcomes has been considered as well. An early development is the Morgan-Pitman test (Morgan 1939;Pitman 1939) for testing equality of variances in the bivariate normal distribution. This test was modified by Wilcox (1990Wilcox ( , 2015 to improve on the robustness of the method. Cohen (1986) and Wilcox (1989) developed extensions for testing equality of variances of more than two dependent outcomes and Mentz (2001) proposed a likelihood ratio test of equality of multiple variances in the intraclass correlation model. To our knowledge, however, no method is currently available for testing multiple hypotheses with order constraints on variances of dependent outcomes as formulated in Equation (1).
In this article, we take a Bayesian approach to the testing problem in Equation (1) using the Bayes factor (Jeffreys 1961;Kass and Raftery 1995), which is a Bayesian hypothesis testing and model selection criterion. Bayes factors have been developed for a variety of testing problems frequently encountered in practice. For example, Gönen et al. (2005) proposed a Bayesian two-sample t-test that was later extended by Wang and Liu (2016). Klugkist, Laudy, and Hoijtink (2005) developed a Bayes factor for testing order constrained hypotheses on the means in analysis of (co)variance models. Mulder (2014b) proposed Bayes factors for testing order constrained hypotheses on means and regression coefficients in the multivariate normal linear model. Gu et al. (2014) developed an approximate Bayesian procedure for testing order constrained hypotheses on regression coefficients in a structural equation modeling framework. Fosdick and Raftery (2012) used the Bayes factor to test hypotheses on the correlation in bivariate normal data. Mulder (2016) proposed Bayes factors for testing order constrained hypotheses on correlations. Recently, Böing-Messing et al. (2017) and Böing-Messing and Mulder (2018) applied the Bayes factor to the problem of testing order constrained hypotheses on the variances of independent populations. In this article, we extend this methodology by developing a Bayes factor for testing variances of dependent outcomes.
The Bayes factor has a number of advantages that are not all shared by alternative hypothesis testing approaches such as NHST by means of p-values and model selection by means of information criteria such as the AIC (Akaike 1973) and the BIC (Schwarz 1978): First, unlike Bayes factors and the AIC and BIC, p-values are not able to simultaneously test multiple hypotheses and cannot quantify evidence in favor of a hypothesis. Instead, in the NHST framework one can only test every hypothesis of interest against a common null hypothesis and the p-value is then interpreted as quantifying the evidence against this null hypothesis (although this interpretation has been contested, see, e.g, Berger and Delampady 1987). This, however, does not give us an answer as to which hypothesis receives strongest support from the data. Second, contrary to Bayes factors and the BIC, the AIC is not consistent in the sense that it is not guaranteed to select the true hypothesis as the sample size goes to infinity (e.g., O'Hagan 1995). Third, Bayes factors have an inherent Occam's razor mechanism that automatically takes the parsimony introduced by order constraints into account (e.g., Kass and Raftery 1995). While p-values do not have an inherent mechanism for measuring the complexity of a hypothesis, both the AIC and the BIC measure the complexity of a hypothesis by the number of unknown parameters. However, this is not a suitable measure when testing order constrained hypotheses. For example, under both the order constrained hypothesis H 1 : ∈ 1 = : σ 2 1 < · · · < σ 2 p ∧ > 0 and the unconstrained hypothesis H u : ∈ u there are p unknown variance parameters. However, H 1 is more parsimonious than H u because the admissible parameter space under H 1 is smaller in the sense that it is a subset of the unconstrained parameter space under H u .
The remainder of this article is structured as follows. In the next section, we describe the multivariate normal model that is used throughout this article and we propose an unconstrained prior on the model parameters. We then develop a Bayes factor for testing order constrained hypotheses on the variances in this model using the encompassing prior approach. Following this, we examine four important aspects of the proposed Bayes factor: scale invariance, Bartlett's paradox, large sample consistency, and missing data. Subsequently, we illustrate the use of the Bayes factor by applying it to two empirical datasets about scholastic achievements of children. We conclude the article with a discussion of our approach.

Model and Unconstrained Prior
In this article, we concern ourselves with testing order constrained hypotheses on the variances in the multivariate normal model where n is the sample size, y i is a p-dimensional vector of dependent variables, μ i denotes the mean vector for observation i, and is an unstructured covariance matrix: We are interested in testing order constrained hypotheses on the variances on the main diagonal of , where the hypotheses are formulated according to Equation (1). The mean vector, which will be treated as a nuisance parameter in this article, is typically modeled as a linear combination of a given set of covariates. For example, where x i and X i are k × 1 and p × k matrices with covariates, respectively, and B and β are k × p and k × 1 matrices with regression coefficients. By writing the vectorization of the coefficients matrix in Equation (4) as β = vec(B), we can denote the vector of nuisance parameters under both models by β.
Note that the multivariate normal linear model is a generalization of commonly used models such as (multivariate) analysis of (co)variance and multivariate regression. Furthermore, note that the standard linear mixed model, where the random effects are integrated out, has a structured covariance matrix which depends on the within-subjects variance σ 2 and the random effects covariance matrix : where I p is the identity matrix of size p and the covariates for the random effects equal the covariates for the fixed effects, X i . As noted above, we use an unstructured covariance matrix instead to make sure that the model can flexibly pick up different kinds of (potentially unforeseen) covariance patterns. In Section 5.2, we will motivate this choice for the unstructured covariance matrix in a repeated measures context. A common choice for the unconstrained prior on the covariance matrix is a conjugate inverse Wishart prior: where > 0 is the p × p prior scale matrix and ν > p − 1 are the prior degrees of freedom. The two hyperparameters and ν allow a flexible specification of the prior. We will come back to the choice of the hyperparameter values in Section 3.2. Since β contains nuisance parameters that are common under all order constrained hypotheses, we may use the standard noninformative Jeffreys prior π N (β) = C, where C is an unspecified normalizing constant that cancels out in the computation of Bayes factors (e.g., Jeffreys 1961;O'Hagan 1995). The unconstrained joint prior on and β is then given by

The Bayes Factor
The Bayes factor for testing hypothesis H t against a competing hypothesis H t is defined as the ratio of the marginal likelihoods under the two hypotheses: where m t (Y) is the marginal likelihood for observed data Y under hypothesis H t with order constraints formulated according to Equation (1). The marginal likelihood is given by is the prior on β and under H t , and the integration region of is restricted to the order constrained parameter space t . The marginal likelihood can be interpreted as a weighted average likelihood in the parameter space that is admissible under H t , where the weights are given by the prior. In this way, the marginal likelihood quantifies how likely it is that the data were generated under H t . The Bayes factor, as a ratio of marginal likelihoods, then tells us whether it is more likely that the data were generated under H t or H t . If B tt > 1 (B tt < 1), then there is evidence in favor of H t (H t ). A Bayes factor of, say, 10 indicates that the evidence in favor of H t is 10 times as strong as the evidence in favor of H t (e.g., Jeffreys 1961).

Encompassing Prior Approach
From Equation (9), it can be seen that a prior π t (β, ) needs to be specified to compute the marginal likelihood under hypothesis H t . In this article, we use the encompassing prior approach (Berger and Mortera 1999;Klugkist, Laudy, and Hoijtink 2005) to specify priors under competing order constrained hypotheses on the variances in the multivariate normal model in Equation (2). The encompassing prior approach is a popular approach to computing Bayes factors between order constrained hypotheses. The starting point is the specification of an unconstrained (or encompassing) prior under the unconstrained hypothesis H u : ∈ u . Order constrained hypotheses are nested in the unconstrained hypothesis in the sense that t ⊂ u . We may therefore specify priors under order constrained hypotheses as truncations of the unconstrained prior in the respective constrained subspaces. In this way, the problem of specifying a prior under each order constrained hypothesis simplifies to specifying one unconstrained prior.
The unconstrained joint prior on β and was given in Equation (7). The prior under an order constrained hypothesis H t is formulated as a truncation of the unconstrained prior in the admissible parameter space t : where is the prior probability that the order constraints hold under H u , with π u ( ) as in Equation (6), and 1 t ( ) is an indicator function that equals 1 if ∈ t and 0 otherwise. The prior probability quantifies the complexity of an order constrained hypothesis relative to the unconstrained hypothesis, where a large prior probability indicates high complexity (Mulder, Hoijtink, and Klugkist 2010). In Equation (10), its reciprocal acts as a normalizing constant.
In this article, we consider the situation where prior information about the absolute magnitude of the (co)variances is absent, while prior information about the relative magnitude of the variances is available in the form of order constraints. Consequently, we specify a vague inverse Wishart prior under the unconstrained hypothesis. As an uninformative choice for the prior degrees of freedom we propose setting ν = p − 1 + 2ε with a small positive ε. This setting can be considered the matrix equivalent of the prior degrees of freedom in the most commonly used vague inverse gamma prior IG(ε, ε) for the variance in a univariate linear regression model. To determine an uninformative choice for the prior scale matrix, first note that the unconstrained conditional posterior π u ( |Y, β) follows an inverse Wishart distribution of the form under the multivariate normal linear model, where Y = (y 1 , . . . , y n ) and X = (x 1 , . . . , x n ) , and under the linear mixed model. Consequently, we propose setting the prior scale matrix to = 2εI p . This choice can be considered objective for two reasons: First, under this setting all possible orderings of the variances σ 2 1 < · · · < σ 2 p , . . . , σ 2 p < · · · < σ 2 1 are (approximately) equally likely a priori (we will come back to this in Section 4.2). As was argued by Mulder (2014a), such a specification is desirable when testing order constrained hypotheses and prior information is weak. Second, this prior scale matrix has a negligible effect on the posterior scale matrix, as can be seen from Equations (12) and (13). Thus, the posterior scale matrix is completely determined by the information in the data under both models, which implies objectivity.
Using the priors in Equations (7) and (10), it can be shown that the Bayes factor of an order constrained hypothesis H t against the unconstrained hypothesis H u is given by where is the posterior probability that the order constraints hold under H u . A proof is given in Appendix A. The posterior probability quantifies the fit of an order constrained hypothesis relative to the unconstrained hypothesis, where a large posterior probability indicates a good fit. From Equation (14), it can be seen that the proposed Bayes factor automatically functions as an Occam's razor by taking the fit and the complexity of order constrained hypotheses into account. The Bayes factor between two order constrained hypotheses H t and H t is given by Note that the posterior probability of an order constrained hypothesis H t can be computed using the Bayes factors against the unconstrained hypothesis according to where P(H t ) is the prior probability of H t . If prior information about the hypotheses is weak, then it is customary to specify equal prior probabilities P(H 1 ) = · · · = P(H T ) = 1/T. From Equations (11) and (14)- (17), it can be seen that the Bayes factors and posterior probabilities of the hypotheses only depend on the unconstrained prior and posterior of . In Appendix B, we describe a method for computing the Bayes factors.

Scale Invariance
Scale invariance implies that rescaling the data Y by multiplying all elements by a constant w should not change the evidence in favor of competing hypotheses as indicated by the Bayes factor because the relative variability remains unchanged. Our Bayes factor for testing variances is in fact scale invariant. For the multivariate normal linear model, this can be shown as follows: First, note that under this model the unconstrained marginal posterior of given the original data Y follows an W −1 is the residual sums of squares matrix withB = (X X) −1 X Y. For the rescaled data wY, the residual sums of squares matrix is w 2 S. The posterior scale matrix is then equal to 2εI p + w 2 S ≈ w 2 S. Since the prior scale matrix can be ignored, the posterior probability that the order constraints on the variances hold can be written as where in the first step we applied the transformation = w −2 . Furthermore, order constrained subspaces are considered such that ∈ t if w 2 ∈ t , which explains the second step. As a result, our Bayes factor is invariant to the scale of the data. The same argument can be used for the linear mixed model.

Bartlett's Paradox
Bartlett's paradox (Bartlett 1957;Lindley 1957;Jeffreys 1961) is the situation where a Bayes factor for testing a precise null hypothesis against an unconstrained alternative hypothesis favors the null regardless of the information in the data if the variance of the prior under the alternative hypothesis is specified sufficiently large. Typically, Bartlett's paradox is not an issue in the case of testing order constrained hypotheses (Klugkist and Hoijtink 2007;Mulder 2014a). To see this in our setup, we need to investigate whether the prior probability that the order constraints hold under H u , that is, the denominator of the Bayes factor expression in Equation (14), can be made arbitrarily small by specifying ε in the unconstrained prior W −1 p (2εI p , p − 1 + 2ε) small enough. When considering a specific ordering of the variances, say, σ 2 1 < · · · < σ 2 p , thorough numerical checks (setting ε = 0.1, 0.01, 0.001, etc.) revealed that the prior probability is virtually independent of the exact choice of ε and is extremely close to 1/p!. Although this finding satisfies our intuition (as there are p! possible orderings of p variances), it cannot be theoretically proven because the variances are not independent and the marginal prior for the variances does not have a known distribution. Also for other order constrained variance patterns the prior probability is virtually independent of the exact choice of a small ε. Furthermore, since the posterior scale matrix for depends on the sum of the prior scale 2εI p and the sums of squares in the data, the posterior probability that the order constraints hold is virtually independent of ε as well. Thus, we can conclude that the Bayes factor is practically independent of the exact choice of ε > 0 and Bartlett's paradox is not an issue for our order constrained tests on variances.

Large Sample Consistency
Bayes factors are large sample consistent under very general conditions, which means that the posterior probability of the true hypothesis tends to 1 as the sample size tends to infinity. For our Bayes factor, this important property can be explained as follows: Suppose that the true values of the variances lie in the admissible parameter space t of hypothesis H t . Then, as the sample size increases, the unconstrained posterior of concentrates in t , so that the probability P( ∈ t |Y, H u ) tends to 1. This implies that if H t with admissible parameter space t is a competing hypothesis that does not overlap with H t (in the sense that t ∩ t = ∅), then P( ∈ t |Y, H u ) tends to 0. From Equation (16), it can be seen that the Bayes factor of H t against H t tends to infinity in this case (note that the prior probabilities P( ∈ t |H u ) and P( ∈ t |H u ) are constants that do not depend on the data). As a result, the posterior probability of H t tends to 1, which implies that the Bayes factor in Equation (16) is consistent. The same argument applies in the case where there are multiple competing hypotheses that do not overlap with H t . To ensure that the Bayes factor behaves consistently it is advisable to specify the hypotheses such that they do not overlap and cover the entire unconstrained parameter space (in the sense that 1 ∪ · · · ∪ T = u ).

Hypothesis Testing in the Case of Missing Data
As shown by Hoijtink et al. (2019), missing observations that are missing at random can be handled via multiple imputation when computing Bayes factors. This imputation step can be incorporated in the posterior sampler in a natural manner in the Bayesian framework. Note that multiple imputation is the preferred method over listwise deletion, which results in a loss of information and possibly bias (Rubin 1987(Rubin , 1996. Below, we show how the Bayes factors for testing order constraints on variances of dependent outcomes are obtained in the case of missing observations. As was shown in Section 3.2, only the posterior probability that the order constraints hold under the unconstrained hypothesis needs to be computed to obtain the Bayes factors for the order constrained test. Therefore, we do not need separate missing data models under the different order constrained hypotheses but only one missing data model specified under the unconstrained hypothesis. Let us denote the missing data by Z and the observed data by Y. In the Bayesian framework, the missing data Z are naturally treated as unknown parameters just like the model parameters. Inferences are then based on the joint posterior distribution π u (β, , ζ , Z|Y) of the model parameters and the missing data, where ζ are additional parameters used for the missing data model. The marginal posterior distribution of can be obtained by integrating the joint posterior distribution over the model parameters β, ζ and the missing data Z. The posterior probability that the order constraints hold can then be computed as where (1) , . . . , (M) are posterior draws for the covariance matrix. Thus, we can approximate the posterior probability using a Gibbs sampling scheme in which we iteratively draw samples from the full conditional distributions of β, , ζ , and Z. This approach is also referred to as data augmentation because the observed data are augmented with simulated values from the posterior predictive distribution of the missing data Z to obtain a complete dataset (e.g., Tanner and Wong 1987;Gelman et al. 2013, chap. 18). When computing the posterior probability in Equation (19) this way, the uncertainty caused by the missing data is integrated out as is usual in a Gibbs sampler. Note that this is only allowed because the missing data are missing at random, which assumes that missingness is conditionally independent of the missing data given the observed data. In Section 5.2, we will present an analysis of an empirical dataset with missing values where we make use of the missing data approach described here.

Heteroscedasticity in Cross-Sectional Data: Peabody Individual Achievement Test
The Peabody Individual Achievement Test (PIAT; Dunn and Markwardt 1970) is a survey for measuring an individual's scholastic attainment in the areas of general information, reading, writing, and mathematics. As part of the 1979 National Longitudinal Survey of Youth (Baker et al. 1993), a total of 997 children completed three PIAT subtests at age 10: Mathematics, Reading Recognition, and Reading Comprehension. All subtests consisted of 84 multiple choice questions and the final score on each subtest was the number of correct answers. Our interest was in whether mathematics scores were less or more variable than reading scores. Consequently, we tested the following competing hypotheses on the variances of the subtests: where σ 2 M , σ 2 R , and σ 2 C denote the variance of the scores on the Mathematics, Reading Recognition, and Reading Comprehension subtests, respectively. Note that the comma between σ 2 R and σ 2 C means that there was no order constraint on the variances of the two reading subtests. We tested H 1 and H 2 against their complement H 3 to take into account that the data might support neither H 1 nor H 2 .
We fitted a multivariate normal model without covariates to the data from the three PIAT subtests. There were no missing values in the dataset. We specified the hyperparameters of the unconstrained prior on the covariance matrix as = 0.002 · I 3 and ν = 2.002. Subsequently, we performed a sequential analysis of the data, where we recomputed the Bayes factors and posterior probabilities of the hypotheses for every additional 10 observations (i.e., we conducted the hypothesis test on the first 10, 20, 30, . . . observations in the dataset). We did so to illustrate how the evidence in favor of the hypotheses changes as more data become available. In practice, such a sequential approach allows researchers to update the evidence in favor of the hypotheses during data collection. Depending on the amount of evidence at a given point in time, it can be decided whether more data are needed to draw a more decisive conclusion about which hypothesis is most likely to be true. Note that Bayes factors are well-suited for such a sequential analysis because they are not affected by Type I error inflation and hence multiple testing is not a problem (e.g., Berger and Mortera 1999).
The results of our sequential analysis are shown in Figures 1(a) and (b). To enhance readability, Figure 1(a) shows the logarithm of the Bayes factor, which is also referred to as the weight of evidence (WOE): WOE tt = log(B tt ). The line for WOE 12 is discontinued because for sample sizes greater than 600 the posterior probability that the order constraints of H 2 hold was numerically approximated as 0 (see Appendix B), which resulted in an infinite weight of evidence. Figure 1(b) shows the posterior probabilities of the hypotheses, which were computed assuming equal prior probabilities. From the weights of evidence and the posterior probabilities, it can be seen that H 2 received strongest support from the data for sample sizes up to and including 70. After that, H 1 received strongest support, and there was a trend for the evidence in favor of H 1 to increase as more data arrived. In Figure 1(b), the lines for H 1 and H 3 behave inversely for sample sizes greater than 200 because H 2 receives no support here so that an increase for one of H 1 and H 3 is necessarily accompanied by a decrease for the other. The results show that for the final sample size of 997, we can practically rule out H 2 and H 3 and conclude that mathematics scores were less variable than reading scores, as is stated by H 1 .

Heteroscedasticity in Repeated Measures Data: Reading Recognition
An important topic in repeated measures analysis is heteroscedasticity, which focuses on the variability of observations over time. It is often of interest whether a destabilizing effect is present, that is, whether the variability increases over time. This will also be the focus here. A popular model for analyzing repeated measures data that allows for heteroscedastic variances is the mixed effects model with a random intercept and a random slope for time (e.g., Hedeker and Gibbons 2006, chap. 4). The advantage of such a mixed model is that it is capable of separately modeling the variability at the within-subjects level and the betweensubjects level. On the other hand, the random slope model has some drawbacks for our purpose of testing order constrained hypotheses on the variances of dependent outcomes: The model implies a specific pattern of the variances and covariances in the marginal model, namely, = σ 2 I p + X i X i . In the case of the popular random intercept and random slope model for balanced data with four repeated measurements at time t = (t 1 , . . . , t 4 ) , the covariance matrix equals = σ 2 I 4 + 1 4 t 1 4 t = σ 2 I 4 + 1 4 1 4 ψ 2 1 + (1 4 t + t1 4 )ψ 12 + tt ψ 2 2 , where 1 4 = (1, 1, 1, 1) . Since t 1 < · · · < t 4 , the (co)variances follow a very specific pattern which highly depends on whether the time points are centered around 0 (as is typically the case). Also observe that the variability of the observations over time highly depends on whether ψ 2 2 is (approximately) zero or not. Testing such boundary values can be quite complex however (Mulder and Fox 2019). For this reason, we consider the unstructured covariance matrix as a flexible tool for testing competing patterns of heteroscedasticity in repeated measures data, as we show below. Note that this model assumes balanced data with equal measurement occasions across subjects. Further note that by testing variance patterns under an unstructured covariance matrix, the proposed method can also be used when building linear mixed models to check if specific variance patterns implied by a linear mixed model are actually supported by the data. We will address another interesting choice for the covariance structure in the conclusion of this article in Section 6.
The repeated measures application we present in this section is based on an empirical dataset discussed in Vermunt and Magidson (2005). The outcome of interest in this dataset was reading recognition in children as measured by the Reading Recognition subtest of the PIAT. The Reading Recognition The weight of evidence (where WOE tt = log(B tt )) and the posterior probabilities of the hypotheses (assuming equal prior probabilities), respectively, for the Peabody Individual Achievement Test data discussed in Section 5.1. The weights of evidence and posterior probabilities were computed on the first 10, 20, 30, . . . observations in the dataset. (c, d) The corresponding results for the reading recognition data discussed in Section 5.2. See the text for a description of the hypotheses that were tested in each of the two sequential analyses. In (a) and (c), the lines for WOE 12 and WOE 21 are interrupted due to numerical reasons (see text for details).
subtest measures a child's word recognition and pronunciation ability. The subtest was administered four times at two-year intervals to a sample of 405 children. We tested the following hypotheses on the variances of the repeated reading recognition measures and the ratios of adjacent variances: H 1 : increasing variances and increasing ratios of adjacent variances over time, These hypotheses can be expressed formally using order constraints on the variances and ratios of adjacent variances as follows: where σ 2 j is the variance of the jth measurement. Thus, H 1 , H 2 , and H 3 all state that the variability of reading recognition performances increased over time. Such an increase in variability is frequently observed in practice (e.g., Aunola et al. 2004;Böing-Messing et al. 2017;Hedeker and Gibbons 2006). Moreover, H 1 (H 2 ) states that the effect size σ 2 j /σ 2 j−1 increased (decreased) over time, whereas according to H 3 the effect size neither steadily increased nor decreased over time. Similar as in the previous section, we tested H 1 , H 2 , and H 3 against their complement H 4 to safeguard against the data supporting neither of the three hypotheses.
We fitted a multivariate normal model with a quadratic effect of (centered) time and an unstructured covariance matrix to the repeated reading recognition data: y i ∼ N 4 1 4 t t 2 β, , where t = (−1.5, −0.5, 0.5, 1.5) and t 2 = (2.25, 0.25, 0.25, 2.25) . The dataset contained missing values on the outcome: In total, 18.21% of the repeated reading recognition values were missing. Hence, we used the imputation approach described in Section 4.4 to compute the Bayes factors. Since the missingness only occurred on the outcome, there was no need to specify an extended missing data model. Instead, we used the observation model above to impute the missing values in the Gibbs sampler. Again, we specified the unconstrained prior on the covariance matrix to be uninformative by setting the hyperparameters equal to = 0.002 · I 4 and ν = 3.002. As in the previous section, we performed a sequential analysis of the data by testing the hypotheses on the first 10, 20, 30, . . . observations in the dataset. Figures 1(c) and (d) present the weights of evidence and the posterior probabilities of the hypotheses resulting from the sequential analysis, respectively. As before, the posterior probabilities were computed assuming equal prior probabilities of the hypotheses. In Figure 1(c), the line for WOE 21 is interrupted because for sample sizes of 60 and 180 the posterior probability that the order constraints of H 1 hold was approximated as 0. The results show that the evidence fluctuated for sample sizes up to 60, with H 2 , H 3 , and H 4 receiving strongest support from the data at some point. For sample sizes greater than 60, H 2 received strongest support except when the sample size was 220, for which H 3 was most supported. In fact, it can be seen that H 3 remained a viable competitor throughout the sequential analysis. The results indicate that the complement H 4 can be practically ruled out, whereas H 1 received weak support at the end of the sequential analysis. Based on these results, we may conclude that there was strong evidence that the variance increased over time. Moreover, while there was considerable evidence that the effect size (as measured by the ratio of adjacent variances) decreased over time (as stated by H 2 ), it cannot be ruled out that the effect size developed in a different way over time (as stated by H 3 and H 1 ).

Conclusion
In this article, we developed a Bayes factor for testing order constrained hypotheses on the variances of dependent outcomes. We applied the encompassing prior approach, in which priors under order constrained hypotheses are formulated as truncations of the prior under the unconstrained hypothesis. This approach to prior specification has two main advantages: First, the problem of specifying a prior under every order constrained hypothesis to be tested simplifies to specifying one unconstrained prior. This is a useful property in practice, since in our experience researchers find it difficult to formulate subjective priors on (co)variances, in particular under order constrained hypotheses. We used a conjugate inverse Wishart distribution with noninformative hyperparameters as the unconstrained prior on the covariance matrix. With this specification, the Bayes factor is fully automatic in the sense that there is no need to specify a subjective prior under any hypothesis. The second advantage of the encompassing prior approach is that Bayes factors between order constrained hypotheses can be computed straightforwardly by sampling from the prior and the posterior distribution of the covariance matrix and computing the proportion of draws that satisfy the order constraints. The Bayes factor is consistent in that it is guaranteed to select the true hypothesis as the sample size increases.
A natural extension of the testing problem we considered in this article is to enable equality constraints between the variances (e.g., H 1 : ∈ 1 = : σ 2 1 = σ 2 2 < σ 2 3 ∧ > 0 ). This testing problem entails two main challenges: First, one needs to ensure positive definiteness of the covariance matrix under equality and inequality constrained hypotheses. A useful method to achieve this could be the separation strategy of Barnard, McCulloch, and Meng (2000). In this approach, the covariance matrix is decomposed as = DCD, where D is a diagonal matrix of standard deviations and C is a positive definite correlation matrix. Equality and inequality constrained hypotheses could then be formulated on the standard deviations in D. Second, Bartlett's paradox will become a problem if priors under composite hypotheses are specified too vaguely. A possible solution to this problem is to let priors under composite hypotheses contain the information of a minimal experiment (e.g., Berger and Pericchi 1996;Mulder, Hoijtink, and de Leeuw 2012;Böing-Messing and Mulder 2018). This can be achieved by updating an uninformative improper prior with the information contained in the smallest possible subset of the sample data that results in a proper posterior distribution. The remaining part of the sample data is then used to compute the Bayes factors between the (in)equality constrained hypotheses under investigation. Based on our findings in this article, we expect that the Bayes factor is well-suited for the intricate problem of testing multiple hypotheses with equality and inequality constraints on the variances.
Another interesting direction for future research would be to test constraints on variances under different covariance structures. To get a better understanding of where the variability in the data comes from, the implied covariance matrix in the mixed effects model, = σ 2 I p + X i X i , can be made less restrictive by allowing the within-subjects variances to vary over time, that is, = diag σ 2 1 , . . . , σ 2 p + X i X i . This will provide insight into whether the variability changes as a result of within-subjects variability, as a result of between-subjects variability, or as a result of both (see also Mulder and Fox 2019 for a Bayes factor test on between-subjects variances). It is then important that the structured part of the covariance matrix, X i X i , fits the data well because a misfit would distort the estimates of the diagonal elements σ 2 1 , . . . , σ 2 p . The proposed Bayes factors can also be used when building other types of mixed effects models, structural equation models, or time series models, by testing whether specific order constrained patterns of the variance components are supported by the data.