A state-space approach to time-varying reduced-rank regression

Abstract We propose a new approach to reduced-rank regression that allows for time-variation in the regression coefficients. Estimation is based on the Kalman filter, which allows the use of standard methods and makes our procedure easy to implement. The EM-algorithm ensures convergence to a local maximum of the likelihood. Our estimation approach to time-varying reduced-rank regression performs well in simulations, with a particularly pronounced advantage for time series that experience large structural changes. We illustrate the performance of our approach with a simulation study and two applications to stock index and Covid-19 case data.


Introduction
The simultaneous modeling and fitting of p time series when p is large is a challenging task, as the number of parameters grows with p^2 and can often exceed the number of observations. Five main approaches to reducing the number of parameters in multivariate time series models have been proposed: (1) reduced-rank vector autoregressive (VAR) and multivariate autoregressive index (MAI) models (Ahn and Reinsel, 1988; Carriero, Kapetanios, and Marcellino, 2016; Velu, Reinsel, and Wichern, 1986), where the parameter matrix is assumed to have a lower-rank structure; (2) Bayesian VAR with shrinkage prior distributions for controlling the number of parameters (Ghosh, Khare, and Michailidis, 2019; Litterman, 1986; Sims and Zha, 1998); (3) sparse VAR estimation, where a low-dimensional analysis of groups of series is first carried out to identify rows and columns of parameter matrices that can be assumed to have zero values, as in Davis, Zang, and Zheng (2016); (4) regularized estimation, where a penalty term is included in the estimation objective function (see, e.g., Nicholson, Matteson, and Bien, 2017; Song and Bickel, 2011); and (5) the popular dynamic factor model (DFM), introduced by Geweke (1977) and Sargent and Sims (1977), using the idea of risk diversification discussed in Chamberlain [...]

[...] coefficients. In Sec. 4, we showcase the adaptability of our model by analyzing two data sets. The first consists of seven financial indices, considered during two time periods in which international markets experienced extreme stress: the Great Recession of 2008 and the Covid-19 crisis of 2020. Here, we fit vector error correction models (VECM) with time-varying cointegrating relations. As a second example, we consider the number of weekly Covid-19 cases in 12 European countries and fit a TVP-VAR(1) with a reduced-rank coefficient matrix. Section 5 contains our concluding remarks.

Time-varying reduced-rank regression
The general reduced-rank regression (RRR) model is given by

y_t = C x_t + Γ u_t + e_t, t = 1, ..., T, (1)

where y_t ∈ R^p is a p-dimensional target vector, x_t ∈ R^q is a q-dimensional predictor, and C ∈ R^{p×q} is a matrix with rank(C) = d ≤ min(p, q). The latter implies that C can be decomposed as the product of two rank-d matrices a ∈ R^{p×d} and b ∈ R^{q×d} such that C = ab′. The errors e_t are normal white noise with covariance X. The vector u_t ∈ R^k contains additional explanatory variables with unrestricted coefficient matrix Γ ∈ R^{p×k}. Both x_t and u_t are observable. Throughout, we use boldfaced letters and symbols to denote vectors or matrices and non-boldfaced letters for scalars. A′ denotes the transpose of a matrix A and I_s stands for the (s × s) identity matrix. Velu, Reinsel, and Wichern (1986) were the first to use RRR in vector autoregressive (VAR) models. In a VAR(m) model, the explanatory variables x_t are lags of the target variable y_t, x_t = (y′_{t−1}, ..., y′_{t−m})′ ∈ R^{mp}. While fitting full-rank VAR models requires the estimation of a large number of parameters if the number of time series p is large, RRR introduces sparsity in the relationship between the lagged variables and the response by assuming that the coefficient matrix C ∈ R^{p×mp} has rank d ≪ p. Cointegration models are another popular application of RRR in the time series context. In the vector error correction model (VECM), the cointegrating relations of multiple non-stationary time series are represented by a reduced-rank matrix (see e.g. Lütkepohl, 2005, Part II).
RRR may also be simply used as a tool for constructing approximate yet parsimonious regression models. Reinsel and Velu (1998) provide a comprehensive review of reduced-rank regression models and their applications in both i.i.d. and time-series settings. The principle of RRR underlies several methods in sufficient dimension reduction based on inverse regression (Bura and Cook, 2003). For example, parametric inverse regression (Bura and Cook, 2001) and principal fitted components (Cook and Forzani, 2008) rely on RRR methodology.
The interpretation most relevant to our development is that of RRR as a factor model. Specifically, y_t = ab′x_t + e_t can be interpreted as a factor model with factors f_t = b′x_t and factor loadings a, with the distinguishing feature that the factors, for given b, are observable, as opposed to latent as in "standard" factor models. The factors summarize all the information in x_t that is necessary for approximating y_t. If x_t = y_{t−1}, the model can be interpreted as a simple dynamic factor model, where the factors f_t follow VAR(1) dynamics (Peña and Poncela, 2006).
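As a concrete illustration of the ingredients above, the following sketch fits a time-constant RRR by truncating the singular value decomposition of the predictor-whitened OLS estimate; for an identity weight matrix this is the classical closed-form solution (Reinsel and Velu, 1998, Ch. 2). All function and variable names are our own.

```python
import numpy as np

def rrr_fit(Y, X, d):
    """Rank-d reduced-rank regression of Y (T x p) on X (T x q), identity weighting.

    Returns (a, b) with a (p x d) and b (q x d), so that C = a b' has rank d and
    minimizes sum_t ||y_t - a b' x_t||^2.
    """
    T = Y.shape[0]
    Sxx = X.T @ X / T                                # predictor second-moment matrix
    C_ols = np.linalg.solve(Sxx, X.T @ Y / T).T      # (p x q) OLS coefficient matrix
    # Whiten the predictors: with M = Sxx^{1/2}, RRR truncates the SVD of C_ols M.
    evals, evecs = np.linalg.eigh(Sxx)
    M = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    U, s, Vt = np.linalg.svd(C_ols @ M)
    a = U[:, :d] * s[:d]                             # loadings, absorbing singular values
    b = np.linalg.solve(M, Vt[:d].T)                 # factor weights, mapped back
    return a, b
```

For d = min(p, q) the truncation is vacuous and the fit coincides with OLS; the weighted criterion used later for starting values replaces the identity weight with a general positive definite matrix W.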
We incorporate time variation in the coefficients of model (1) by writing

y_t = C_t x_t + Γ u_t + e_t, t = 1, ..., T. (2)

We assume that the rank d < min(p, q) of C_t is fixed, and we model the time dependence of C_t in one of three ways:

(A) C_t = a_t b′, (3)
(B) C_t = a b_t′, (4)
(C) C_t = a_t b_t′. (5)

In all three, the component matrices in the decomposition are of full rank d. In (A) and (B), one of the two factors in the decomposition of C_t is assumed to be constant; the time-variation is therefore attributed to the other. This simplifies the interpretation and estimation of the resulting model.
In model (A), it is assumed that the relevant information contained in x_t can be extracted by time-invariant linear combinations b′x_t of the regressors. However, the coefficients of the regression of y_t on b′x_t may vary with time. In this case the row space of C_t, i.e. the row space of b′, is independent of time. The case of a constant column space of C_t is modeled by (B), with an analogous interpretation. In model (C), both the column and the row space of C_t are allowed to be time-dependent.
Both matrices in the decompositions (A), (B) and (C) can only be identified up to invertible transformations. For model (C), the relation a_t b_t′ = a_t H_t H_t^{−1} b_t′ holds for any invertible H_t ∈ R^{d×d}, so that each pair a_t, b_t can only be identified up to right multiplication with a nonsingular matrix.
In models (A) and (B), the transformation H_t = H has to be independent of time. For example, in model (A),

C_t = a_t b′ = (a_t H)(H^{−1} b′) = ã_t b̃′ for any nonsingular H ∈ R^{d×d}. (6)

Ensuring identifiability requires additional restrictions, such as fixing a (d × d)-dimensional submatrix of b to the identity matrix. We demonstrate at the end of this section that the estimates of the proposed algorithm are invariant to transformations as in (6).
In addition to the ambiguity of the component matrices introduced by the time-specific transformations H_t, the observations y_t in model (C) are a non-linear function of a_t and b_t, rendering model fitting and parameter estimation non-trivial. This model cannot be represented as a linear state-space system, and one would need to resort to non-linear filtering methods, such as the extended Kalman filter or particle filters, for the estimation of the time-varying parameters a_t and b_t (see e.g. Särkkä, 2013). On the other hand, models (A) and (B) are sufficiently flexible to capture various types of time-variation. Model (A) can be interpreted as a factor model with time-varying factor loadings: the factor composition remains constant, even though the factors' influence on the different response variables may vary over time. With model (B), we can, among other applications, model time-varying cointegration spaces in vector error correction models; moreover, time-varying factor compositions are covered by this formulation. We focus on models (A) and (B) in the rest of the paper.

Model specification
The reduced-rank regression model (2) with time-varying parameters C_t as defined in (3) and (4) can be written as

y_t = a_t b′ x_t + Γ u_t + e_t (model (A)),
y_t = a b_t′ x_t + Γ u_t + e_t (model (B)),

where the errors are i.i.d. with E(e_t) = 0 and Cov(e_t) = X. Both are regression models in which some of the coefficients are time-dependent. These time-varying parameters are modeled as (latent) states in a state-space system, as in, e.g., Hamilton (1994).
Both cases (A) and (B) can be treated similarly; we therefore outline the estimation procedure in detail only for model (A). The derivation for model (B) is analogous. To simplify the exposition, we set the unrestricted coefficient matrix to zero, Γ = 0; the inclusion of this term is straightforward.
In the absence of specific prior knowledge of the dynamics of the time-varying parameters, we assume that they follow a random walk. The time-varying parameters, i.e. the states, are assumed to evolve according to some density f that is centered at the previous state, as in, for example, Del Negro and Otrok (2008), Eickmeier, Lemke, and Marcellino (2015) and Yang and Bauwens (2018),

vec(a_{t+1}) ∼ f(· | vec(a_t)). (9)

A common approach is to let f be a multivariate normal density,

vec(a_{t+1}) ∼ N(vec(a_t), R). (10)

The number of parameters pd(pd+1)/2 in the state covariance matrix R is prohibitive for estimation, even for moderate-sized models. This necessitates imposing further structure on R. As noted by Yang and Bauwens (2018), this structure should be compatible with the inherent non-identifiability of the factorization (6). For example, the widely used simplification of a diagonal R is not preserved under invertible transformations: the transformed states a_t H evolve as vec(a_{t+1}H) ∼ N(vec(a_t H), (H′ ⊗ I_p) R (H ⊗ I_p)), and this covariance matrix is non-diagonal in general.
To bypass this limitation and facilitate identifiability of the parameter matrices, Yang and Bauwens (2018) propose a non-linear state-space model on the set of all semi-orthogonal matrices, the Stiefel manifold

V_{p,d} = {a ∈ R^{p×d} : a′a = I_d}. (11)

In their specification, the state matrices a_t perform a random walk on the space (11), obtained by requiring f in (9) to be the density of a matrix Langevin (or von Mises–Fisher) distribution. This distribution derives from restricting a specific matrix normal distribution to orthogonal matrices (see e.g. Chikuse, 2003, Ch. 2). Although Yang and Bauwens (2018) derive closed-form solutions for the predicted and updated states, the resulting posterior likelihoods involve intractable normalizing constants that depend on the model's time-constant parameters. This renders parameter estimation hard, as reflected in the absence of an estimation procedure: the model cannot be fitted to observed data, and filtering the states is only possible if the unknown parameters are given a priori.
We do not require the time-varying parameter matrices a_t to be semi-orthogonal, but allow them to vary freely in R^{p×d}, which results in more flexible parameter estimates. We maintain the multivariate normality assumption (10) and further assume that the covariance matrix R is separable, R = R_c ⊗ R_r, where R_c ∈ R^{d×d} and R_r ∈ R^{p×p}. This structure reduces the number of parameters from pd(pd+1)/2 to at most (p(p+1) + d(d+1))/2 − 1. The decomposition R = R_c ⊗ R_r corresponds to the assumption that the states follow a (p × d)-dimensional matrix normal distribution (Gupta and Nagar, 2000), which agrees with the interpretation of the time-varying coefficients as rank-d matrices.
A further reduction of the number of parameters that need to be estimated is achieved by setting R_r = I_p, which also relieves the computational burden for large data sets. Under this assumption, the (changes of the) rows of a_t are independent and identically distributed, so that we only need to estimate the d(d+1)/2 parameters of R_c, a computationally non-prohibitive number when the rank of the matrix is small. The same applies to model (B). The restriction to this form of matrix normal distribution is consistent with the matrix Langevin distribution used in Yang and Bauwens (2018), as the latter derives from a matrix normal distribution with the above covariance structure (Chikuse, 2003).
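The row-i.i.d. structure just described can be checked numerically. The following sketch (dimensions and the choice of R_c are arbitrary) draws the rows of a (p × d) state increment i.i.d. from N(0, R_c) and verifies that the covariance of the column-stacked vectorization is R_c ⊗ I_p, with only d(d+1)/2 free state-covariance parameters.

```python
import numpy as np

# Numerical check: rows of the (p x d) increment i.i.d. N(0, R_c)
# imply Cov(vec(increment)) = R_c kron I_p.
rng = np.random.default_rng(1)
p, d = 6, 2
A = rng.normal(size=(d, d))
R_c = A @ A.T + d * np.eye(d)        # an arbitrary positive definite (d x d) matrix

n = 200_000
draws = rng.multivariate_normal(np.zeros(d), R_c, size=(n, p))   # (n, p, d), rows i.i.d.
vecs = draws.transpose(0, 2, 1).reshape(n, p * d)                # vec() stacks columns
emp_cov = np.cov(vecs, rowvar=False)
target = np.kron(R_c, np.eye(p))
max_err = np.max(np.abs(emp_cov - target))
n_params = d * (d + 1) // 2          # free parameters in R_c (here: 3)
```

The empirical covariance agrees with R_c ⊗ I_p up to Monte Carlo error, while an unrestricted R would have pd(pd+1)/2 = 78 parameters in this example.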
The resulting state-space model for model (A) is thus given by

y_t = a_t b′ x_t + e_t, vec(a_{t+1}) = vec(a_t) + g_t, (14)

where e_t ∼ N(0, X) and g_t ∼ N(0, R_c ⊗ I_p) are independent i.i.d. sequences. Analogously, model (B) leads to

y_t = a b_t′ x_t + e_t, vec(b′_{t+1}) = vec(b′_t) + f_t, (15)

with f_t ∼ N(0, I_q ⊗ R_c).

State estimation and model fitting
The regressors x_t, u_t, t = 1, ..., T, are understood as functions of exogenous variables z_t and lagged dependent variables. We let F_s denote the sigma-algebra generated by all the exogenous variables z_t and the observed target variables up to time s, F_s = σ(z_1, ..., z_T, y_{−m+1}, ..., y_0, y_1, ..., y_s). This setting includes as special cases VAR(m), VARX(m) and VECM(m) models, in which the predictors are functions of the lagged variables, of the exogenous variables, or of both.
The time-varying parameters a_t and b_t in models (14) and (15), respectively, can be inferred using the Kalman filter. Again, we outline the procedure in detail only for model (A). Following Shumway and Stoffer (2017, Ch. 6), we use the following notation for the filtered and smoothed coefficient matrices and the corresponding covariances:

a_t^s = E(vec(a_t) | F_s), P_t^s = Cov(vec(a_t) | F_s), P_{t,t−1}^s = Cov(vec(a_t), vec(a_{t−1}) | F_s). (17)
In addition, we assume that the noise vectors are jointly normally distributed and independent of F_0 and of the initial state a_0. Then, the conditional expectations and covariances in (17) can be computed by the usual Kalman filtering and smoothing recursions (Shumway and Stoffer, 2017, Ch. 6). For the initial state, we assume vec(a_0) | F_0 ∼ N(vec(a_0), W_0), providing the starting values a_0^0 = a_0 and P_0^0 = W_0 for the Kalman filter and smoother. (Analogously, for model (B), we let vec(b_0) | F_0 ∼ N(vec(b_0), W_0).) The estimated states and the corresponding covariance matrices in (17) depend on the initial values a_0, W_0 and the unknown parameters in system (14). We denote this set by

Θ = {a_0, W_0, b, R_c, X}. (19)

The negative log-likelihood of Y = {y_1, ..., y_T} conditional on F_0 is (up to constants) given by

−ℓ_Y(Θ) = (1/2) Σ_{t=1}^T [ log det S_t(Θ) + v_t(Θ)′ S_t(Θ)^{−1} v_t(Θ) ], (20)

where v_t = y_t − E(y_t | F_{t−1}) are the one-step prediction errors and S_t their conditional covariance matrices (Gibson and Ninness, 2005; Shumway and Stoffer, 2017). The log-likelihood (20) is a highly non-linear function of the parameters (19). We bypass its direct minimization using the expectation-maximization (EM) algorithm. Optimization is based on the (conditional) joint likelihood of the data Y and the states A = (a_0, ..., a_T), denoted by L_{Y,A}(Θ). Given a set of initial values Θ^{(0)} for the model parameters (19), the EM algorithm consists of two steps that are iterated in turn until certain stopping criteria are met.

E-step: Evaluate the conditional expectation of the joint log-likelihood given F_T,

Q(Θ | Θ^{(j)}) = E[ log L_{Y,A}(Θ) | F_T ], (21)

as a function of the parameters Θ, while assuming that Θ^{(j)} are the "true" parameters under which the expectation in (21) is taken. The E-step includes smoothing of the latent states given the current set of parameters and starting values.

(Note to Table 1: for better readability, the notation does not show the dependence on the current parameter estimates Θ^{(j)}. In the update for b, the subscript i:j, k:l refers to the submatrix with rows i to j and columns k to l, and unvec denotes the inverse of the vec operation, i.e., a = vec(A) ⟺ A = unvec(a).)
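Using the identity vec(a_t b′x_t) = ((x_t′b) ⊗ I_p) vec(a_t), the observation equation of (14) is linear in the state vec(a_t), so the standard filtering recursions apply directly. The following sketch (all names are ours, and the brute-force matrix inversions are for clarity rather than efficiency) computes the filtered states and the Gaussian log-likelihood; `Omega` plays the role of the error covariance denoted X in the text.

```python
import numpy as np

def kalman_filter_A(Y, X, b, R_c, Omega, m0, P0):
    """Kalman filter for model (A): y_t = a_t b' x_t + e_t, vec(a_{t+1}) = vec(a_t) + g_t,
    with Cov(e_t) = Omega and Cov(g_t) = R_c kron I_p.

    Y: (T, p) observations; X: (T, q) predictors; b: (q, d) constant factor weights;
    m0, P0: mean (p*d,) and covariance (pd, pd) of the initial state vec(a_0).
    Returns the filtered state means (T, p*d) and the Gaussian log-likelihood.
    """
    T, p = Y.shape
    Q = np.kron(R_c, np.eye(p))              # state noise covariance of vec(g_t)
    m, P = m0.copy(), P0.copy()
    means = np.empty((T, m0.size))
    loglik = 0.0
    for t in range(T):
        Z = np.kron(X[t] @ b, np.eye(p))     # y_t = Z vec(a_t) + e_t, Z is (p, pd)
        P_pred = P + Q                       # random-walk prediction step
        v = Y[t] - Z @ m                     # innovation
        S = Z @ P_pred @ Z.T + Omega         # innovation covariance
        K = P_pred @ Z.T @ np.linalg.inv(S)  # Kalman gain
        m = m + K @ v
        P = P_pred - K @ S @ K.T
        means[t] = m
        loglik -= 0.5 * (p * np.log(2 * np.pi)
                         + np.linalg.slogdet(S)[1]
                         + v @ np.linalg.solve(S, v))
    return means, loglik
```

The smoothing recursions and the EM updates of Table 1 build on the same quantities and are omitted here.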
The expected log-likelihood (21) for model (A) is given by (23). The smoothed states a_t^T and the covariance matrices P_t^T and P_{t,t−1}^T in (23) depend on the current set of parameters Θ^{(j)}; however, for better readability, this dependence is not reflected in the notation.
For the M-step we use a simple iterative scheme: we compute, in turn, the minimizer of the expected log-likelihood with respect to one of the parameters in (19) while keeping the others fixed. The minimizers for a_0, W_0 and R_c are independent of the other parameters. Inspection of (23) reveals that, for model (A), the updates for b, X and, if included, Γ depend on each other; nevertheless, it is straightforward to compute the minimizer for one of these parameters while keeping the other two fixed. The explicit formulas for the above minimizers are given in Table 1. The updates within each M-step are iterated until the changes in the parameter matrices are negligible, which is achieved quickly in our simulations. The EM algorithm terminates when either the change in the parameter estimates or the relative change in the data log-likelihood (20) is negligible.
By construction, the EM algorithm is guaranteed to increase the data likelihood in every step of the estimation procedure (see e.g. Casella and Berger, 2002, Thm. 7.2.20). If it converges, it reaches a stationary point of the data likelihood, typically a local maximum.
The non-identifiability of the decompositions (A) and (B) in (6) does not interfere with the estimation algorithm, in the sense that the estimates of the matrices C_t and the predictions of y_{t+1} do not depend on the chosen factorization. For example, consider model (A) and an arbitrary nonsingular matrix H, and define ã_t = a_t H and b̃ = b H^{−1}′. The corresponding state-space system is equivalent to the original state-space system (14) if we transform the noise covariances and the starting values for the Kalman filter correspondingly. The filtered and smoothed states and the corresponding covariances of the two systems are then related by the same transformation; that is, ã_t^s = a_t^s H and P̃_t^s = (H′ ⊗ I_p) P_t^s (H ⊗ I_p) for s ∈ {t−1, t, T} and all t = 1, ..., T. This observation also carries over to the EM updates in Table 1: for example, the relationships b̃^{(j)} = b^{(j)} H^{−1}′ and X̃^{(j)} = X^{(j)} hold for the estimates of the transformed system (24) and the original system (14).

Choice of starting values
The choice of starting values Θ^{(0)} used for the numerical optimization of the likelihood can have a strong influence on the resulting model fit due to local optima. In our experiments, however, the EM algorithm is robust to the choice of starting values. We start with the initialization of the coefficient matrices a_0 and b for model (A), and a and b_0 for model (B), respectively. We denote the starting values in both cases by a^{(0)} and b^{(0)}, and determine them by time-constant RRR as follows. We let a^{(0)} = a_RR and b^{(0)} = b_RR, where a_RR and b_RR are the minimizers of the weighted least squares criterion (25) for a positive definite, symmetric weight matrix W. The solutions to (25) give a rank-d approximation of the ordinary least squares (OLS) regression matrix Ĉ_OLS (Reinsel and Velu, 1998, Ch. 2). A close look at the OLS estimate reveals why these starting values are reasonable. Assuming that E(e_t | x_1, ..., x_T) = 0, the conditional expectation of the OLS estimate given the predictors is

E(Ĉ_OLS | x_1, ..., x_T) = ( Σ_{t=1}^T C_t x_t x_t′ ) ( Σ_{t=1}^T x_t x_t′ )^{−1}. (26)

This conditional expectation is a weighted average of the time-varying matrices C_t. Thus, its reduced-rank decomposition yields "averages" of the time-varying coefficients a_t (resp. b_t). In particular, for model (B), span(C_t) = span(a) for all t, and thus the weighted average (26) also has column space span(a) ≈ span(a_RR), hinting that a_RR is a reasonable starting value for a. If pronounced time-variation is suspected, it may be advisable to obtain starting values from a reduced-rank regression that only uses the first s < T observations of the time series.
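A quick numerical illustration of the averaging in Eq. (26): with i.i.d. predictors and a single structural break at T/2, the OLS estimate lands roughly at the equal-weight average of the two regime matrices. All dimensions, the break point and the noise level below are our illustrative choices.

```python
import numpy as np

# Illustration of Eq. (26): under a structural break at T/2 with i.i.d. predictors,
# E(C_OLS | x_1, ..., x_T) is close to the equal-weight average (C1 + C2) / 2.
rng = np.random.default_rng(3)
T, p, q = 20_000, 3, 3
C1 = rng.normal(size=(p, q))
C2 = rng.normal(size=(p, q))
X = rng.normal(size=(T, q))
Y = np.empty((T, p))
Y[: T // 2] = X[: T // 2] @ C1.T          # first regime
Y[T // 2 :] = X[T // 2 :] @ C2.T          # second regime
Y += 0.1 * rng.normal(size=(T, p))        # observation noise
C_ols = np.linalg.solve(X.T @ X, X.T @ Y).T     # (p x q) OLS coefficient matrix
gap = np.linalg.norm(C_ols - (C1 + C2) / 2)     # distance to the equal-weight average
```

The OLS fit is consequently much closer to the average than to either regime matrix, which is exactly why it serves as a sensible (if "averaged") starting value.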
The initial value for the error covariance X is set to the residual covariance of the time-constant RRR,

X^{(0)} = (1/T) Σ_{t=1}^T (y_t − a^{(0)} b^{(0)}′ x_t)(y_t − a^{(0)} b^{(0)}′ x_t)′.

If the number of parameters is large compared to the sample size, X can be restricted to be diagonal or to be a multiple of the identity, X = σ_e² I_p; the updates for the EM algorithm can be adjusted accordingly. Our simulations suggest that assuming a diagonal covariance matrix does not necessarily decrease the accuracy of the state estimates or the predictive performance of the model. If we wish to include additional predictors u_t, the starting values for Γ, and correspondingly adjusted starting values for a, b and X, can be obtained from reduced-rank regression with an additional full-rank coefficient matrix, as derived in Reinsel and Velu (1998, Ch. 3). In our simulations, the choice of W in (25) did not have a strong influence on the parameter estimates; we used either the identity I_p or the least squares residual covariance.
The uncertainty about the initial state a_0 (resp. b_0) is reflected in the choice of the initial covariance matrix W_0: the larger the covariance, the more flexibility and uncertainty about the initial state is allowed for.
Finally, there is no obvious choice of starting values for the state covariance matrix R_c. Generally, the magnitude of the entries of R_c governs the flexibility of the resulting state estimates, while the basic paths of the states will be similar regardless of the magnitude of R_c. Our simulation studies indicate that different starting values for R_c only marginally influence the value of the likelihood and the residual variance at convergence. Nevertheless, it is advisable to try different values and check the stability of the resulting estimates.

Rank selection
Rank selection for the TVRRR model can be achieved by information criteria. We use the Bayesian information criterion (BIC), first introduced by Schwarz (1978), given by

BIC(d) = −2 log L_Y(Θ̂) + log(T) K_d, (27)

where Θ̂ denotes the maximum likelihood estimates (computed with the EM algorithm) and K_d is the number of free parameters of the model. The estimate for the rank is the minimizer of the BIC criterion (27). For model (A), R_c has d(d+1)/2 free parameters and b has q·d free parameters, so that we set K_d = qd + d(d+1)/2. We do not count the parameters in the matrices X and Γ, since the dimensions of these matrices do not depend on d. We also do not count the parameters of the initial state distribution, a_0 (resp. b_0) and W_0. Another popular criterion for model selection is Akaike's information criterion (AIC; Akaike, 1973), in which log(T) in (27) is replaced by 2. However, it has been shown in similar, time-constant settings [e.g., Aznar and Salvador (2002) for the choice of the cointegrating rank in vector error correction models; Bai and Ng (2002) for the number of factors in dynamic factor models] that AIC yields inconsistent rank estimates, as the penalty term needs to tend to infinity at a certain rate to achieve consistency. Our simulation studies indicate that AIC does not lead to consistent estimates in our model either; specifically, its performance deteriorates with increasing sample size.

[Figure 1: Data simulated from model (28) with a structural break of the coefficient matrix at t = 50. The bottom panel shows the corresponding estimation error err_{C_t} = ‖C_t − Ĉ_t‖_F / ‖C_t‖_F for the three estimates considered: oracle RRR, time-constant RRR and TVRRR.]
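The criterion (27) with K_d = qd + d(d+1)/2 can be coded directly; in the sketch below, `neg2_loglik` stands for −2 times the maximized data log-likelihood delivered by the EM fit (not computed here), and the function names are ours.

```python
import numpy as np

def bic_tvrrr_A(neg2_loglik, T, q, d):
    """BIC (27) for model (A): -2 log-likelihood plus log(T) * K_d, where
    K_d = q*d + d*(d+1)/2 counts b (q*d entries) and R_c (d*(d+1)/2 entries).
    X, the unrestricted matrix and the initial-state quantities are excluded,
    since their dimensions do not depend on d."""
    K_d = q * d + d * (d + 1) // 2
    return neg2_loglik + np.log(T) * K_d

def select_rank(neg2_logliks, T, q):
    """Rank selection: neg2_logliks maps candidate rank d to -2 max log-likelihood;
    return the rank minimizing the BIC."""
    return min(neg2_logliks, key=lambda d: bic_tvrrr_A(neg2_logliks[d], T, q, d))
```

A higher rank is selected only if it improves the fit by more than the added penalty log(T)·(K_{d+1} − K_d).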

Simulation studies
In this section, we illustrate the filter performance via a toy example, and demonstrate the flexibility and usefulness of TVRRR in different settings in a simulation study.

Illustration of filter performance
We generate data y_t from a five-dimensional VAR(1) model with a rank d = 2 coefficient matrix,

y_t = a_t b′ y_{t−1} + e_t, with a_t = a_1 for t ≤ 50 and a_t = a_2 for t > 50, (28)

where a_1, a_2 and b are random orthonormal matrices. We fit a TVRRR model using the estimation procedure in Sec. 2.2. The estimated coefficient matrices are compared to those obtained from time-constant RRR and from "oracle" RRR; for the latter, we assume the location of the structural break is known and fit two separate RRR models. We assess the accuracy of the estimates with the normalized Frobenius norm of the estimation error,

err_{C_t} = ‖C_t − Ĉ_t‖_F / ‖C_t‖_F. (29)

Its values are displayed in the bottom panel of Fig. 1. The TVRRR coefficient estimates are very close to those of oracle RRR. At the structural break, the error jumps up rapidly, yet the coefficient estimates quickly recover and roughly line up with those of oracle RRR. As expected, the time-constant RRR errors are the largest, as the method simply averages over the two matrices a_1 and a_2 (as can be seen from Eq. (26)). This toy example also indicates that the EM algorithm we use to fit TVRRR is robust to the choice of starting values, which were obtained from RRR and are thus far from the truth.
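The data-generating process of this toy example can be sketched as follows; the error scale and the use of QR factors to obtain random orthonormal matrices are our choices, as the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(4)
p, d, T = 5, 2, 100

def random_orthonormal(rows, cols, rng):
    """A random (rows x cols) matrix with orthonormal columns via QR."""
    Q, _ = np.linalg.qr(rng.normal(size=(rows, cols)))
    return Q

a1 = random_orthonormal(p, d, rng)
a2 = random_orthonormal(p, d, rng)
b = random_orthonormal(p, d, rng)

# VAR(1) with rank-2 coefficient C_t = a_t b' and a structural break at t = 50.
Y = np.zeros((T + 1, p))
for t in range(1, T + 1):
    a_t = a1 if t <= 50 else a2
    Y[t] = a_t @ b.T @ Y[t - 1] + rng.normal(scale=0.5, size=p)   # error scale assumed
```

Since a_t and b have orthonormal columns, the nonzero eigenvalues of C_t = a_t b′ are those of b′a_t and are typically well inside the unit circle, so the simulated series remains stable.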

Simulation
The data for our simulation study are drawn from the model equations

y_t = a_t b′ x_t + e_t (model (A)) and y_t = a b_t′ x_t + e_t (model (B)). (30)

The predictors x_t ∈ R^q are exogenous, drawn from q independent MA(2) processes. The errors e_t are normally distributed with mean zero and diagonal covariance matrix X, whose diagonal entries are drawn from the uniform distribution on the interval [0.5, 1.5]. Starting values for the EM algorithm are obtained from time-constant reduced-rank regression as described in Sec. 2.2.1. The weight matrix W in (25) is selected as the residual covariance from the ordinary least squares regression of y_t on x_t. The initial state covariance W_0 is set to W_0 = 1000 · I_{pd} for model (A) and W_0 = 1000 · I_{qd} for model (B); the large variances reflect the uncertainty about the initial values used for the Kalman filter. We also conducted experiments where the data follow a VAR(1) process with time-varying coefficients, i.e. we set x_t = y_{t−1} in (30). Due to the random-walk generation of the coefficient matrices, these time series often diverge; for this reason, we only report results for exogenous x_t. However, when the VAR time series remain bounded, our method performs well.
To study the ability of the proposed method to capture different types of time variation, we generate data sets under three different types of time-varying coefficients, displayed in Table 2. The notation a ∼ UStiefel(p, d) indicates that a is drawn from the uniform distribution on the Stiefel manifold V_{p,d} (see (11)), as defined in Chikuse (2003). The fixed scenario is included to evaluate the robustness of TVRRR in the case of a true RRR model with time-invariant parameters. The deterministic transition produces a gradual change of the parameters from the starting to the terminal coefficient matrix. The random walk agrees with the modeling assumption underlying our procedure and represents random fluctuations in the coefficients. The matrices R_d depend on the state dimension d and are selected as multiples of the matrices at the bottom of Table 2.
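The exogenous-predictor design can be sketched as below; the MA(2) filter coefficients are not specified in the text and are assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
T, q, p = 200, 10, 10

# q independent MA(2) predictor processes; the filter coefficients are assumed.
theta = np.array([1.0, 0.6, 0.3])
eps = rng.normal(size=(T + 2, q))
X = theta[0] * eps[2:] + theta[1] * eps[1:-1] + theta[2] * eps[:-2]   # (T, q)

# Gaussian errors with a diagonal covariance matrix (the matrix X in the text),
# whose diagonal entries are drawn from U[0.5, 1.5].
err_var = rng.uniform(0.5, 1.5, size=p)
E = rng.normal(size=(T, p)) * np.sqrt(err_var)
```

Each predictor column has theoretical variance θ₀² + θ₁² + θ₂² = 1.45 under this assumed filter, and the error columns are independent with the drawn variances.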
The dimension of the target time series vector takes values p ∈ {5, 10, 20}, the predictors are generated with dimension q ∈ {5, 10, 20}, and the rank takes values d ∈ {1, 2, 3}. Each time series has length T = 200. The validation time series of length 200 is generated by extrapolating the data-generating processes in Table 2. For each combination of parameters, we draw 10 sets of coefficient matrices and repeat the model-fitting procedure 10 times, resulting in a total of 10 · 10 = 100 simulation runs per setting.
The in-sample MSE for model (A) in (14) is the mean squared error of the fitted values based on the estimated states (31); the corresponding out-of-sample MSE (32) is obtained from one-step-ahead predictions for the validation data using the Kalman filter and the fitted parameter matrices. We assess the accuracy of the coefficient estimates with the normalized Frobenius distances in (29). For model (B), these criteria are defined in an analogous manner. Throughout the simulation study, the performance is juxtaposed with that of the regular reduced-rank regression reference model.
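The two accuracy measures can be written as small helpers. The entrywise averaging in the MSE is our assumption, as the excerpt does not spell out the normalization.

```python
import numpy as np

def rel_frob_error(C_true, C_hat):
    """Normalized Frobenius error ||C_t - C_hat_t||_F / ||C_t||_F, as in (29)."""
    return np.linalg.norm(C_true - C_hat) / np.linalg.norm(C_true)

def mse(Y, Y_hat):
    """Mean squared (one-step-ahead) prediction error, averaged over all
    time points and coordinates (normalization assumed)."""
    Y, Y_hat = np.asarray(Y), np.asarray(Y_hat)
    return float(np.mean((Y - Y_hat) ** 2))
```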

Results
We report results for p = q = 10. The respective tables for other pairs of p and q are comparable and can be found in the supplementary material.
First, we present results for the fixed data-generating process. Table 3 reports the in- and out-of-sample MSEs in (31) and (32), respectively, and the relative estimation errors (29) for RRR and TVRRR in models (A) and (B). Under this setting, simple RRR is more appropriate than TVRRR, which results in larger relative estimation errors err_{C_t} for TVRRR, even though forecasting precision is only mildly affected. The in-sample MSE of TVRRR is smaller, whereas the out-of-sample MSE is slightly larger than the respective MSEs of RRR in both models: TVRRR is more flexible than time-constant RRR and, as a consequence, tends to slightly overfit the training data, resulting in a lower in-sample MSE and a higher out-of-sample MSE on the validation data.
Next, we report the simulation results for the deterministic transition of the coefficients as specified in Table 2. The shift s is set to 1 ("small"), 1.5 ("medium") and 2 ("large"). As can be seen in Table 4, this setting incurs a significant loss in estimation precision for RRR with increasing variation of the parameter matrices. While the in-sample fit is not much affected, out-of-sample prediction fails. In contrast, TVRRR is uniformly better than RRR. It has slightly worse out-of-sample than in-sample MSE, while both remain roughly stable across all setting combinations. Moreover, the coefficient estimation accuracy of TVRRR improves as the shift between the starting and final coefficient matrices increases; this effect is probably partly due to the relative error criterion. The estimation accuracy of RRR stays roughly the same across all settings. The difference in performance between RRR and TVRRR can be explained by considering the errors of the coefficient matrix estimates over time in Fig. 2. RRR averages over the coefficient matrices (see also the discussion of the starting values, Eq. (26)), and thus the smallest distance is achieved at approximately T/2. This estimate is far off for time points closer to the start and end of the time series, and deviates even further on the validation data set t = 201, ..., 400. On the other hand, the estimation errors of TVRRR are consistently small throughout the time range. In particular, since the RRR estimates serve as starting values for the EM algorithm, this further illustrates the robustness of TVRRR to "wrong" starting values.

Table 5 displays the respective results for the random walk coefficient evolution in Table 2. Even for the low-variation random walk, we observe the failure of RRR both in terms of MSE and of estimation accuracy. Both in- and out-of-sample MSEs are significantly larger than the corresponding values for TVRRR. As in the deterministic setting, TVRRR gains from increasing variation in the coefficients.

[Table 6: Results of rank selection via the BIC criterion for p = q = 10 and three different coefficient transitions (deterministic, random walk, structural break), 10 · 10 = 100 repetitions each. "=" is the fraction of correctly estimated ranks and "<" the fraction of underestimated ranks.]

Rank selection
We assess the performance of the BIC criterion (27) for rank selection in both models (A) and (B) and under three scenarios for the time-varying parameters. In addition to the deterministic transition (with s = 1) and the random walk transition (with s = 0.1), as defined in Table 2, we also consider a structural break setting similar to (28): for model (A), we draw two coefficient matrices a_1, a_2 ∼ UStiefel(p, d) and let a_t = a_1 for t ≤ T/2 and a_t = a_2 for t > T/2. For model (B) we proceed analogously. We consider time series of length T ∈ {100, 200, 300} and the same sets of values for p, q and d as before. Again, we carry out 10 · 10 = 100 repetitions of each setting.
The simulation results indicate that the performance of the BIC criterion is largely insensitive to the coefficient transitions we consider. Table 6 reports the fraction of simulation runs in which the rank was correctly estimated in both models for p = q = 10; results for other combinations of p and q are comparable and can be found in the supplement. BIC rarely overestimates the true rank. For smaller samples, it frequently underestimates the rank, but it improves drastically as the sample size increases. Rank estimation is least successful for the random walk coefficient transition, where the tendency to underestimate the rank is strongest. However, in this scenario, too, the performance of the criterion markedly improves with the length of the time series.

Data examples
We apply TVRRR and an adaptive version of RRR, which we define below, to two data sets. The first consists of seven main stock indices, and the second contains Covid-19 cases for 12 European countries.

Stock index data
We perform an out-of-sample forecasting experiment with stock index data. We retrieved closing prices for the DAX Performance Index (DAX), Dow Jones Industrial Average (DJI), Hang Seng Index (HSI), Nikkei 225 (N225), NASDAQ Composite (NASDAQ), S&P 500 (S&P 500) and TSEC Weighted Index (TWII) for two different time periods. 3 The first period ranges from January 15, 2008 to May 18, 2009, and the second from June 05, 2019 to October 05, 2020. Both time series include abrupt changes in the global economy: the first period contains the Great Recession and the second the Covid-19 pandemic. While stock index data are generally thought to be cointegrated, the relationships between the time series might have changed during both crises, possibly because markets recovered from the shocks at different rates.
We fit a lag-1 vector error correction model (VECM) with possibly time-varying cointegrating relations β′_t y_{t−1}. Prior to the analysis, the stock index time series are log-transformed, centered and scaled so that the sample mean equals zero and the variance equals one in the training data set. Missing values are imputed using na_kalman from the imputeTS R-package (Moritz and Bartz-Beielstein, 2017). We fit the models to the first 150 observations of the time series and, using the estimated coefficients, predict the following 200 observations. We compare three model fitting approaches: TVRRR, time-constant RRR and time-constant updated RRR. For TVRRR, we compute the one-step-ahead predictions with the Kalman filter; however, for the estimation of the model parameters, we only use the training data. These predictions are compared to two versions of the time-constant VECM. For the first, we fit the model to the first 150 observations and obtain predictions for the following 200 data points using the estimated coefficients (time-constant RRR). The second, fairer comparison model is a VECM that is refitted with every new observation of the prediction period so as to include all past information in the prediction (updated RRR).
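The "updated RRR" comparison is a generic expanding-window evaluation: before predicting each validation observation, the time-constant model is refitted on all data observed so far. A minimal Python sketch of this loop, with `fit` and `predict` as hypothetical placeholders standing in for any estimator (here they would be the VECM/RRR fit):

```python
import numpy as np

def updated_forecasts(y, t0, fit, predict):
    """Expanding-window ('updated') one-step-ahead forecasts: refit a
    time-constant model on all observations before t, then predict
    observation t, for every t = t0, ..., T-1.  `fit` and `predict`
    are placeholders for an arbitrary estimator."""
    preds = []
    for t in range(t0, len(y)):
        model = fit(y[:t])               # refit with all past information
        preds.append(predict(model, y[t - 1]))
    return np.array(preds)
```

The loop is O(T) model fits, which is why this comparison is computationally heavier than a single TVRRR fit followed by Kalman-filter predictions.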
The MSEs from in-sample and out-of-sample prediction of the time series are reported relative to benchmark ("naive") predictions obtained under the assumption that stock index data follow a random walk (Fama, 1965; Kendall, 1953). We start with the analysis of the 2008 data. The seven time series before and after scaling are displayed in the top and bottom left panels of Fig. 3. The structural break in September 2008 is evident in both, with the scaled data capturing it more clearly. The decline of the prices was also accompanied by increased volatility, which can be clearly seen in the plots of the returns (log first differences). The training period is January 15, 2008 to August 11, 2008. When fitting the TVRRR model, the rank selected by the BIC criterion is d = 2. The EM-algorithm converges after 54 iterations, and the likelihood reaches a plateau fairly quickly.
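The relative MSE against the random-walk benchmark can be computed directly: the naive forecast of y_t is y_{t−1}, and values below 1 indicate an improvement over the benchmark. A small sketch (our own helper, not the paper's code):

```python
import numpy as np

def relative_mse(y, y_pred):
    """MSE of model predictions relative to the naive random-walk
    forecast y_hat_t = y_{t-1}.  y: (T, p) observed series;
    y_pred: (T-1, p) one-step-ahead predictions for t = 2, ..., T.
    A value below 1 means the model beats the benchmark."""
    naive_err = y[1:] - y[:-1]          # random-walk forecast errors
    model_err = y[1:] - y_pred          # model forecast errors
    return np.mean(model_err**2) / np.mean(naive_err**2)
```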
The relative in- and out-of-sample MSEs for both versions of time-constant RRR and for TVRRR are reported in the top panel of Table 7. The in-sample performance of RRR and TVRRR is comparable. However, we are more interested in the out-of-sample performance of the methods. We therefore applied the (modified) Diebold-Mariano test (Harvey, Leybourne, and Newbold, 1997) to check whether there are significant differences in the forecasting performance of the naive method, updated RRR and TVRRR. For the 2008 data, TVRRR yields significantly better predictions than the naive prediction in four cases, and it beats updated RRR for three series. Conversely, the prediction performance of TVRRR is never significantly worse than that of the competitors. Its capability to adapt even to abrupt changes in the coefficient estimates while retaining estimation precision when the parameters are constant pays off: the coefficient estimates adapt to the structural break. As expected, RRR fails completely at prediction. The different behavior of the methods may also be seen in the plot of the d = 2 estimated cointegrating relationships β′_{1t} y_{t−1} and β′_{2t} y_{t−1} in Fig. 4, where β′_{it} denotes the ith row of β′_t. The RRR components (top) show a trending behavior. In contrast, the cointegrating relationships inferred using TVRRR (middle) only vary around zero, but they clearly display the increased volatility during the crisis; compare also the plot of the log first-differences.
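The modified Diebold-Mariano test of Harvey, Leybourne, and Newbold (1997) compares the loss differential of two forecast-error series and applies a small-sample correction to the original DM statistic, referring it to a Student-t distribution. A sketch of the standard test under squared-error loss (a textbook implementation, not the paper's code; for horizon h = 1, as used for one-step-ahead forecasts here):

```python
import numpy as np
from scipy import stats

def dm_test(e1, e2, h=1):
    """Modified Diebold-Mariano test for equal predictive accuracy
    under squared-error loss, with the Harvey-Leybourne-Newbold
    small-sample correction.  e1, e2: forecast errors of the two
    competing methods; h: forecast horizon.  A positive statistic
    means method 2 forecasts better.  Returns (statistic, p-value)."""
    d = np.asarray(e1)**2 - np.asarray(e2)**2   # loss differential
    T = d.size
    dbar = d.mean()
    # long-run variance of dbar: autocovariances up to lag h-1
    gamma = [np.mean((d[k:] - dbar) * (d[:T - k] - dbar)) for k in range(h)]
    var_dbar = (gamma[0] + 2.0 * sum(gamma[1:])) / T
    dm = dbar / np.sqrt(var_dbar)
    # HLN correction factor and t-distributed reference (T-1 df)
    correction = np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    stat = correction * dm
    p = 2 * stats.t.sf(abs(stat), df=T - 1)
    return stat, p
```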
We next analyze the 2020 data. The corresponding time series are displayed in Fig. 5. Here, the structural break is even more abrupt. Whereas the time series decreased slowly in 2008, the 2020 series exhibit a sudden drop around March 2020, with a fairly quick global recovery. The training period ranges from June 6, 2019 to December 30, 2019. When fitting TVRRR, the BIC criterion again selects rank d = 2. The EM-algorithm converges after 104 iterations.
The corresponding MSEs are displayed in the bottom panel of Table 7. In-sample, we observe a similar pattern as for the 2008 data set. According to the Diebold-Mariano test, the forecasting performance of TVRRR is significantly better than that of the naive prediction in one case, and significantly better than that of updated RRR in three cases. In contrast, the forecasting performance of TVRRR is never significantly worse than that of the naive predictions or of updated RRR. Hence, also for the 2020 data, TVRRR shows a significant performance improvement.
The cointegrating relations β′_t y_{t−1} exhibit a similar pattern to those for 2008 shown in Fig. 4. Therefore we do not include a corresponding figure for the 2020 data.

Covid-19 data
We analyze weekly Covid-19 cases per 100,000 inhabitants for 12 European countries, namely Austria, Belgium, Czech Republic, France, Germany, Greece, Italy, Netherlands, Poland, Portugal, Sweden and Switzerland. 4 Due to the countries' proximity and frequent border crossings, the series can be assumed to be strongly correlated. Nevertheless, the relationships between the time series are likely to have changed during the pandemic.
We fit a time-varying VAR(1) model with a reduced-rank coefficient matrix of type (A). We do not preprocess the data. We fit the model to the period from the end of March 2020 until February 2021, and predict the remaining values of the time series up to mid-November 2021, using TVRRR and updated RRR, as described in Sec. 4.1. More precisely, for updated RRR we use a time-constant model for the estimation period, which is then refitted with every new observation in the validation period. This splits the data roughly into two halves: the training and prediction periods consist of 50 and 40 observations, respectively. Both models use latent rank d = 2.
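In a type (A) reduced-rank VAR(1), the coefficient matrix factorizes as A_t = α_t β′ with fixed β and time-varying loadings α_t, so the one-step forecast first compresses y_{t−1} into d linear combinations and then maps them back with the current loadings. A small illustrative Python sketch (dimensions chosen to match this application; the matrices here are random stand-ins, whereas the paper's estimates come from the Kalman filter/EM fit):

```python
import numpy as np

def rr_var1_forecast(y_prev, alpha_t, beta):
    """One-step forecast from a reduced-rank VAR(1) of type (A):
    y_t = alpha_t @ beta.T @ y_{t-1} + noise, with beta a fixed p x d
    matrix of linear combinations and alpha_t a (possibly time-varying)
    p x d loading matrix."""
    factors = beta.T @ y_prev          # d time-invariant combinations
    return alpha_t @ factors           # mapped back by current loadings

p, d = 12, 2                           # 12 countries, latent rank 2
rng = np.random.default_rng(0)
beta = rng.standard_normal((p, d))
alpha_t = rng.standard_normal((p, d))
y_prev = rng.standard_normal(p)
y_hat = rr_var1_forecast(y_prev, alpha_t, beta)
```

The factorization reduces the parameter count per time point from p² = 144 to 2pd = 48 (and β is shared across time), which is the dimension reduction that makes the time-varying fit feasible.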
The time series and the corresponding predictions are displayed in Fig. 6. During the model fitting period, TVRRR captures the first and second wave of the pandemic nearly perfectly. For updated RRR, we can again observe the effect of averaging: the second peak leads to overestimation at the start of the period. In the validation period from February 2021 to November 2021, the one-step-ahead predictions of TVRRR closely track the progression of the number of Covid-19 cases in all countries. In contrast, updated RRR performs poorly in the validation period. This can also be seen from the mean squared prediction errors shown in Fig. 7. The overall MSE of TVRRR in the prediction period is less than one quarter of that of updated RRR (3873 vs. 16,761). The Diebold-Mariano test finds statistically significantly better prediction performance of TVRRR for all countries except the Netherlands.
We also tried model (B) for this data set. However, its performance is far worse than that of model (A), indicating that the time-invariant linear combinations β′y_t contain essential information for prediction, while these linear combinations have a time-varying effect α_t on the Covid-19 cases in the different countries.

Conclusions
Reduced-rank regression forms the basis for various models frequently used in applications, such as multivariate time-series regressions, cointegration models and methods for sufficient dimension reduction based on inverse regression. We propose a new time-varying coefficient reduced-rank regression modeling approach (TVRRR) that models the dynamics of the observations and the time-varying coefficients as a Gaussian linear state-space system. The estimation algorithm relies on classical methods from state-space modeling: we estimate the time-varying parameters using the Kalman filter, and fit the model to observed data using an EM-algorithm that relies on analytic updates and is numerically stable. The EM-algorithm typically converges quickly, needing on average 40 iterations in our simulation studies; as expected, it takes longer to reach convergence in real data applications. In terms of computing time, the estimation of the model for the Covid data takes about 10 seconds on a standard laptop with an Intel(R) i5 processor and 16 GB RAM. We use BIC to estimate the rank of the coefficient matrices, with highly accurate results.
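The core filtering machinery can be illustrated on the simplest member of this model class: a regression whose coefficient vector follows a random walk, filtered by the standard Kalman recursions. The sketch below is a generic textbook filter for a scalar response (our own simplification, not the paper's reduced-rank state-space system, where the state is structured through the rank-d factorization):

```python
import numpy as np

def kalman_tvp(y, X, Q, R, b0, P0):
    """Kalman filter for a time-varying-parameter regression
    y_t = x_t' b_t + v_t,  b_t = b_{t-1} + w_t  (random-walk state),
    with v_t ~ N(0, R) and w_t ~ N(0, Q).  Returns the filtered
    coefficient paths and the one-step-ahead predictions."""
    T, k = len(y), b0.size
    b, P = b0.copy(), P0.copy()
    b_filt = np.empty((T, k))
    y_pred = np.empty(T)
    for t in range(T):
        P = P + Q                     # prediction step (mean unchanged)
        x = X[t]
        y_pred[t] = x @ b             # one-step-ahead prediction
        S = x @ P @ x + R             # innovation variance (scalar y)
        K = P @ x / S                 # Kalman gain
        b = b + K * (y[t] - y_pred[t])
        P = P - np.outer(K, x @ P)    # posterior covariance update
        b_filt[t] = b
    return b_filt, y_pred

# quick check on a constant-coefficient regression (noiseless data)
rng = np.random.default_rng(1)
T, k = 300, 2
X = rng.standard_normal((T, k))
b_true = np.array([1.0, -2.0])
y = X @ b_true
b_filt, y_pred = kalman_tvp(y, X, Q=1e-6 * np.eye(k), R=1e-2,
                            b0=np.zeros(k), P0=np.eye(k))
```

In the EM setting, these filtered (and smoothed) moments feed the analytic M-step updates of Q, R and the initial state; with a small Q, the filter also behaves sensibly when the true coefficients are constant, which mirrors the robustness of TVRRR to time-constant coefficients.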
Our simulation studies and the data applications show the adaptive capabilities of the algorithm. It exhibits high estimation accuracy across different patterns of time-variation, as it accommodates both abrupt structural breaks and gradual changes in the time-varying coefficients. Moreover, TVRRR exhibits robustness to time-constant coefficients. An R-implementation of the model fitting algorithm is available on GitHub (https://github.com/b-brune/tvRRR). Additional simulation results are provided in the supplementary material.