A recursive Riccati interior-point method for chance-constrained stochastic model predictive control

This study addresses the model predictive control of linear discrete-time systems subject to stochastic additive disturbances and state chance constraints. The stochastic optimal control problem is reformulated in a dynamic programming fashion to obtain closed-loop performance and is solved using the interior-point method combined with a Riccati-based approach. The proposed method eliminates the active-set enumeration of conventional explicit model predictive control and does not suffer from the curse of dimensionality, because it finds the value function and feedback policy only for a given initial state using the interior-point method. Moreover, the proposed method is proven to converge globally to the optimal solution Q-superlinearly. A numerical experiment shows that the proposed method achieves less conservative performance with lower computational complexity than existing methods.


Introduction
Model predictive control (MPC) has become important for control applications in various fields (see [1]) owing to its ability to cope effectively with the complex dynamics of systems with multiple inputs and outputs and to consider input or state constraints intuitively. Robust MPC has attracted the attention of many scholars because it allows the stability and performance of a system to be analysed against worst-case perturbations under bounded uncertainties. However, the probability of worst-case perturbations can be very small in the real world [2]; therefore, a control design based on worst-case perturbations would be too conservative.
In contrast to robust MPC, stochastic MPC (SMPC) uses probabilistic descriptions of objective values and constraints and accounts for acceptable levels of constraint violation during system operation through so-called chance constraints or probabilistic constraints [3]. SMPC provides a systematic way to trade off fulfilling the control objectives against ensuring a certain degree of constraint satisfaction under uncertainty. The ability to handle constraints in a stochastic scheme is particularly important for the control of uncertain systems that are operated in the vicinity of constraints, as in economic MPC [4].
Stochastic descriptions such as chance constraints complicate SMPC problems, and different methods of processing these descriptions lead to different levels of conservativeness. In principle, Bellman's principle of optimality [5] can be used to obtain the ideal closed-loop performance; however, doing so is generally intractable, especially for large-scale optimization [6]. In a certainty-equivalent SMPC scheme, the expectation formulations of the cost function and chance constraints are transformed into deterministic formulations [7], which achieves the so-called "open-loop" control performance. There are also heuristic methods for reformulating the stochastic optimization problem to obtain better control performance. A commonly used approach is to pre-parameterize the control input as an affine function of the state or past disturbances, using pre-fixed state feedback [8], state feedback [9], affine disturbance feedback [9], or simplified affine disturbance feedback [10]. Another formulation directly penalizes the variance in the cost function rather than adding a feedback feature to the control inputs [11]. Although these methods can artificially include some degree of closed-loop control effect, they still face difficulties such as an increased number of decision variables and the difficulty of explicitly analysing their degree of conservativeness.
Approximate dynamic programming (ADP) methods, such as differential dynamic programming (DDP) [12] and the iterative LQR (iLQR) [13], which yield suboptimal solutions to the Bellman equation using a Riccati-based algorithm, have received considerable attention in recent research. To avoid computing the exact solution of the original Bellman equation, ADP decouples the nested optimization problem, decomposes a large problem over a control sequence into a recursive series of small problems, and then solves each single-stage problem recursively. This approach is computationally efficient compared with collocation methods [14] and shooting methods because the size of each subproblem is time-independent and the computational complexity grows only linearly with the prediction horizon, which is very attractive for real-time computing. Moreover, ADP is well suited to SMPC because a solution of stochastic dynamic programming (SDP) reflects the closed-loop control performance more directly than other heuristic formulations.
However, constraint handling in a Riccati-based algorithm remains problematic, and there have been few discussions of it for the stochastic case. Several related studies exist for deterministic problems. In [15], an active-set-based method was proposed to solve the constrained LQR problem. In [16], an active-set method was applied to ADP to address general nonlinear constraints. However, the treatment of active sets is complicated and makes the algorithm inefficient. In [17], a penalty method was used to obtain a suboptimal solution with a constrained iLQR algorithm. In [18], an interior-point-based method was used to solve a constrained DDP problem, but the search region of that algorithm does not strictly remain in the interior of the feasible set. In this study, we seek an SDP solution for a linear SMPC problem with chance constraints. A novel recursive Riccati interior-point method (RRIPM) is proposed to solve the ensuing constrained optimization problem. In contrast to conventional methods, the proposed method avoids the complicated treatment of active sets required by explicit MPC and finds the value function and feedback policy only for a given state using the interior-point method, which is crucial for avoiding the curse of dimensionality. Moreover, the proposed method is proven to converge globally to a stationary solution Q-superlinearly. A numerical experiment reveals that the proposed method achieves less conservative performance with lower computational complexity than existing methods.
The remainder of this paper is organized as follows. Section 2 discusses the problem settings used throughout this study and some preliminary results for SMPC. Section 3 reformulates the problem and discusses optimality conditions. Section 4 presents our main algorithm, the RRIPM, and discusses its convergence properties. Section 5 presents simulation results. Finally, Section 6 concludes the paper.

Problem settings
Consider a linear discrete-time system with an additive Gaussian disturbance:

$$x_{k+1} = A x_k + B u_k + w_k, \qquad (1)$$

where k denotes the discrete time, $x_k \in \mathbb{R}^n$ denotes the state, $u_k \in \mathbb{R}^m$ denotes the control input, and $w_k \in \mathbb{R}^r$ denotes a random disturbance. The chance constraints on the state and the polytopic constraints on the input are given as

$$\Pr\{H_x x_k \le h_x\} \ge 1 - p_{\mathrm{vio}}, \qquad H_u u_k \le h_u, \qquad (2)$$

where Pr denotes the probability and $p_{\mathrm{vio}} \in (0, 0.5]$ denotes the maximum allowed probability of state constraint violation. Define an N-horizon quadratic cost function as

$$J_N = \mathbb{E}\left[ x_N^T Q_N x_N + \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) \right], \qquad (3)$$

where E denotes the expectation, $Q \ge 0$ and $R > 0$ are symmetric matrices with appropriate dimensions, and the symmetric matrix $Q_N \ge 0$ defines the terminal cost.
Remark 2.1: For the local stability of the closed-loop system, it is common to choose $Q_N$ as the solution of the algebraic Lyapunov equation [19]

$$(A + BK)^T Q_N (A + BK) - Q_N + Q + K^T R K = 0,$$

where, in turn, K is the gain computed as the solution of a linear quadratic control problem, with state and control weights Q, R, for the nominal model of system (1).
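As an illustration, the terminal ingredients of Remark 2.1 can be computed numerically. The following is a minimal sketch, assuming nominal matrices (A, B) and weights (Q, R); the function name is ours, not the paper's:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def terminal_ingredients(A, B, Q, R):
    """LQR gain K (with u = K x) and terminal weight Q_N as in Remark 2.1."""
    P = solve_discrete_are(A, B, Q, R)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    A_cl = A + B @ K
    # Q_N solves the closed-loop Lyapunov equation
    #   Q_N = A_cl' Q_N A_cl + Q + K' R K.
    Q_N = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    return K, Q_N
```

For the LQR-optimal gain chosen above, $Q_N$ coincides with the Riccati solution P; the Lyapunov form is stated because Remark 2.1 holds for any stabilizing K.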
We make the following assumptions throughout this paper.

Assumption 2.1: Measurements of all states are available at each sampling instant.

Assumption 2.2:
The disturbances follow an independent Gaussian distribution, i.e. $w_k \sim \mathcal{N}(0, \Sigma_w)$, and the covariance matrix $\Sigma_w$ is positive definite.
The SMPC solves the following finite-horizon stochastic optimal control problem (SOCP) with a known initial state $x_0$ at each time instant.

Problem 2.1 (SOCP):
$$\min_{u_0, \ldots, u_{N-1}} J_N \quad \text{subject to (1) and (2)}, \quad k = 0, \ldots, N-1.$$

Analytic reformulation of chance constraints
In the case of a Gaussian random vector $\omega \sim \mathcal{N}(\mu, \Sigma_\omega)$, consider a chance constraint of the following linear form:

$$\Pr\{H\omega \le \chi\} \ge 1 - p_{\mathrm{vio}},$$

where χ is the decision variable. It can be analytically reformulated as a deterministic constraint by standardizing the Gaussian variable with Γ, where $H \Sigma_\omega H^T = \Gamma \Gamma^T$ and Γ can be obtained by matrix decomposition methods such as the Cholesky decomposition or the LDL decomposition. Using the cumulative distribution function (CDF), the constraint can be rewritten as

$$\Phi\!\left(\frac{\chi - H\mu}{\Gamma}\right) \ge 1 - p_{\mathrm{vio}},$$

where Φ denotes the standard Gaussian CDF. Applying the inverse of the standard Gaussian CDF, $\Phi^{-1}$, we obtain

$$\chi - H\mu \ge \Phi^{-1}(1 - p_{\mathrm{vio}})\,\Gamma,$$

and $\Phi^{-1}$ can be computed offline only once with arbitrary precision. We can thus summarize the analytic reformulation of the chance constraint as a deterministic constraint on μ:

$$H\mu + t(1 - p_{\mathrm{vio}})\,\Gamma \le \chi, \qquad (9)$$

where $t(1 - p_{\mathrm{vio}}) = \Phi^{-1}(1 - p_{\mathrm{vio}})$ is the constraint-tightening level of the chance constraint, which shows that a chance constraint is a type of tightened constraint. Furthermore, even if the distribution of the random variable is not specified, as long as its mean μ and covariance matrix $\Sigma_\omega$ are known, an approximation can still be found using the Chebyshev inequality [20].
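A small sketch of this reformulation, treating each row of H as a separate scalar constraint; scipy's norm.ppf plays the role of $\Phi^{-1}$, and the helper name is ours:

```python
import numpy as np
from scipy.stats import norm

def tightened_rhs(H, chi, Sigma, p_vio):
    """Row-wise deterministic surrogate of Pr{H w <= chi} >= 1 - p_vio.

    The mean mu is then required to satisfy H mu <= returned value,
    matching (9) with t(1 - p_vio) = Phi^{-1}(1 - p_vio).
    """
    t = norm.ppf(1.0 - p_vio)                               # tightening level
    gamma = np.sqrt(np.einsum('ij,jk,ik->i', H, Sigma, H))  # sqrt(H Sigma H') per row
    return chi - t * gamma
```

For example, $p_{\mathrm{vio}} = 0.1$ gives $t \approx 1.2816$, so each constraint is backed off by roughly 1.28 standard deviations of the constrained direction.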

Remark 2.2:
It is difficult to guarantee the recursive feasibility and stability of SMPC under chance constraints because the disturbance may take unboundedly large values. The current literature generally offers two approaches for dealing with feasibility issues under chance constraints. First, in some notable works (e.g. [21,22]), the chance constraint is treated as a soft constraint; that is, in case of infeasibility, an alternative, feasible optimization problem is solved. The second choice is to assume that the disturbances are bounded [23]; such approaches impose suitable worst-case constraint tightening on Equation (9) by accounting for the bounded sets in which the state can evolve due to the bounded noise affecting the state equation. However, the discussion of feasibility is beyond the scope of this study. Thus, we simply consider an analytic reformulation of the chance constraint to enable an efficient numerical algorithm.

Problem reformulation
In theory, Bellman's principle of optimality can be used to solve Problem 2.1. At each stage k, the optimal cost-to-go $V_k(x_k)$ (i.e. the value function) should satisfy the Bellman equation in the SDP scheme.

Problem 3.1 (Bellman equation):
$$V_k(x_k) = \min_{u_k} \; \mathbb{E}\left[x_k^T Q x_k + u_k^T R u_k + V_{k+1}(x_{k+1})\right] \quad \text{subject to (1) and (2)},$$
with the terminal condition $V_N(x_N) = x_N^T Q_N x_N$.

The solution to Problem 2.1 comprises a sequence of such stage-wise solutions and is not necessarily defined for every $x_k \in \mathbb{R}^n$ because of the constraints.
The measurement information available at each stage plays an important role in stochastic control and leads to different policy types; a discussion of various policies can be found in [24]. The open-loop policy is defined by the absence of immediate observation data at each stage, i.e. only the initial measurement can be used. The feedback policy means that the measurement information up to the current stage k is available for computing the control.
To obtain the closed-loop control performance, we use the concept of a feedback policy; i.e. we know the measurement of $x_k$ at stage k, so its mean and covariance are $\bar{x}_k$ and 0, respectively. Because the full state is observable according to Assumption 2.1, the future information on $x_{k+1}$ can be obtained as follows:

$$\bar{x}_{k+1} = A\bar{x}_k + Bu_k, \qquad \Sigma_{k+1} = \Sigma_w, \qquad (10)$$

where $\bar{x}_{k+1}$ and $\Sigma_{k+1}$ denote the predicted mean and covariance, respectively. Using these settings, we can remove the stochastic descriptions in Problem 3.1. The chance constraint can be converted into deterministic inequalities as in (9):

$$H_x \bar{x}_{k+1} + t(1 - p_{\mathrm{vio}})\,\Gamma_{k+1} \le h_x, \qquad \Gamma_{k+1}\Gamma_{k+1}^T = H_x \Sigma_{k+1} H_x^T,$$

where $t(1 - p_{\mathrm{vio}})$ is the constraint-tightening level. By substituting (10) into this deterministic constraint, it can be expressed in terms of the current state and input. Together with the input constraints, the admissible region $\Omega_k$ reduces to a polytope described by linear inequalities in $(x_k, u_k)$. The remaining question concerns the expectation of the cost function. From the dynamic programming solution in the constrained explicit MPC case [25], we know that the optimal value function in the linear case has the following quadratic form in an appropriate region at each stage:

$$V_{k+1}(x_{k+1}) = x_{k+1}^T P_{k+1} x_{k+1} + q_{k+1}^T x_{k+1} + r_{k+1}, \qquad (16)$$

where $P_{k+1}$, $q_{k+1}$, and $r_{k+1}$ are the second-order, first-order, and constant coefficients, respectively.

Remark 3.1: Explicit MPC seeks an offline dynamic programming solution; therefore, its optimal value function and feedback policy are piecewise functions of the state $x_k$. By contrast, in our case, we want to find an online solution only for a given $x_0$. Therefore, the value function and feedback policy are determined only over a certain sequence of regions for the given $x_0$, rather than in piecewise form over the whole state space.
Using the mean and covariance information of $x_k$, we can transform the expected value function as follows:

$$\mathbb{E}\left[V_{k+1}(x_{k+1})\right] = \bar{x}_{k+1}^T P_{k+1} \bar{x}_{k+1} + q_{k+1}^T \bar{x}_{k+1} + r_{k+1} + \mathrm{tr}\left(P_{k+1}\Sigma_{k+1}\right).$$

Because the covariances of $x_k$ and $x_{k+1}$ are all constant, the trace term can be omitted for simplicity. Now we can state an equivalent deterministic problem (Problem 3.2) under the mean and covariance dynamics. Hereafter, for simplicity, we use $x_k$ to represent the mean value $\bar{x}_k$ directly.
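The reformulation above amounts to propagating the mean through the nominal model (10) and backing off the state constraints by a constant amount. Below is a minimal sketch, assuming stacked state-constraint data (H_x, h_x) as in (2); all names are ours:

```python
import numpy as np
from scipy.stats import norm

def mean_rollout_feasible(A, B, x0, U, H_x, h_x, Sigma_w, p_vio):
    """Mean prediction (10) checked against the tightened constraints (9).

    Under the feedback policy the one-step covariance is Sigma_w, so the
    back-off is the same at every stage.
    """
    t = norm.ppf(1.0 - p_vio)
    back_off = t * np.sqrt(np.einsum('ij,jk,ik->i', H_x, Sigma_w, H_x))
    x_bar, feasible = np.asarray(x0, dtype=float), True
    means = [x_bar]
    for u in U:                       # nominal (mean) dynamics
        x_bar = A @ x_bar + B @ u
        means.append(x_bar)
        feasible &= bool(np.all(H_x @ x_bar <= h_x - back_off))
    return np.array(means), feasible
```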

Optimality conditions
The Lagrange multiplier method [26] can be used to rewrite the value function in Problem 3.2 as follows:

$$V_k(x_k) = \min_{u_k} \max_{s_k \ge 0} \; x_k^T Q x_k + u_k^T R u_k + \mathbb{E}\left[V_{k+1}(x_{k+1})\right] + s_k^T c_k(x_k, u_k), \qquad (18)$$

where $c_k(x_k, u_k) \le 0$ collects the linear inequalities describing the admissible region $\Omega_k$ and $s_k$ denotes the Lagrange multiplier.
Let the stage-wise cost be $\ell(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k$. By substituting the value function (16) and the mean prediction equation (10) into (18), we can write the Q-function $Q_k$ as

$$Q_k(x_k, u_k) = \ell(x_k, u_k) + V_{k+1}(A x_k + B u_k),$$

which is quadratic in $(x_k, u_k)$. The KKT conditions for the single-stage optimization problem (18), called the stage-wise KKT (SKKT) conditions, are

$$\nabla_{u_k} Q_k + \left(\nabla_{u_k} c_k\right)^T s_k = 0, \quad c_k(x_k, u_k) \le 0, \quad s_k \ge 0, \quad s_k \circ c_k(x_k, u_k) = 0, \qquad (20)$$

where $\circ$ denotes the elementwise product. These SKKT conditions can be considered parametric equations with respect to $x_k$. Therefore, if we can solve these equations at each stage k, we obtain a sequence of state-feedback solutions $u_k(x_k)$ and the corresponding optimal sequence $(x_k^*, u_k^*, s_k^*)$. We present our main results regarding solutions to the SKKT conditions (20) in the next section.

Main results
In this section, we introduce our main results for the RRIPM algorithm and its convergence properties.

RRIPM
Note that the SKKT conditions (20) are nonlinear equations with nonsmooth complementarity conditions, which cause numerical difficulties for a direct solution. We use the interior-point method to avoid the complicated search over active sets and find solutions of (20) by applying Newton's method to the perturbed SKKT conditions [27] while iteratively reducing the smoothing parameter. More specifically, our proposed method improves the solution at each Newton iteration by solving the perturbed SKKT conditions in the vicinity of the current trajectory for the search directions through a backward pass and by computing a new trajectory through a forward pass.
By further introducing the slack variable $y_k$ to transform the inequalities into equalities, we can write the SKKT conditions (20) in slack-variable form:

$$\nabla_{u_k} Q_k + \left(\nabla_{u_k} c_k\right)^T s_k = 0, \quad c_k(x_k, u_k) + y_k = 0, \quad S_k y_k = 0, \quad s_k, y_k \ge 0, \qquad (21)$$

where $S_k = \mathrm{diag}(s_k)$. Let $x^0 := (x_0^0, \ldots, x_N^0)$ and $u^0 := (u_0^0, \ldots, u_{N-1}^0)$ denote the initial guesses of the state and control, respectively, which are feasible for the nominal system equation. Let $s^0 := (s_0^0, \ldots, s_{N-1}^0)$ and $y^0 := (y_0^0, \ldots, y_{N-1}^0)$ denote the initial guesses of the Lagrange multipliers and slack variables, respectively, which are all non-negative. At the $i$th iteration, the trajectories of the last iteration, $(x^{i-1}, u^{i-1}, s^{i-1}, y^{i-1})$, are known. The backward pass and forward pass together complete one Newton iteration, and the entire algorithm terminates once the obtained trajectory satisfies the optimality conditions, which are checked in the outer loop.
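As a quick sanity check on the slack form (21), one can evaluate its residuals at a candidate point. A sketch with our own sign convention $c(x, u) + y = 0$:

```python
import numpy as np

def skkt_residual(grad_u_Q, Jc_u, c_val, s, y):
    """Residuals of the slack-form SKKT conditions (21) at one stage.

    grad_u_Q : gradient of the Q-function w.r.t. u
    Jc_u     : Jacobian of the stage constraints c(x, u) w.r.t. u
    c_val    : constraint values c(x, u); slacks satisfy c + y = 0
    """
    r_stat = grad_u_Q + Jc_u.T @ s    # stationarity
    r_prim = c_val + y                # primal feasibility with slacks
    r_comp = s * y                    # complementarity S y = 0
    return np.concatenate([r_stat, r_prim, r_comp])
```

The interior-point iterations below drive this residual to zero while keeping s and y strictly positive.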

Backward pass
Initialize: let $P_N = Q_N$ and $q_N = 0$. At stage k, represent the optimization variables by their search directions:

$$u_k = u_k^{i-1} + \delta u_k, \quad s_k = s_k^{i-1} + \delta s_k, \quad y_k = y_k^{i-1} + \delta y_k, \quad x_k = x_k^{i-1} + \delta x_k.$$

The new trajectory satisfies the perturbed SKKT conditions:

$$\nabla_{u_k} Q_k + \left(\nabla_{u_k} c_k\right)^T s_k = 0, \quad c_k(x_k, u_k) + y_k = 0, \quad S_k y_k = \sigma\mu \mathbf{1}, \quad s_k, y_k > 0, \qquad (23)$$

where $\mu = s_k^T y_k / (l_x + l_u)$ is the current duality measure (central parameter), with $l_x + l_u$ the number of inequalities, and the update is $\mu^+ = \sigma\mu$, where $\sigma \in [0, 1]$ is the reduction factor by which we wish to decrease the duality measure at this step. When σ > 0, we solve for a perturbed KKT point at each Newton iteration. It is well known that a standard Newton direction with σ = 0 often does not make much progress toward a solution; therefore, it is better to choose a positive σ and reduce the central parameter iteratively [26].
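The central parameter is cheap to compute; a two-line sketch (names ours):

```python
import numpy as np

def duality_measure(s, y, sigma=0.1):
    """Duality measure mu = s'y / l and its perturbed target sigma * mu.

    sigma = 0 recovers the plain Newton (affine) step; sigma > 0 biases
    the step toward the central path and keeps iterates interior.
    """
    mu = float(s @ y) / s.size
    return mu, sigma * mu
```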
Representing all variables by their search directions, the only nonlinear equation in (23) is the complementarity condition $S_k y_k = \sigma\mu\mathbf{1}$. The interior-point algorithm finds solutions by applying Newton's method to the three equations in (23), which yields the following linear system:

$$\begin{bmatrix} Q_{uu} & \left(\nabla_u c_k\right)^T & 0 \\ \nabla_u c_k & 0 & I \\ 0 & Y_k & S_k \end{bmatrix} \begin{bmatrix} \delta u_k \\ \delta s_k \\ \delta y_k \end{bmatrix} = -\begin{bmatrix} r_d \\ r_p \\ S_k y_k - \sigma\mu\mathbf{1} \end{bmatrix} - \begin{bmatrix} Q_{ux} \\ \nabla_x c_k \\ 0 \end{bmatrix} \delta x_k, \qquad (24)$$

where $Y_k = \mathrm{diag}(y_k)$, and $r_d$ and $r_p$ are the dual and primal infeasibilities of the last iteration.
If the matrix on the left-hand side of (24) is nonsingular, we can solve the parametric system (24) to obtain the unique solution

$$\delta u_k = k_u + K_u \delta x_k, \quad \delta s_k = k_s + K_s \delta x_k, \quad \delta y_k = k_y + K_y \delta x_k, \qquad (26)$$

writing $(\delta u_k, \delta s_k)$ as affine functions of $\delta x_k$. By substituting this solution into (18), we obtain the expressions for the coefficients of $V_{k-1}$, which are given by a Riccati-type recursion. Since the value of $r_k$ affects only the constant term $Q_c$, and the value of $Q_c$ is not used in any calculation, we omit the update of the constant terms in both the Q-function and the value function to avoid unnecessary computation. The above step is performed iteratively from k = N − 1 to k = 1 to complete the backward pass.
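A sketch of one backward-pass stage under our sign conventions (constraints written as c + y = 0, multipliers s, slacks y); it solves (24) simultaneously for the feedforward terms and the feedback gains of (26). All names are ours, not the paper's:

```python
import numpy as np

def backward_stage(Quu, Qux, Qu, Cu, Cx, c_val, s, y, sigma_mu):
    """Solve the perturbed SKKT system (24) at one stage.

    Returns (k_u, K_u), (k_s, K_s), (k_y, K_y) such that, e.g.,
    du = k_u + K_u @ dx, as in (26).
    """
    m, n = Qux.shape
    l = Cu.shape[0]
    S, Y = np.diag(s), np.diag(y)
    KKT = np.block([
        [Quu,              Cu.T,             np.zeros((m, l))],
        [Cu,               np.zeros((l, l)), np.eye(l)],
        [np.zeros((l, m)), Y,                S],
    ])
    rhs = np.column_stack([
        np.concatenate([-(Qu + Cu.T @ s),              # dual infeasibility r_d
                        -(c_val + y),                  # primal infeasibility r_p
                        sigma_mu * np.ones(l) - s * y]),
        -np.vstack([Qux, Cx, np.zeros((l, n))])        # delta-x dependence
    ])
    sol = np.linalg.solve(KKT, rhs)
    k_u, K_u = sol[:m, 0], sol[:m, 1:]
    k_s, K_s = sol[m:m + l, 0], sol[m:m + l, 1:]
    k_y, K_y = sol[m + l:, 0], sol[m + l:, 1:]
    return (k_u, K_u), (k_s, K_s), (k_y, K_y)
```

Substituting $\delta u_k = k_u + K_u \delta x_k$ back into the quadratic model of $Q_k$ then yields the Riccati-type update of the value-function coefficients described above.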

Forward pass
Starting from k = 0, $x_0^i = x_0$ is already known from the measurement, which means that $\delta x_0 = 0$.
At stage k, the search directions $(\delta u_k, \delta s_k, \delta y_k)$ are calculated using (26). The interior-point method generates a new trajectory $(x^i, u^i, s^i, y^i)$ in a wide neighbourhood, as proposed in [28], satisfying the conditions in (31), where $\|\cdot\|$ denotes the infinity norm. Define the new iterate

$$(u_k^i, s_k^i, y_k^i) = (u_k^{i-1}, s_k^{i-1}, y_k^{i-1}) + \alpha_k^i (\delta u_k, \delta s_k, \delta y_k), \qquad (30)$$

where the step size $\alpha_k^i$ is determined by the following line-search rule.

STEP 1. Let $\bar{\alpha} \in (0, 1]$ be the maximum value such that the conditions in (31) are satisfied for all $\alpha \in (0, \bar{\alpha}]$. Generally, it is common to let $\alpha_y = \alpha_s = \bar{\alpha}$.

We show the existence of $\bar{\alpha}$ by proving the existence of a positive number $\alpha^*$ such that $\bar{\alpha} \ge \alpha^*$ holds as long as the iteration continues, which also leads to the global convergence result for our algorithm. After calculating $(u_k^i, s_k^i, y_k^i)$, we can compute $x_{k+1}^i$ using the nominal system equation

$$x_{k+1}^i = A x_k^i + B u_k^i.$$

The above update is repeated recursively from k = 0 to N − 1 to obtain a new trajectory $(x^i, u^i, s^i, y^i)$ and complete the forward pass.
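A sketch of the forward update at one stage, with a fraction-to-boundary rule standing in for the full conditions (31)-(32); the safety factor tau is our assumption:

```python
import numpy as np

def forward_stage(u, s, y, du, ds, dy, tau=0.995):
    """New iterate (30) with a step that keeps s and y strictly positive."""
    alpha = 1.0
    for v, dv in ((s, ds), (y, dy)):
        shrink = dv < 0
        if np.any(shrink):
            # largest step before some component of v would hit zero
            alpha = min(alpha, tau * float(np.min(-v[shrink] / dv[shrink])))
    return u + alpha * du, s + alpha * ds, y + alpha * dy, alpha
```

In the full algorithm, α would additionally be backtracked until the neighbourhood conditions (31) hold before the state is advanced through the nominal model.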

Outer loop
The backward pass and the forward pass together complete the Newton steps. The algorithm terminates when the optimality, duality, and constraint-feasibility measures meet the stopping criteria in (34), or when the duality measure satisfies (35) for an arbitrarily large ι.
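A sketch of the outer-loop test; the exact forms of (34)-(35) follow the residual and duality measures defined above, and the thresholds here are illustrative assumptions of ours:

```python
import numpy as np

def outer_loop_check(r_stat, r_prim, s, y, tol=1e-8, iota=1e8):
    """Stopping test: converged per (34), or infeasibility flagged per (35)."""
    mu = float(s @ y) / s.size
    converged = max(np.linalg.norm(r_stat, np.inf),
                    np.linalg.norm(r_prim, np.inf), mu) <= tol
    diverged = bool(np.linalg.norm(s, np.inf) >= iota)  # multipliers blow up
    return converged, diverged
```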
Remark 4.1: If (35) holds, we can derive infeasibility information; namely, there is no feasible solution in a certain wide region of the primal-dual space [29].
We summarize the entire process in Algorithm 1.

Convergence analysis of RRIPM
The global convergence property and the local convergence rate of the proposed algorithm are discussed in this subsection. Let $z = (x, u, s, y)$, let $F_k(z_k) = 0$ denote the SKKT conditions (21), and let $z^*$ denote the stationary point of $F_k(z_k) = 0$. The following assumptions are made in this subsection.

Assumption 4.1:
The optimal solution $z^*$ exists and is unique in the neighbourhood $\mathcal{N}(\gamma, \epsilon_o, \epsilon_f)$ for a given initial state $x_0$, and strict complementarity holds at $z^*$.
Assumption 4.2: $Q_{uu}$ is positive definite.
Before discussing the convergence results, we provide the following lemma to guarantee the nonsingularity of the coefficient matrix on the left-hand side of (24).

Lemma 4.2: Under Assumption 4.2, the SKKT matrix

$$\begin{bmatrix} Q_{uu} & \left(\nabla_u c_k\right)^T & 0 \\ \nabla_u c_k & 0 & I \\ 0 & Y_k & S_k \end{bmatrix} \qquad (36)$$

is nonsingular for all k = 0, \ldots, N − 1.
Proof: If $z^*$ is a solution point at which strict complementarity holds, then for every iteration i at stage k, under the line-search rules (31)-(32), either $s_k^i$ or $y_k^i$ remains bounded away from zero as the iterates approach $z_k^*$, ensuring that the second block row $[0 \;\; Y_k \;\; S_k]$ has full row rank. Because $Q_{uu}$ is nonsingular, the first and third block rows also have full row rank. Together with the linear independence of these three block rows, we conclude that (36) is nonsingular. □
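Lemma 4.2 can also be checked numerically on random data; a small sketch with s, y strictly positive and a positive definite $Q_{uu}$ (instance data ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, l = 2, 3
M = rng.standard_normal((m, m))
Quu = M @ M.T + m * np.eye(m)          # positive definite by construction
Cu = rng.standard_normal((l, m))
s, y = rng.uniform(0.1, 1.0, l), rng.uniform(0.1, 1.0, l)
KKT = np.block([
    [Quu,              Cu.T,             np.zeros((m, l))],
    [Cu,               np.zeros((l, l)), np.eye(l)],
    [np.zeros((l, m)), np.diag(y),       np.diag(s)],
])
# full rank (= m + 2l) confirms nonsingularity for this instance
print(np.linalg.matrix_rank(KKT), m + 2 * l)
```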
The following theorem provides a global convergence result for Algorithm 1.

Theorem 4.3: Under Assumptions 4.1 and 4.2, Algorithm 1 terminates after a finite number of iterations; that is, one of the stopping criteria (34) and (35) is eventually satisfied.

Proof:
The second half of the proof proceeds by contradiction. Suppose that Algorithm 1 does not terminate. At iteration i, for all k, we have

$$(s_k^{i-1})^T y_k^{i-1} \ge \epsilon^*, \qquad (38)$$

where $\epsilon^* = \min\{\gamma\epsilon_o, \epsilon_d, \gamma\epsilon_f\}$; otherwise, $z^{i-1}$ would satisfy either stopping criterion (34) or (35). Therefore, the entire sequence of iterates remains bounded. Together with the nonsingularity of the SKKT matrix from Lemma 4.2, the Newton direction $(\delta u_k, \delta s_k, \delta y_k)$ generated by (24) is bounded for all i and k. We define the functions $g_o$, $g_d^j$ ($j = 1, \ldots, l_x + l_u$), $g_f$, and $h$ from the conditions in (31). From (25) and (31), we obtain (41), and by (24) we obtain (42). Combining (41) and (42) yields (43); similarly, we obtain (44). Hence, α is chosen to satisfy the conditions in (45) so that the new iterate satisfies (31). Because the four conditions in (45) are similar, let us consider the second one, concerning $g_o(\alpha)$, as an example. For $\mu \le \epsilon_o$ and any $\alpha \in (0, 1]$, the first term on the right-hand side of (46) is always positive. From the boundedness of $(\delta u_k, \delta s_k, \delta y_k)$, we can find a positive constant such that (47) holds. Therefore, we have the inequality (48) because $(s_k^{i-1})^T y_k^{i-1} \ge \epsilon^*$. Similarly, we obtain the conditions for $g_d^j(\alpha)$, $g_f(\alpha)$, and $h(\alpha)$. We can thus find a positive $\alpha^*$ such that (45) holds true for any $\alpha \in (0, \alpha^*]$.
From the definition of $\bar{\alpha}$, we have $\bar{\alpha} \ge \alpha^*$, so we can always find a step size $\alpha_k^i$ that determines a new trajectory satisfying (32). The complementarity products then decrease by a fixed factor at every iteration; hence $(s_k^i)^T y_k^i$ converges to zero as the iteration i tends to ∞, which contradicts (38). □
Theorem 4.3 establishes the finite termination of Algorithm 1. Together with Assumption 4.1, it can be concluded that our algorithm converges to a sequence that satisfies the SKKT conditions; in other words, the algorithm globally converges to $z^*$.
Theorem 4.4: If $\alpha^i \to 1$ and $\mu^i \to 0$, the sequence $\{z^i\}$ generated by Algorithm 1 converges to the optimal solution $z^*$ Q-superlinearly.
Proof: We define the coefficient matrix in $F_k(z_k) = 0$ as in (54). At each stage, applying a Taylor approximation to both sides of (54) yields (55). From the forward update (30), the update of $\zeta_k := (u_k, s_k, y_k)$ is given by (56), where $\delta\zeta_k$ is the solution of the linear equation (24), written as (57) with p denoting a fixed vector. Substituting (56) into (57) yields (58). Then, combining (55) and (58) to eliminate the term $F(z_k^i)$ gives (59). From the system equation (1), because $x_0^{i+1} = x_0^* = x_0$, we have (60). Continuing by induction up to stage k and representing the right-hand side in terms of ζ, we obtain (61), where $M_j$ denotes an appropriate constant matrix. Substituting (61) into (59), for k = 0 we obtain (62), and similarly for k = 1. Continuing this induction up to k = N − 1, we obtain (64), where $\Upsilon_\zeta$ and $\Upsilon_\mu$ are appropriate coefficient matrices and $\Upsilon_\zeta$ is composed of products of $(1 - \alpha_k^i)$ with other bounded matrices. Combining (61) and (64), we have (65), where $\Upsilon_\zeta \to 0$ as $\alpha^i \to 1$. Therefore, when $i \to \infty$, that is, $\alpha^i \to 1$ and $\mu^i \to 0$, (66) holds, and the sequence $\{z^i\}$ converges to $z^*$ Q-superlinearly. □

Case study
We tested our method on a linear system: the room-temperature control problem of [30]. The basic control target was to maintain the room temperature above a certain level in the presence of external disturbances. The system contained three states: $x_1$, the room temperature; $x_2$, the temperature of the wall connected to another room; and $x_3$, the temperature of the wall connected to the outside. The system was subject to three disturbances, where $w_1$ denotes the outside temperature, $w_2$ the solar radiation, and $w_3$ the internal heat gains (e.g. people and electronic devices); the overall disturbance w follows $\mathcal{N}(0, I)$. The only control input, u, was the heating.
The control objective was to maintain the room temperature above 21 °C with minimum energy. Since the objective function (3) is intended to regulate the state to the origin, the weighting matrices were set to Q = 0 and R = 1, which implies that only the control input is evaluated in the objective function and the room-temperature target is achieved by imposing the constraint. The state constraint was treated as a chance constraint:

$$\Pr\{x_{1,k} \ge 21\} \ge 1 - p_{\mathrm{vio}}.$$

The control input also had limitations imposed by the actuator, 0 ≤ u ≤ 45 W/m². The parameters of the system were obtained from [31]. Because explicit MPC is intractable in the present problem settings when the prediction horizon is larger than two, we compared the following three control strategies in terms of control performance, constraint violation rate, and computation time: (1) the open-loop policy [7], in which only the information of the initial measurement $x_0$ can be used; (2) pre-parameterization (PP) [9], which parameterizes the control policy as $u_k = K_k x_k + v_k$ and searches directly over the decision variables $(K_k, v_k)$, a heuristic formulation of the feedback policy; and (3) the proposed RRIPM.
We carried out the simulations on a laptop computer with a 2.60 GHz Intel Core i7-6700HQ in MATLAB 2020a. The first and second strategies were solved using the interior-point method with direct shooting [27] and coded in the MATLAB environment.
Figure 1 shows the time evolution of the mean of $x_1$ and the corresponding 95% confidence region, and Figure 2 shows the mean of the control input. All three control methods were implemented with the initial state $[26, 26, 21]^T$, prediction horizon N = 6, and allowed constraint violation rate $p_{\mathrm{vio}}$ = 10%. The results reveal that the open-loop policy produced the most conservative control performance; its room temperature was almost always higher than that of the other two control policies, while our proposed method resulted in the least conservative control performance. The means and confidence regions of PP and RRIPM almost coincided; however, PP was slightly more conservative than the proposed RRIPM, which may be because pre-parameterization is a heuristic formulation of a feedback policy. A more intuitive comparison of control performance is provided in Table 1, where we consider optimal solutions for several initial states. The results reveal that the open-loop policy consumes the most energy for an identical optimization problem (i.e. it is the most conservative), while the proposed RRIPM always achieves the smallest objective function value.
The constraint violation rates were compared based on 100 Monte Carlo simulations. Figure 3 shows the time evolution of $x_1$ for $p_{\mathrm{vio}}$ = 10%. As can be seen from the figure, all three control methods exhibit certain degrees of constraint violation. The computation times are listed in Table 3. The data in the table reveal that, when the prediction horizon is short, the computation times of the open-loop policy and RRIPM are similar and smaller than those of PP because they have fewer decision variables. However, when the prediction horizon increases, RRIPM has obvious advantages because the computational complexity of the Riccati recursion increases only linearly with the prediction horizon. Figure 4 shows the shape of the obtained first step of the optimal feedback control law $u_0(x)$. In principle, the dynamic programming solution of a linear system with linear constraints should be a piecewise linear function w.r.t. the state variables. We obtained $u_0(x)$ by pointwise computation of RRIPM within a certain region of x, i.e. $x_1 \in [18, 30]$, $x_2 \in [18, 30]$, and fixed $x_3 = 10$. The figure reveals that the optimal solution $u_0$ exhibits the shape of a piecewise linear function in the $(x_1, x_2)$ space. In other words, our algorithm generates the same control inputs as the piecewise affine feedback law of explicit MPC while avoiding active-set enumeration, and it does not suffer from the curse of dimensionality. Moreover, our proposed method also achieves low computational complexity compared with existing methods.

Conclusions
We reformulated the SOCP with chance constraints as a deterministic Bellman equation under the concept of a feedback policy. An RRIPM was proposed to solve the stage-wise KKT conditions. Our proposed method avoids the complicated treatment of active sets and determines the optimal control policy only for the current state. We proved the global convergence and the local Q-superlinear convergence rate of our algorithm to the optimal solution. The simulation results demonstrate that our proposed algorithm achieves the desired, less conservative control performance; that is, it keeps the constraint violation rate as close as possible to the specified level. Moreover, our proposed method achieves low computational complexity compared with existing methods.

Figure 1. Room temperature profile and corresponding 95% confidence region.

Table 1. Comparison of the objective function values for different states under the same settings (N = 6).

Table 2 provides a more intuitive comparison, showing the constraint violation rate for $p_{\mathrm{vio}}$ = 1%, $p_{\mathrm{vio}}$ = 5%, and $p_{\mathrm{vio}}$ = 10%. The results reveal that, for different values of $p_{\mathrm{vio}}$, our proposed RRIPM always remained the closest to the set value, PP was close to RRIPM but slightly lower, and the open-loop policy was significantly lower than the set value.

Table 2. Comparison of the constraint violation rates based on 100 Monte Carlo simulations (%).

Table 3. Average computation time per update (ms).