Superlinear Convergence of the GMRES for PDE-Constrained Optimization Problems

ABSTRACT Optimal control problems for PDEs arise in many important applications. A main step in the solution process is the solution of the arising linear system, where the crucial point is usually finding a proper preconditioner. We propose both proper block diagonal and more involved preconditioners, and derive mesh independent superlinear convergence of the preconditioned GMRES iterations based on a compact perturbation property of the underlying operators.


Introduction
Optimal control problems for partial differential equations (PDEs), where we want to steer the solution of the modeled process close to some desired target solution by use of a control function, arise in many important applications. Such problems have been dealt with in several publications, such as [2,3,10,16,21], see also the references therein. Earlier publications have mostly dealt with problems where the control and observation domains coincide; in recent papers, however, they are allowed to be different. The general approaches are the discretize-then-optimize and optimize-then-discretize processes: recent research shows that one should use discretization schemes for which both approaches coincide. A main step in the solution process is the solution of the arising linear system, where the crucial point is usually finding a proper preconditioner.
We propose both proper block diagonal and more involved preconditioners. Mesh independent superlinear convergence is derived for the preconditioned generalized minimum residual method (GMRES) iterations, based on a compact perturbation property of the underlying operators. These are new contributions to the topic, since previous results for such problems only studied linear convergence properties. The paper begins with the required preliminaries, then the new results are presented in detail for a time-independent distributed control problem, and finally some related problems are mentioned in the last section.

Preliminaries
We elaborate our preconditioning approach for a time-independent distributed control problem, described below, where the control and observation domains are different. Further related problems will be mentioned in Section 4.

Formulation of the problem
We consider a time-independent distributed control problem, with target solution ȳ and control function u, using H¹-regularization, as described in [10]. Let Ω ⊂ R^d be a bounded domain, and Ω₁, Ω₂ given subsets of Ω: the observation region Ω₁ and the control region Ω₂. Minimize

J(y, u) := (1/2) ‖y − ȳ‖²_{L²(Ω₁)} + (β/2) ‖u‖²_{H¹(Ω₂)}

subject to the PDE constraint. Here g is a fixed boundary term that admits a Dirichlet lift g̃ ∈ H¹(Ω), and β > 0 is a regularization constant.
This leads to a system of PDEs in weak form for the state and control variables and the Lagrange multiplier. The system can be homogenized using the splitting y = y₀ + g̃, where y₀ ∈ H¹₀(Ω). Therefore, in what follows, we may assume that g = 0, and hence y ∈ H¹₀(Ω). The finite element solution is then carried out in the usual way: we introduce suitable finite element subspaces and replace the solution and test functions in (2.3) with functions from these subspaces. Let us fix proper bases in the subspaces and denote by y, u, and λ the coefficient vectors of the finite element solutions. Then we obtain a system of equations of the form (2.4). Here M_y and M_u are the mass matrices corresponding to the subdomains Ω₁ and Ω₂ (i.e., that are used to approximate y and u), and similarly, K and K_u are the stiffness matrices corresponding to Ω and Ω₂, respectively; further, the rectangular mass matrix M corresponds to function pairs from Ω × Ω₂. We note that λ and y have the same dimension, since they both represent functions on Ω, whereas u only corresponds to nodepoints in Ω₂. We also note that the last r.h.s. is 0 due to g = 0. In the general case g ≠ 0 we would have some nonzero vector on the last r.h.s., i.e., non-homogeneity would only affect the r.h.s. and our results would remain valid.
After rearrangement, we obtain the matrix form (2.5). Problem (2.3) has a unique solution, as does system (2.4). See [2,3,10] for more details on the problem. Our goal is to define an efficient preconditioned iterative solution method for the above linear system, and to derive a mesh independent superlinear convergence rate. Previous work of the authors includes such superlinear estimates for coercive or complex-valued equations [4-7]. The present paper extends them to indefinite real-valued systems.

Superlinear convergence of the GMRES
In what follows, we will need the solution of linear systems with a given nonsingular matrix A ∈ R^{n×n}. When A is large and sparse, one generally uses a Krylov type iterative method, see, e.g., [1,11,20]. In this paper we are interested in superlinear convergence rates of the iteration; here we summarize briefly the required background. For the symmetric positive definite case, the well-known superlinear estimate of the standard CG method is obtained as follows, see, e.g., [1]. Let us consider the decomposition

A = I + E,     (2.7)

where I is the identity matrix, and let λ_j(E) =: μ_j, where the eigenvalues λ_j of A are ordered according to |λ_j − 1|, i.e., such that |μ_1| ≥ |μ_2| ≥ ... ≥ |μ_n|. Let us define the polynomial

P_k(λ) := ∏_{i=1}^{k} (λ_i − λ)/λ_i .     (2.8)

Then P_k(λ_i) = 0 (i = 1, ..., k), and using that |λ_i − λ_j| = |μ_i − μ_j| ≤ 2|μ_i| for j > k and 1/λ_i ≤ ‖A^{-1}‖, we obtain |P_k(λ_j)| ≤ (2‖A^{-1}‖)^k ∏_{i=1}^{k} |μ_i|. Using the minimax property of the CG method, (2.8) and the arithmetic-geometric means inequality, and returning to the notation λ_j(E) = μ_j, we finally obtain that

( ‖e_k‖_A / ‖e_0‖_A )^{1/k} ≤ (2‖A^{-1}‖/k) ∑_{j=1}^{k} |λ_j(E)|.

In the present paper the matrix is nonsymmetric, for which several Krylov algorithms also exist; in particular, GMRES and its variants are the most widely used. Similar efficient superlinear convergence estimates exist for the GMRES, based on the decomposition (2.7). In fact, the sharpest one has been proved in [17], using products of singular values and the residual error vectors r_k := Au_k − b, on the Hilbert space level for an invertible operator A ∈ B(H). One has

‖r_k‖ / ‖r_0‖ ≤ ∏_{j=1}^{k} s_j(E) s_j(A^{-1}),

where the singular values for a general bounded operator are defined as the distances from the best approximations with rank less than j. Hence, clearly, s_j(A^{-1}) ≤ ‖A^{-1}‖ for all j, thus the right hand side (r.h.s.) above is bounded by ∏_{j=1}^{k} s_j(E) ‖A^{-1}‖^k. Using the inequality between the geometric and arithmetic means, we obtain the following estimate:

( ‖r_k‖ / ‖r_0‖ )^{1/k} ≤ (‖A^{-1}‖/k) ∑_{j=1}^{k} s_j(E),     (2.11)

where the r.h.s. is a sequence decreasing towards zero.
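This superlinear behaviour can be observed numerically. The following sketch (illustrative, not from the paper) applies SciPy's GMRES to a matrix A = I + E, where E has prescribed, rapidly decaying singular values mimicking a compact perturbation of the identity:

```python
import numpy as np
from scipy.sparse.linalg import gmres

# Illustrative sketch (not from the paper): GMRES applied to A = I + E,
# where E is a random matrix with prescribed, rapidly decaying singular
# values s_j(E) = 0.5/(1+j)^2, mimicking a compact perturbation of the
# identity.  Estimate (2.11) then bounds the k-th root of the relative
# residual by a sequence decreasing to zero.
rng = np.random.default_rng(0)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 0.5 / (1.0 + np.arange(n)) ** 2      # singular values of E, -> 0
A = np.eye(n) + U @ np.diag(s) @ V.T     # ||E|| = 1/2, so ||A^{-1}|| <= 2

b = rng.standard_normal(n)
residuals = []                           # relative residual norms per step
gmres(A, b, restart=n, maxiter=n,
      callback=lambda r: residuals.append(r), callback_type='pr_norm')

# k-th roots of the residual ratios, the quantity bounded in (2.11)
roots = [r ** (1.0 / (k + 1)) for k, r in enumerate(residuals)]
```

With these decaying singular values the iteration reaches the default tolerance within a handful of steps, independently of n.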

Discretization and block matrix formulations
We consider a finite element discretization of problem (2.3) as described in Section 2.1. The convergence of the finite element solutions to the exact one is ensured by the standard approximation property (3.1). Let us denote by A_h the global stiffness matrix of system (2.5), and let us also use compressed notations for the solution vector and the r.h.s.; the system (2.5) which we wish to solve is then written as (3.4). We will denote the total number of degrees of freedom (DOF) by n, i.e., the size of the above system is n × n. Let us define the block diagonal part and the split part, respectively, as in (3.5). By the definition of the used stiffness and mass matrices, we have the following relation between the above matrices and the underlying inner product ⟨.,.⟩_H and operators Q, L. Let there be given functions, and let y, z, u, v, μ, and λ be their coefficient vectors, respectively. Following (3.14) and (3.3), the corresponding bilinear forms then coincide, where · denotes the ordinary inner product on R^n. Accordingly, the natural inner product on R^n for our problem is the S_h-inner product.
Since A_h is regular, it satisfies an inf-sup condition, where, on the other hand, the constant m_h might in general depend on h.

Iterative solution and block diagonal preconditioning
We will use the block diagonal matrix S h as preconditioner.
Hence the preconditioned form of (3.4) becomes (3.9), and we apply a preconditioned GMRES method to solve it. The preconditioner is based on the idea of equivalent operators [6,12]. Let us introduce the uniformly positive elliptic operator S in the product space H, where β > 0 is the constant used in (2.3). Then the stiffness matrix of S coincides with the block diagonal preconditioner S_h introduced in (3.5). The auxiliary problems with S_h are thus discretizations of uncoupled positive definite elliptic equations with constant coefficients, and hence can be solved with an optimal order of the number of operations [15,19]. Consequently, if we prove a mesh independent rate of convergence, then the overall number of operations is also of optimal order. As seen above, the preconditioned system takes the form (3.9), i.e., we have a counterpart of (2.7). Applying the GMRES algorithm to the preconditioned matrix, we obtain the counterpart (3.11) of estimate (2.11). Our goal is to give a bound on (3.11) that is independent of the subspaces Y_h, U_h, Λ_h. This will be shown by a suitable modification of our results in [4,5].

Hilbert space background
We introduce the Hilbert space H and the right-hand side b (i.e., b is the Riesz representant of the integral functional), and also the bounded linear operators Q₁ and Q₂. Then system (2.3) can be rewritten as (3.12) in H, or simply as an operator equation in H with an operator L for which the decomposition L = I + Q holds. Since L is a compact perturbation of the identity, the well-posedness of the equation implies, using Fredholm theory, that L is invertible; in particular, the inf-sup condition (3.16) holds. Our estimates will involve compact operators in a real Hilbert space H, see, e.g., [13, Chapter VI], and the singular values s_j(C) := λ_j(C*C)^{1/2}, where λ_j(C*C) are the ordered eigenvalues of C*C.
A basic property of compact operators is that s_j(C) → 0 as j → ∞. Now we verify that the operators in our decomposition of the problem are compact.

Proposition 3.1. The operators Q₁ and Q₂ are compact.

Proof. It is well-known that the Riesz representant of the L² inner product in a Sobolev space defines a compact operator, see, e.g., [14] (in fact, it is the inverse of the Laplacian or its shifted version). The operators Q₁ and Q₂ define the Riesz representants of L² inner products on Ω₁ resp. Ω₂, i.e., the above-mentioned compact operator is only composed with a restriction operator from Ω to Ω₁ or Ω₂ in L²(Ω). Since this restriction is obviously bounded, it preserves compactness.
This proposition readily yields the same for the corresponding operator matrix:

Corollary 3.1. Operator Q in (3.13) is compact.
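A discrete counterpart of this compactness can be observed numerically: the eigenvalues of K⁻¹M, the stiffness-preconditioned mass matrix (i.e., the discrete Riesz representant of the L² inner product), decay towards zero essentially independently of the mesh. The 1D P1 setting below is an illustrative assumption, not the paper's discretization:

```python
import numpy as np

# Numerical counterpart of the compactness argument (illustrative 1D P1
# setting, not the paper's discretization): the eigenvalues of K^{-1} M,
# the discrete Riesz representant of the L^2 inner product in H^1_0(0,1),
# decay like 1/(pi*j)^2 and are essentially mesh independent.
def riesz_eigs(n):
    h = 1.0 / (n + 1)
    e = np.ones(n)
    K = (np.diag(2 * e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / h
    M = (np.diag(4 * e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1)) * h / 6
    # eigenvalues of K^{-1} M, sorted in decreasing order
    return np.sort(np.linalg.eigvals(np.linalg.solve(K, M)).real)[::-1]

mu_coarse, mu_fine = riesz_eigs(100), riesz_eigs(200)
```

The leading eigenvalue approximates 1/π² (the first eigenvalue of the inverse Laplacian on (0,1)) on both meshes, while the tail decays quadratically, mirroring s_j(C) → 0.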
We will also need the following result for the inf-sup condition: Proposition 3.2. [7] Let L ∈ B(H) be an invertible operator in a Hilbert space H, that is, (3.17) holds, and let the decomposition L = I + Q hold for some compact operator Q. Let (V_n)_{n∈N+} be a sequence of closed subspaces of H such that the approximation property (3.1) holds. Then the sequence of discrete inf-sup constants (m_n) satisfies lim inf m_n ≥ m.

The superlinear convergence result
Proposition 3.3. We have the singular value estimate s_j(S_h^{-1}Q_h) ≤ s_j(Q) (j = 1, 2, ..., n), and m_h ≥ m_0 for some constant m_0 > 0 independent of h.
Proof. (a) The first estimate is a special case of our result in [7], but now with a better constant in the bound due to the symmetric preconditioner. Namely, by Axelsson et al. [7, Proposition 5.4], if N_h is the stiffness matrix of a bounded operator N in H, then the singular values satisfy s_j(S_h^{-1}N_h) ≤ s_j(N) (j = 1, 2, ..., n).
Taking square roots, this is the same as we wanted to prove.
(b) On the other hand, (3.16) holds on the whole space H. Hence Proposition 3.2 yields lim inf m_h ≥ m as the dimension n of V_h tends to ∞. This implies that m_h is bounded away from zero, i.e., there exists m_0 > 0 such that m_h ≥ m_0 for all h. In virtue of (3.11) and Proposition 3.3, we have thus proved the mesh independent superlinear convergence estimate (3.19), where (ε_k)_{k∈N+} is a sequence independent of n and V_h.

Block preconditioners of PRESB type
Instead of the block diagonal preconditioner used in the previous sections, one can apply a more general block preconditioner of "preconditioned square block matrix" (PRESB) type, extending the method in [8].
For this, one first rewrites system (2.5) by eliminating the variable u. Namely, substituting u = −(1/β)(M_u + K_u)^{-1} M^T λ, system (2.5) can be reduced to a 2 by 2 block system. As shown in [3], an explicit form of S_h^{-1} is available. The action of S_h^{-1} includes two solutions of linear systems with the matrix K + M_y, which correspond to finite element method (FEM) solutions of standard elliptic equations. Hence these auxiliary systems can be solved with an optimal order of the number of operations, and in case of a mesh independent rate of convergence, the overall number of operations is also of optimal order as before. Let us summarize the convergence properties.
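The elimination step can be checked on 1D toy blocks (our own sign convention and square mass matrices; the paper's exact blocks are those of (2.5)): solving the reduced system in (y, λ) reproduces the corresponding part of the full three-by-three solution.

```python
import numpy as np

# Check of the elimination step on 1D toy blocks (our own sign convention
# and square mass matrices; the paper's exact blocks are those of (2.5)):
# eliminating u from the 3x3 system and solving the reduced 2x2 system in
# (y, lambda) reproduces the corresponding part of the full solution.
n, beta = 50, 0.1
h = 1.0 / (n + 1)
e = np.ones(n)
K = (np.diag(2 * e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / h
M = (np.diag(4 * e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1)) * h / 6
Z = beta * (M + K)                      # H^1-regularization block
b = M @ np.ones(n)
O = np.zeros((n, n))

# full KKT system, rows: state, control, adjoint; unknowns y, u, lambda
A = np.block([[K, -M, O], [O, Z, -M], [M, O, K]])
y, u, lam = np.split(np.linalg.solve(A, np.r_[np.zeros(2 * n), b]), 3)

# reduced system after substituting u = Z^{-1} M lambda into the state row
S_u = M @ np.linalg.solve(Z, M)         # Schur block M Z^{-1} M
R = np.block([[K, -S_u], [M, K]])
y2, lam2 = np.split(np.linalg.solve(R, np.r_[np.zeros(n), b]), 2)
```

The eliminated control is recovered afterwards from λ by one solve with the regularization block.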

Superlinear convergence
We have the decomposition A_h = S_h + Q_h. Here, similarly to the 3 by 3 case (3.5), the remainder matrix Q_h contains only mass matrices, whereas the preconditioner S_h includes stiffness matrices in both block diagonal terms, i.e., it corresponds to a Sobolev inner product. Hence one can similarly derive that the preconditioned matrix corresponds to a compact perturbation of the identity, and thus we obtain mesh independent superlinear convergence analogously to (3.19).

Linear convergence
In the above results the estimates depend on the parameter β > 0. If β is small, then superlinear convergence (although valid) is exhibited with large constant multipliers, i.e., it is not a really useful property. On the other hand, one can see that the linear convergence rate can be bounded uniformly w.r.t. β. For this, we estimate the spectrum of S_h^{-1}A_h as follows. Let λ be one of its eigenvalues, with eigenvector (ξ, η). The second row yields M_y ξ = Kη. Substituting this into the first equation, taking the inner product with η, and using that Kξ·η = Kη·ξ = M_y ξ·ξ, we obtain bounds θ_min ≤ λ ≤ θ_max in terms of the Rayleigh quotient R(η). In order to observe the uniform behaviour of θ_min and θ_max as β → 0, note the definition of M_y and M, and estimate as follows. We have (2√β K + M_y)η·η ≥ M_y η·η in the denominator, hence R(η) is bounded above uniformly in β. On the other hand, the previously seen equality M_y ξ = Kη implies that Kη has zero coordinates wherever M_y ξ has, i.e., in the nodes outside Ω₁; hence Kη·η = ∫_{Ω₁} |∇z_h|² and M_y η·η = ∫_{Ω₁} z_h² (where z_h ∈ Y_h has coordinate vector η). Thus the standard condition number estimates yield Kη·η ≤ O(h^{-2})(M_y η·η). If we choose β = O(h⁴), then the denominator satisfies (2√β K + M_y)η·η = O(h²)(Kη·η) + M_y η·η ≤ const. M_y η·η, hence R(η) is bounded below uniformly in β. Hence, altogether, θ_min, θ_max and ultimately the spectrum of S_h^{-1}A_h are bounded uniformly w.r.t. β.

Boundary control problems
A modification of the distributed control problem (2.1)-(2.3), also studied in [10], is the boundary control problem, in which the same functional is minimized, but f represents a fixed forcing term in the PDE and the control function u is applied on the boundary. The FEM solution of this problem leads to a system very similar to (2.5). The mass matrix M is replaced by an (also rectangular) matrix N that connects interior and boundary basis functions; further, the mass and stiffness matrices for u act on the boundary, and are denoted by M_{u,b} and K_{u,b}, respectively. Thus the global system matrix takes an analogous form, and our previous results hold for this problem as well with slight changes. In particular, the matrix N corresponds to the embedding of the boundary space L²(∂Ω) into H¹(Ω). Hence, in a similar way, we obtain that the preconditioned matrix corresponds to a compact perturbation of the identity. Thus we can again derive mesh independent superlinear convergence of the preconditioned GMRES.

Box constraints
The functions y and/or u are often assumed to satisfy additional pointwise constraints (box constraints). For instance, for the state variable y, one prescribes y_a ≤ y ≤ y_b for some given constants y_a and y_b; the corresponding constraint for u is u_a ≤ u ≤ u_b. Box constraints can be dealt with efficiently using a penalty term of so-called Moreau-Yosida type, see [9,10,18]. For the distributed control problem studied in this paper, the objective function (2.1) is modified as

J_MY(y, u) := J(y, u) + (1/2ε) ‖max{0, y − y_b}‖² + (1/2ε) ‖max{0, y_a − y}‖²

for the state constrained case (where ε > 0 is a small penalty parameter), and similarly for control constraints. Applying a semi-smooth Newton scheme, one obtains linear systems that are small modifications of the system (2.4). After rearrangement as in (2.5), the global system matrix contains a factor G_A, a diagonal matrix with values 0 or 1, depending on whether or not the value of y at the given node satisfies the box constraint. The new factors G_A at the mass matrix M_y do not change the fact that the term G_A M_y G_A corresponds to a compact perturbation of the identity, and the same holds for the whole block matrix as before. Hence we obtain mesh independent superlinear convergence again.
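The structure of the active-set factor G_A can be sketched as follows (the nodal values y_nodal and the bound y_b are illustrative choices, not data from the paper):

```python
import numpy as np

# Sketch of the active-set factor G_A in the semismooth Newton step; the
# nodal values y_nodal and the bound y_b are illustrative.  G_A is diagonal
# with entry 1 exactly where the upper box constraint y <= y_b is violated,
# so G_A M_y G_A is the mass matrix restricted to the active nodes.
y_nodal = np.array([0.2, 0.9, 1.4, 1.1, 0.5])
y_b = 1.0
G_A = np.diag((y_nodal > y_b).astype(float))

e = np.ones(5)
M_y = (np.diag(4 * e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1)) / 6
penalty_block = G_A @ M_y @ G_A   # vanishes on rows/columns of inactive nodes
```

Since G_A merely zeroes rows and columns of the mass matrix, the perturbation remains of mass-matrix type, which is why the compactness argument survives.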
We note, however, that the superlinear rate is exhibited with large constant multipliers when ε is small. Hence it is worth mentioning that the linear convergence rate is not sensitive to ε. Namely, as shown in [9], for this problem the eigenvalues cluster in two or three intervals: one near the upper bound 1, one in the middle and one near 0. The middle interval is [(1+ε)/(2+ε), 1); the upper bound takes values arbitrarily close to unity when ε → 0. If β = O(h²ε), then the lower eigenvalues are bounded below by ε/(1+ε) if ε < 1. For very small values of ε, the behaviour is similar to the case when there are several zero eigenvalues [1], i.e., the small eigenvalues have a negligible effect on the solution when a Krylov subspace iteration is used.

Time-harmonic parabolic optimal control problems
In some problems the control and state functions are time-harmonic; see [8], including an example where the target solution and the control function are time-harmonic for a parabolic PDE constraint. This reduces the problem to minimizing

J(y, u) := (1/2) ‖y − ȳ‖²_{L²(Ω)} + (β/2) ‖u‖²_{L²(Ω)}

subject to the elliptic PDE constraint

−Δy + iωy = u,   y|_{∂Ω} = 0,

where ω and ȳ are real-valued but the control u must be complex-valued. After rearrangement, the global system matrix becomes a 2 by 2 block matrix. Introducing the block diagonal preconditioner and the corresponding remainder matrix

S_h := [ K  0 ; 0  βK ]   and   Q_h := [ iωM  −M ; M  iβωM ],

respectively, we see that Q_h contains only mass matrices, whereas the preconditioner S_h includes stiffness matrices in both block diagonal terms. Then our previous results can be used with a direct adaptation to the complex case (just replacing the transposed Q_h^T with the complex adjoint Q_h^*), and we obtain mesh independent superlinear convergence again.

Funding
The research of O. Axelsson was supported by the Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project "IT4 Innovations excellence in science LQ1602". The research of J. Karátson was supported by the Hungarian Scientific Research Fund OTKA, No. 112157.