A Dynamical Approach to Two-Block Separable Convex Optimization Problems with Linear Constraints

Abstract The aim of this manuscript is to approach, by means of first order differential equations/inclusions, convex programming problems with two-block separable linear constraints and objectives, whereby (at least) one of the components of the objective is assumed to be strongly convex. Each block of the objective contains a further smooth convex function. We investigate the proposed dynamical system and prove that its trajectories converge weakly to a saddle point of the Lagrangian of the convex optimization problem. Through time discretization the dynamical system provides the alternating minimization algorithm AMA as well as its proximal variant recently introduced in the literature.


Introduction and preliminaries
Since the seventies of the last century the investigation of dynamical systems approaching monotone inclusions and optimization problems has enjoyed much attention (see Brézis, Baillon and Bruck, Crandall and Pazy [18,8,19,20]). This is due to their intrinsic importance in areas like differential equations and applied functional analysis, and also to the fact that they have been recognized as a valuable tool for deriving and investigating numerical schemes for optimization problems obtained by time discretization of the continuous dynamics. The dynamical approach to iterative methods in optimization can furnish deep insights into the expected behavior of the method, and the techniques used in the continuous case can be adapted to obtain results for the discrete algorithm. We invite the reader to consult [24] for more insights into the relations between the continuous and discrete dynamics.
This research area continuously attracts the attention of the community. There are several works in recent years concerning dynamical systems which have a connection to numerical algorithms. Motivated by applications in optimization where nonsmooth functions are involved, many authors consider dynamical systems defined via proximal evaluations. Through explicit time discretization these transform into relaxed versions of proximal point algorithms. For example, in [1] Abbas and Attouch proposed a dynamical system which is a continuous version of the forward-backward algorithm (we mention here also the works of Bolte [13] and Antipin [3]), in [9] an implicit forward-backward-forward dynamical system was introduced, and in [21] a dynamical system of Douglas-Rachford type was proposed. Acceleration of the dynamics can be achieved by considering second order differential equations/inclusions where again resolvents and proximal operators are involved in the description of the systems (see for example [15] and the works of Attouch and his co-authors [5,6]). This has been a flourishing area in the continuous setting since the work of Su-Boyd-Candès [25], where a second-order ordinary differential equation was proposed as the limit of Nesterov's accelerated gradient method, which involves inertial type schemes.
Let us underline that approaching optimization problems which involve compositions with linear operators by means of differential equations/inclusions is relatively new in the literature (and this is also the focus of this manuscript). We mention here [16] (which is related to continuous counterparts of primal-dual algorithms, Proximal ADMM and the linearized proximal method of multipliers) and also the contribution of Attouch [4] (related to some fast inertial Proximal ADMM schemes).
Before we introduce the dynamical system we want to investigate, let us make precise the optimization problem we consider and mention some notations used in this context.
We consider the following two-block separable optimization problem:

min_{x ∈ H, z ∈ G} { f(x) + h_1(x) + g(z) + h_2(z) } subject to Ax + Bz = b, (1)

where H, G and K are real Hilbert spaces, f : H → R̄ := R ∪ {±∞} is a proper, lower semicontinuous and σ-strongly convex function with σ > 0 (i.e. f − (σ/2)‖·‖² is convex), g : G → R̄ is proper, convex and lower semicontinuous, h_1 : H → R and h_2 : G → R are convex and Fréchet differentiable functions with L_{h_1}-, respectively L_{h_2}-Lipschitz continuous gradients (L_{h_1} ≥ 0, L_{h_2} ≥ 0), i.e. ‖∇h_1(x) − ∇h_1(y)‖ ≤ L_{h_1}‖x − y‖ for every x, y ∈ H (and analogously for h_2), and A : H → K and B : G → K are linear continuous operators such that A ≠ 0 and b ∈ K.
It is well known that (x*, z*, y*) is a saddle point of the Lagrangian L if and only if (x*, z*) is an optimal solution of (1), y* is an optimal solution of its Fenchel-Rockafellar dual problem (2) and the optimal objective values of (1) and (2) coincide. Note that the (Fenchel) conjugate function f* : H → R ∪ {±∞} of f : H → R ∪ {±∞} is defined as f*(u) = sup_{x ∈ H} {⟨u, x⟩ − f(x)}. If f is a proper, convex and lower semicontinuous function, then f** = f, where f** is the conjugate function of f*. The infimal convolution of two proper functions f_1, f_2 : H → R ∪ {±∞} is defined as (f_1 □ f_2)(x) = inf_{y ∈ H} {f_1(y) + f_2(x − y)}. The existence of saddle points for L is guaranteed when (1) has an optimal solution and, for instance, the Attouch-Brézis-type condition b ∈ sri(A(dom f) + B(dom g)) (3) holds (see for example [14, Theorem 3.4]). In the finite dimensional setting this amounts to the existence of x ∈ ri(dom f) and z ∈ ri(dom g) satisfying Ax + Bz = b. For more on these generalized interiority notions and their role in optimization we refer the reader to [11] and [14].
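As a quick numerical illustration of the conjugate (this toy computation is not part of the manuscript; the function and grid are chosen purely for demonstration), one can approximate f*(u) = sup_x {⟨u, x⟩ − f(x)} on a grid for f(x) = (1/2)x², whose conjugate is again (1/2)u², so the identity f** = f is directly visible:

```python
import numpy as np

# Approximate the Fenchel conjugate of f(x) = 0.5*x^2 on a fine grid.
# For this f the conjugate is f*(u) = 0.5*u^2 (a self-conjugate function).
xs = np.linspace(-10.0, 10.0, 20001)

def conj(fvals, u):
    """Grid approximation of f*(u) = sup_x { u*x - f(x) }."""
    return np.max(u * xs - fvals)

f = 0.5 * xs**2
for u in (-2.0, 0.5, 3.0):
    # the two printed values agree up to grid discretization error
    print(conj(f, u), 0.5 * u**2)
```

The supremum is attained at x = u, which lies well inside the grid, so the approximation error is of the order of the squared grid spacing.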
Let f be a proper, convex and lower semicontinuous function. Then the proximal point operator of f with parameter γ > 0 is defined as prox_{γf} : H → H, prox_{γf}(x) = argmin_{y ∈ H} { f(y) + (1/(2γ))‖y − x‖² }. The system of optimality conditions for the primal-dual pair of optimization problems (1)-(2) reads as (4). More precisely, if (1) has an optimal solution (x*, z*) and a qualification condition, like for instance (3), is fulfilled, then there exists an optimal solution y* of (2) such that (4) holds; consequently, (x*, z*, y*) is a saddle point of the Lagrangian L. Conversely, if (x*, z*, y*) satisfies relation (4), then (x*, z*) is an optimal solution of (1) and y* is an optimal solution of (2). We recall that the convex subdifferential of f is defined as ∂f(x) = {u ∈ H : f(y) ≥ f(x) + ⟨u, y − x⟩ for all y ∈ H}, see for example [11].
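For intuition, here is a minimal numerical sketch (not taken from the manuscript) of the proximal operator for two standard choices: for f = ‖·‖₁ it reduces to the well-known soft-thresholding formula, and for f = (1/2)‖·‖² it has the closed form x/(1 + γ):

```python
import numpy as np

def prox_l1(x, gamma):
    """Proximal operator of f(y) = ||y||_1 with parameter gamma > 0,
    i.e. argmin_y { ||y||_1 + (1/(2*gamma))*||y - x||^2 } (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_sq(x, gamma):
    """Proximal operator of f(y) = 0.5*||y||^2: shrinkage by 1/(1 + gamma)."""
    return x / (1.0 + gamma)

x = np.array([2.0, -0.5, 0.1])
print(prox_l1(x, 1.0))   # components with |x_i| <= gamma are mapped to 0
print(prox_sq(x, 1.0))
```

Both formulas follow from the first order optimality condition of the strongly convex minimization problem defining the proximal operator.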
If (x_1^*, z_1^*, y_1^*) and (x_2^*, z_2^*, y_2^*) are two saddle points of the Lagrangian L, then x_1^* = x_2^*. This follows easily from (4), by using the strong monotonicity of ∂f and the monotonicity of ∂g.
Further, we denote by S_+(H) the set of operators from H to H which are linear, continuous, self-adjoint and positive semidefinite. For M ∈ S_+(H) we define the seminorm ‖x‖_M := √⟨x, Mx⟩. We consider the Loewner partial ordering on S_+(H), defined for M_1, M_2 ∈ S_+(H) by

M_1 ≽ M_2 :⇔ ⟨M_1 x, x⟩ ≥ ⟨M_2 x, x⟩ for all x ∈ H. (5)

Furthermore, we define for α > 0 the set P_α(H) := {M ∈ S_+(H) : M ≽ α Id}, where Id : H → H, Id(x) = x for all x ∈ H, denotes the identity operator on H.
Let A : H → G be a linear continuous operator. The operator A* : G → H, fulfilling ⟨A*y, x⟩ = ⟨y, Ax⟩ for all x ∈ H and y ∈ G, denotes the adjoint operator of A, while ‖A‖ := sup{‖Ax‖ : ‖x‖ ≤ 1} denotes the norm of A.
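In finite dimensions the adjoint of a (real) matrix operator is its transpose and the operator norm is the largest singular value; a quick numerical sanity check of the defining identity ⟨A*y, x⟩ = ⟨y, Ax⟩ (an illustrative computation, not part of the manuscript):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # operator from R^2 to R^3
x = rng.standard_normal(2)
y = rng.standard_normal(3)

lhs = (A.T @ y) @ x               # <A* y, x> with A* = A^T
rhs = y @ (A @ x)                 # <y, A x>
op_norm = np.linalg.norm(A, 2)    # ||A|| = sup{ ||A x|| : ||x|| <= 1 }

print(abs(lhs - rhs))             # zero up to floating point rounding
print(op_norm)                    # equals the largest singular value of A
```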
In the next section we stress that the dynamical system leads through explicit time discretization to the proximal AMA algorithm [12] and the AMA numerical scheme [26]. Furthermore, we underline the role of the operators M_1 and M_2: for a special choice of these linear maps we obtain a dynamical system of primal-dual type which is a full splitting scheme. For this we consider a numerical example in order to show how the parameters for these particular linear maps can be chosen and how they influence the convergence of the trajectories.
We continue with the existence and uniqueness of strong global solutions of the dynamical system proposed above. The study relies on classical semigroup theory, showing that the system corresponds in fact to a Cauchy-Lipschitz system in a product space. This is far from being trivial and requires several technical prerequisites which are described in detail.
The last section is devoted to the asymptotic analysis of the trajectories and the connection to the optimization problems (1) and (2). The analysis relies on Lyapunov theory, where the derivation of an appropriate energy functional plays a central role. The way the Lyapunov functional is obtained is quite involved, and several technical issues have to be investigated in order to achieve this goal (see the proof of Theorem 15 and (34)). Finally, we prove that the trajectories converge weakly to a saddle point of the Lagrangian L. We conclude the paper with some open questions and perspectives.
The analysis used in this manuscript relies on tools similar to those considered in [16]. Let us underline some differences in comparison to [16]. First of all, our optimization problem (1) has a different structure, with two linear operators involved in the constraint set. Second, our dynamical system is related to the Proximal AMA algorithm [12], the AMA numerical scheme [26] and primal-dual type algorithms obtained in [12], while the one in [16] is related to the Proximal ADMM [9], the classical ADMM and primal-dual type algorithms. Moreover, notice that in our case f is strongly convex, which has an influence on the investigations performed here (in particular, the inclusion corresponding to f has a more tractable form). In addition, our analysis involves a parameter c which is time varying, and this makes the investigation more involved (taking variable c is motivated by [26], where the numerical scheme AMA also involves a variable parameter). For more on the AMA algorithm introduced by Tseng and motivation for considering this setting we refer the reader to [26,12,22].

Solution concept, discretizations, example
We need the following definition before we specify what we mean by a solution of (6).

Definition 2. A function x : [0, +∞) → H is said to be locally absolutely continuous if it is absolutely continuous on every interval [0, T], T > 0; that is, for every T > 0 there exists an integrable function y : [0, T] → H such that x(t) = x(0) + ∫_0^t y(s) ds for all t ∈ [0, T].

Notice that every locally absolutely continuous function is differentiable almost everywhere. Moreover, x : [0, T] → H is absolutely continuous if and only if (see [7,2]): for every ε > 0 there exists η > 0 such that for any finite family of disjoint intervals I_k = (a_k, b_k) ⊆ [0, T] with ∑_k |b_k − a_k| < η one has ∑_k ‖x(b_k) − x(a_k)‖ < ε.

We are now ready to consider the following solution concept.

Definition 3. Let (x_0, z_0, y_0) ∈ H × G × K, and M_1 : [0, +∞) → S_+(H) and M_2 : [0, +∞) → S_+(G). The function (x, z, y) : [0, +∞) → H × G × K is called a strong global solution of (6) if the following properties are satisfied: 1. the functions x, z, y are locally absolutely continuous; 2. the relations in (6) hold for almost every t ∈ [0, +∞); 3. x(0) = x_0, z(0) = z_0, and y(0) = y_0.

Remark 4. Let us consider a discretization of the considered dynamical system. The first two inclusions in (6) can be written in an equivalent way for all t ∈ [0, +∞). Through explicit discretization with respect to the time variable t and constant step size h_k ≡ 1 (i.e. x(t) ≈ x_k and ẋ(t) ≈ x_{k+1} − x_k) we obtain for all k ≥ 0 the corresponding inclusions. Furthermore, using convex subdifferential calculus these can be written equivalently for all k ≥ 0 as proximal steps. Hence the dynamical system (6) provides through explicit time discretization the following numerical algorithm: for all k ≥ 0 generate the sequence (x_k, z_k, y_k)_{k≥0} accordingly. The algorithm above is the proximal AMA algorithm from [12]. Moreover, in the particular case M_1^k = M_2^k = 0 and h_1 = h_2 = 0, the numerical scheme is the AMA algorithm introduced by Tseng in [26].
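To make the discrete scheme concrete, the following sketch runs the plain AMA iteration (the case M_1^k = M_2^k = 0, h_1 = h_2 = 0) on a hypothetical toy instance of (1) with f(x) = (1/2)‖x‖² (so σ = 1), g(z) = (1/2)‖z‖², A = B = Id and an illustrative right-hand side b; all data are assumptions made for demonstration, and both subproblems are solved in closed form:

```python
import numpy as np

# Toy instance of (1): f(x) = 0.5||x||^2 (sigma = 1), g(z) = 0.5||z||^2,
# h1 = h2 = 0, constraint A x + B z = b with A = B = Id (so ||A|| = 1).
A = np.eye(2)
B = np.eye(2)
b = np.array([2.0, 4.0])
c = 0.5                    # step size; AMA requires 0 < c < 2*sigma/||A||^2 = 2

x, z, y = np.zeros(2), np.zeros(2), np.zeros(2)
for _ in range(200):
    # x-step: argmin_x f(x) + <y, A x>; since grad f(x) = x this is x = -A^T y
    x = -A.T @ y
    # z-step: argmin_z g(z) + <y, B z> + (c/2)*||A x + B z - b||^2,
    # a linear system because g is quadratic
    z = np.linalg.solve(np.eye(2) + c * B.T @ B,
                        -B.T @ y + c * B.T @ (b - A @ x))
    # multiplier step along the constraint residual
    y = y + c * (A @ x + B @ z - b)

# For this instance the unique solution of
# min 0.5||x||^2 + 0.5||z||^2 s.t. x + z = b is x* = z* = b/2.
print(x, z)
```

On this instance the dual iteration contracts with factor 1/3 per step, so 200 iterations reach the solution to machine precision.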
Remark 5. Let us show now that an appropriate choice of M_2 leads (both in the continuous and in the discrete case) to an implementable proximal step in the second inclusion. This is crucial for numerical purposes in applications, see also [12] and [10]. For every t ∈ [0, +∞) we define M_2(t) := (1/τ(t)) Id − c(t) B*B, where τ(t) > 0 and τ(t)c(t)‖B‖² ≤ 1.
Let t ∈ [0, +∞) be fixed. Then M_2(t) is positive semidefinite, and the second relation in the dynamical system (6) becomes a proximal step. Indeed, under the given conditions one can see that (8) is equivalent to a relation which can be reformulated as a proximal evaluation. If we choose furthermore M_1(t) = 0, our dynamical system (6) can be written in this particular setting in an equivalent form, where c(t) > 0 for all t ∈ [0, +∞). This can be seen as the continuous counterpart with proximal step of the AMA scheme [26].
Remark 6. In this paper we will often use the following equivalent formulation of the dynamical system (6): for U(t) = (x(t), z(t), y(t)), (6) can be written as a first order system governed by suitable maps F and G. Let t ∈ [0, +∞) be fixed. The functions F(t, ·) and G(t, ·) are proper and lower semicontinuous. Since M_1(t) ∈ S_+(H) and f is strongly convex, the function p → F(t, p) + (1/2)‖p − u‖² is proper, strongly convex and lower semicontinuous for every u ∈ H. If, in addition, the assumption

(Cweak): for every t ∈ [0, +∞) there exists β(t) > 0 such that ∂g + c(t)B*B + M_2(t) is β(t)-strongly monotone

holds, then the function q → G(t, q) + (1/2)‖q − v‖² is proper, strongly convex and lower semicontinuous for every v ∈ G as well, and hence in (10) u and v are uniquely defined.
We can write the dynamical system for this problem similarly as in Remark 6 (see also (9)). We solved the dynamical system with the starting points x_0 = (−10, 10), z_0 = (−10, 10) and y_0 = (−10, 10) in the case c(t) > 0 and τ(t) > 0, and used the Matlab function ode15s. Notice that the corresponding proximal step can be expressed via proj_Q, the projection operator onto a convex and closed set Q ⊆ H. To ensure convergence, we will prove later in Theorem 15 that there has to exist an ε > 0 such that c(t) ≤ σ/‖A‖² − ε for all t, where σ is the strong convexity parameter of f (here σ = 1), and that c(t) has to be monotonically decreasing and Lipschitz continuous. For c(t) = c constant, we can choose c such that c ≤ 2σ/‖A‖² − ε. Besides, M_2(t) has to be monotonically decreasing, locally absolutely continuous and positive definite, with sup_{t≥0} ‖Ṁ_2(t)‖ < +∞ (we are in setting 1. of Theorem 15, see also Corollary 16 and Remark 17).
To guarantee that M_2(t) is positive definite, we have to choose τ(t) such that τ(t)c(t)‖B‖² < 1. Since ‖A‖² = 1 and ‖B‖² = 1, we can choose c(t) ∈ (ε, 1 − ε) (for constant c, c ∈ (ε, 2 − ε)) and τ(t)c(t) < 1. We considered for c(t) two constant choices, namely c(t) = 0.25 and c(t) = 1.99, and two variable choices, c_1(t) = 1/(t² + 1.1) + 0.01 and c_2(t) = 1/√(t + 1.1) + 0.01. Furthermore, we chose τ(t)c(t) = 0.25 and 0.99. These parameters fulfill the conditions above. In Figures 1 and 2 below one can see that, independently of the choice of c(t), all three trajectories converge faster for a greater value of τ(t)c(t). Furthermore, one can see that the trajectories converge faster for smaller values of c(t).
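As a Python counterpart of the Matlab experiment (the concrete problem data below are illustrative assumptions, not the projection example from the manuscript), one can integrate a smooth continuous-time analogue of AMA: the dual (Uzawa-type) gradient flow ẏ(t) = c (A x(y(t)) + B z(y(t)) − b), where x(y), z(y) minimize the Lagrangian blocks, using scipy's BDF method, which plays the role of ode15s:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative smooth data: f(x) = 0.5||x||^2, g(z) = 0.5||z||^2, A = B = Id.
A = np.eye(2)
B = np.eye(2)
b = np.array([2.0, 4.0])
c = 0.5

def x_of(y):   # argmin_x f(x) + <y, A x>  =>  x = -A^T y
    return -A.T @ y

def z_of(y):   # argmin_z g(z) + <y, B z>  =>  z = -B^T y
    return -B.T @ y

def rhs(t, y):
    # dual gradient flow: dy/dt = c * (A x(y) + B z(y) - b)
    return c * (A @ x_of(y) + B @ z_of(y) - b)

# BDF is scipy's stiff multistep solver, analogous to Matlab's ode15s.
sol = solve_ivp(rhs, (0.0, 50.0), np.array([-10.0, 10.0]), method="BDF")
y_T = sol.y[:, -1]
print(x_of(y_T), z_of(y_T))   # both approach the primal solution b/2
```

For this toy instance the flow is linear, ẏ = c(−2y − b), so y(t) converges exponentially to y* = −b/2 and the primal variables converge to x* = z* = b/2.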

Existence and uniqueness of the trajectories
In this section we investigate the existence and uniqueness of the trajectories generated by the dynamical system (6). We need several preparatory results in order to show that we are in the setting of the Cauchy-Lipschitz-Picard Theorem.

Lemma 8. Assume that (Cweak) holds and let t ∈ [0, +∞). Then the operator associated with the first inclusion is (1/σ)-Lipschitz continuous, and the operator associated with the second inclusion is (1/β(t))-Lipschitz continuous. Indeed, due to the σ-strong convexity of f and M_1(t) ∈ S_+(H), the operator ∂f + M_1(t) is σ-strongly monotone, and the first claim follows by the Cauchy-Schwarz inequality. Because of (Cweak), ∂g + c(t)B*B + M_2(t) is β(t)-strongly monotone, and the second claim follows again by the Cauchy-Schwarz inequality.

Lemma 9. Assume that (Cweak) holds, let (x, z, y) ∈ H × G × K and consider the maps R_(x,z,y) : [0, +∞) → H, Q_(x,z,y) : [0, +∞) → G and P_(x,z,y) : [0, +∞) → K. Then the estimates below hold for every t, r ∈ [0, +∞).

Proof. Let t, r ∈ [0, +∞) be fixed.
(i) From the definition of R_(x,z,y) we obtain two inclusions; adding M_1(t)(R_(x,z,y)(r) + x) on both sides of the second one yields a comparable relation. From (13) and (14), using that ∂f + M_1(t) is σ-strongly monotone, we derive the desired estimate, and the result follows from the Cauchy-Schwarz inequality.
(ii) From the definition of Q_(x,z,y) we obtain two inclusions. From (15) and (16), using that ∂g + c(t)B*B + M_2(t) is β(t)-strongly monotone, and combining the Cauchy-Schwarz inequality with the definition of P_(x,z,y) and (i), the claimed estimate follows.

Having all these estimates at our disposal, we are now ready to prove the existence and uniqueness of the trajectories.
Proof. In the following we use the equivalent formulation of the dynamical system described in Remark 6. We show the existence and uniqueness of a strong global solution using the Cauchy-Lipschitz-Picard Theorem. To this end, we rely on [23, Proposition 6.2.1] and [7] (see Theorem 2.4, ODE (37) and the conditions (42), (44) and (45)).

Convergence of the trajectories
In the beginning of this section we give some results which we will then use to prove the convergence of the trajectories of the dynamical system (6). In the following, the real vector space L(H) := {A : H → H : A is linear and continuous} is endowed with the norm ‖A‖ = sup_{‖x‖≤1} ‖Ax‖.
Definition 11. The map M : [0, +∞) → L(H) is said to be derivable at t_0 ∈ [0, +∞) if the limit lim_{h→0} (M(t_0 + h) − M(t_0))/h, taken with respect to the norm topology of L(H), exists. When this is the case, we denote by Ṁ(t_0) ∈ L(H) the value of the limit.
In case M : [0, +∞) → L(H) is derivable at t_0 ∈ [0, +∞) and x, y : [0, +∞) → H are also derivable at t_0, we will use the following product rule (see [16, Lemma 4]): (d/dt)⟨M(t)x(t), y(t)⟩|_{t=t_0} = ⟨Ṁ(t_0)x(t_0), y(t_0)⟩ + ⟨M(t_0)ẋ(t_0), y(t_0)⟩ + ⟨M(t_0)x(t_0), ẏ(t_0)⟩. We start with a result showing that under appropriate conditions the second derivatives of the trajectories exist almost everywhere, together with an upper bound on their norms. This will be used in the proof of the main result, Theorem 15.
In the following we recall two results which we need for the asymptotic analysis (see [2, Lemma 5.1] and [2, Lemma 5.2]).
In the following we state the result establishing the asymptotic convergence of the trajectories generated by the dynamical system (6) to a saddle point of the Lagrangian of the problem (1). The derivation of the result via Lyapunov analysis is involved.

Theorem 15. In the setting of the optimization problem (1), assume that the set of saddle points of the Lagrangian L is nonempty and that the maps [0, +∞) → S_+(H), t → M_1(t), and [0, +∞) → S_+(G), t → M_2(t), are locally absolutely continuous and monotonically decreasing in the sense of the Loewner partial ordering defined in (5). Furthermore, assume that for some ε > 0 the map c : [0, +∞) → (0, σ/‖A‖² − ε] is monotonically decreasing and Lipschitz continuous. If c(t) is a constant function, namely c(t) = c for all t ∈ [0, +∞), then it is enough to assume that ε ≤ c ≤ 2σ/‖A‖² − ε. For an arbitrary starting point (x_0, z_0, y_0) ∈ H × G × K, let (x, z, y) : [0, +∞) → H × G × K be the unique strong global solution of the dynamical system (6). If one of the following conditions holds: 1. there exists α > 0 such that M_2(t) − (L_{h_2}/4) Id ∈ P_α(G) for every t ∈ [0, +∞); 2. there exists β > 0 such that B*B ∈ P_β(G); then the trajectory (x(t), z(t), y(t)) converges weakly to a saddle point of L as t → +∞.
Proof. We need an appropriate energy functional in order to conclude; this will be accomplished in (34) below. Let (x*, z*, y*) ∈ H × G × K be a saddle point of the Lagrangian L. Then it fulfills the system of optimality conditions (4). From (7), taking into account the strong monotonicity of ∂f, we obtain a first estimate for almost every t ∈ [0, +∞). In an analogous way, according to (8) and the monotonicity of ∂g, we obtain a second estimate for almost every t ∈ [0, +∞). We use the last equation of (6) and the optimality condition Ax* + Bz* = b to obtain a third relation for almost every t ∈ [0, +∞). Assume that L_{h_1} > 0 and L_{h_2} > 0. By the Baillon-Haddad Theorem the gradients of h_1 and h_2 are L_{h_1}^{-1}- and L_{h_2}^{-1}-cocoercive, respectively, which yields two further estimates for almost every t ∈ [0, +∞). By summing up (27) and (28) and taking into account (29), (30) and (31), we obtain an inequality valid for almost every t ∈ [0, +∞). Using also the last equality for ẏ in (6) and the formula (18), we observe that certain identities hold for almost every t ∈ [0, +∞). By plugging these identities and (33) into (32), we obtain the key inequality for almost every t ∈ [0, +∞). Taking into account that ċ(t) ≤ 0 (if c is constant we have ċ(t) = 0) and that ⟨Ṁ_1(t)(x(t) − x*), x(t) − x*⟩ ≤ 0 and ⟨Ṁ_2(t)(z(t) − z*), z(t) − z*⟩ ≤ 0 (which follows easily from Definition 11 and the decreasing property of M_1 and M_2), we arrive for almost every t ∈ [0, +∞) at the estimate (34). From Lemma 13 we obtain the corresponding integrability properties. Let T > 0.
By integrating (34) on the interval [0, T] and letting T converge to +∞ we obtain (35). In the case L_{h_1} = 0 and L_{h_2} > 0, ∇h_1 is constant and instead of (34) we obtain an analogous inequality for almost every t ∈ [0, +∞); similarly for the cases L_{h_1} > 0, L_{h_2} = 0 and L_{h_1} = 0, L_{h_2} = 0. By arguing as above, we obtain also in these three cases that (35) and (36)-(39) hold. We can easily see that, if assumption 1. or 2. from the theorem holds true, then ż(·) ∈ L²([0, +∞), G). Further, taking into account the hypotheses concerning Ṁ_1, Ṁ_2 and c, we can easily derive from Lemma 12 that ẍ(·) ∈ L²([0, +∞), H) and z̈(·) ∈ L²([0, +∞), G).

It follows, for almost every t ∈ [0, +∞),
and the right-hand side is a function in L¹([0, +∞), R). By Lemma 14 we obtain the corresponding limits. In the following, let us prove that each weak sequential cluster point of (x(t), z(t), y(t)), t ∈ [0, +∞), is a saddle point of L (notice that the trajectories are bounded). Let (x*, z, y) be such a weak sequential cluster point. This means that there exists a sequence (s_n)_{n≥0} with s_n → +∞ such that (x(s_n), z(s_n), y(s_n)) converges to (x*, z, y) as n → +∞ in the weak topology of H × G × K (notice that the trajectory x(t) converges to x* strongly).
According to this fact and (39), we have v_n ⇀ z, u_n ⇀ y, Bv_n → Bz = Bz* and w_n → 0 as n → +∞. Due to the monotonicity of the subdifferential, we have for all (u, v) in the graph of ∂(g + h_2)* and for all n ≥ 0 that ⟨Bv_n − Bv, u_n⟩ + ⟨v_n − v, w_n − u⟩ ≥ 0.
We let n converge to +∞ and obtain ⟨z − v, B*y − u⟩ ≥ 0 for all (u, v) in the graph of ∂(g + h_2)*.
According to Theorem 15 (for L_{h_1} = L_{h_2} = 0), the generated trajectories converge weakly to a saddle point of the Lagrangian if we choose the map c(t) as in this theorem and if there exists β > 0 such that B*B ∈ P_β(G).

Conclusions and perspective
In this paper we introduced and investigated a dynamical system which generates three trajectories in order to approach the set of saddle points of the Lagrangian associated to a structured convex optimization problem with linear constraints. Under appropriate conditions we showed that the system is well-posed. The asymptotic analysis is carried out in the framework of Lyapunov analysis by finding an appropriate energy functional. The discretization of the considered dynamics is related to the Proximal AMA [12] and AMA [26] numerical schemes.
Let us mention some open questions as future research directions: (i) Investigate convergence rates for the trajectories and also for the function values along the orbits. Notice that in our setting f is strongly convex, and this might induce some rates. For the AMA algorithm in [26] there are some results related to rates.
(ii) Consider second order dynamical systems in order to accelerate the convergence of the trajectories. This would induce inertial terms in the discretized counterparts of the dynamics. For optimization problems involving compositions with linear operators this is not a trivial task. We mention here the paper of Attouch [4], where the starting point is a second order dynamics with vanishing damping for monotone inclusion problems; the discretization leads to Proximal ADMM algorithms with momentum. For an accelerated AMA numerical scheme we refer to [22].
(iii) Conduct more involved numerical experiments related to optimization problems. More precisely, consider discretizations with variable step sizes in order to derive more general numerical schemes. This, in combination with different choices of the time varying positive semidefinite operators M_1 and M_2, could have a great impact both on the theoretical results and on the experiments.