A second-order dynamical approach with variable damping to nonconvex smooth minimization

ABSTRACT We investigate a second-order dynamical system with variable damping in connection with the minimization of a nonconvex differentiable function. The dynamical system is formulated in the spirit of the differential equation which models Nesterov's accelerated convex gradient method. We show that the generated trajectory converges to a critical point, if a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property. We also provide convergence rates for the trajectory formulated in terms of the Łojasiewicz exponent.

The study of the system (2) is motivated by recent developments in approaching convex optimization problems from a continuous-time perspective.
In [1], Su, Boyd and Candès proposed the dynamical system $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla g(x(t)) = 0$ (3) as the continuous counterpart of Nesterov's accelerated gradient method [see [2]] for minimizing $g$ in the convex case. This research has been deepened by Attouch and his co-authors [see [3,4]], who proved that, if $\alpha > 3$, then the generated trajectory $x(t)$ converges to a minimizer of $g$ as $t \to +\infty$, while the convergence rate of the objective function along the trajectory is $o(1/t^2)$. The convergence of the trajectory is actually the continuous counterpart of a result due to Chambolle and Dossal [see [5]], which proves the convergence of the iterates of the modified FISTA algorithm [see [6]]. Recently, in [7], the convergence rate of the objective function along the trajectory has been investigated in the subcritical case $\alpha \le 3$, and some open questions related to the asymptotic properties of the trajectory have been formulated.
In this manuscript, we carry out, in the nonconvex setting, an asymptotic analysis of the dynamical system (2), which can be seen as a perturbation of the dynamical system (3) that models Nesterov's accelerated gradient method in the convex case. To the best of our knowledge, this is the first contribution addressing second-order dynamical systems with variable damping associated with nonconvex optimization problems. We show that the generated trajectory converges to a critical point of $g$ as $t \to +\infty$, provided the following regularization of $g$, $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $H(u, v) = g(u) + \frac{1}{2}\|u - v\|^2$, satisfies the Kurdyka-Łojasiewicz inequality. Moreover, we derive convergence rates, in terms of the Łojasiewicz exponent, for the trajectory, its velocity and its acceleration. One of the major future goals is to study the asymptotic properties of the system (2) in the case $\gamma = 0$. For other investigations of the asymptotic analysis of second-order dynamical systems with time-dependent damping, we refer to the papers of Haraux and Jendoubi [8] and Balti [9].
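The regularization $H$ can be made concrete with a minimal sketch. The test objective `g` below (one-dimensional, nonconvex, with Lipschitz continuous derivative) is a hypothetical choice for illustration and does not come from the paper. Since $\nabla H(u, v) = (\nabla g(u) + u - v,\; v - u)$, the critical points of $H$ are exactly the pairs $(\bar{x}, \bar{x})$ with $\bar{x} \in \mathrm{crit}(g)$, which the snippet checks at $\bar{x} = 0$.

```python
import math

def g(x):
    # Hypothetical nonconvex test objective (an assumption, not from the paper).
    return 0.5 * x * x + 3.0 * math.sin(x) ** 2

def grad_g(x):
    # g'(x) = x + 3 sin(2x); |g''| <= 7, so grad_g is Lipschitz continuous.
    return x + 3.0 * math.sin(2.0 * x)

def H(u, v):
    # Regularization H(u, v) = g(u) + (1/2) |u - v|^2 from the text.
    return g(u) + 0.5 * (u - v) ** 2

def grad_H(u, v):
    # Partial derivatives of H with respect to u and v.
    return (grad_g(u) + (u - v), v - u)

# x = 0 is a critical point of g, so (0, 0) is a critical point of H.
print(grad_H(0.0, 0.0))
# Off the diagonal the gradient of H cannot vanish.
print(grad_H(0.0, 1.0))
```

On the diagonal, $H(u, u) = g(u)$, so $H$ and $g$ share their critical values.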
For $\alpha = 0$, the convergence of the trajectory generated by (2) to a critical point of $g$ has been shown by Bégout, Bolte and Jendoubi in [10] under the hypothesis that $g$ is of class $C^2$ and satisfies the Kurdyka-Łojasiewicz property with a desingularizing function satisfying a restrictive condition [see also the papers of Haraux and Jendoubi [11] and Chill and Jendoubi [12]]. On the other hand, the dynamical system (2) is, for $\alpha = 0$, a particular instance of the second-order dynamical system of proximal-gradient type studied in [13].
The following numerical scheme, with starting points $x_0, x_1 \in \mathbb{R}^n$, where $s \le 1/L_g$ is the step size, can be seen as a discrete counterpart of (2). One can notice that, for $\gamma = 0$, this iterative scheme is similar to Nesterov's accelerated convex gradient method. In the following, we show in an informal way that (2) can be seen as the exact limit of (4). To this end, we take small step sizes in (4) and follow the same approach as Su, Boyd and Candès in [1, Section 2]. For this purpose, we rewrite (4) in the form and introduce the Ansatz $x_k \approx x(k\sqrt{s})$ for some twice continuously differentiable function $x$ : Then, as the step size $s$ goes to zero, from the Taylor expansion of $x$, we obtain and Consequently, (5) can be written as or, equivalently Hence, After dividing by $\sqrt{s}$ and letting $s \to 0$, we obtain which, after division by $t$, gives (2), namely $\ddot{x}(t) + \left(\gamma + \frac{\alpha}{t}\right)\dot{x}(t) + \nabla g(x(t)) = 0$.
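The Ansatz $x_k \approx x(k\sqrt{s})$ can be illustrated numerically in the simplest setting. The sketch below does not implement scheme (4); it uses the classical Nesterov iteration as stated by Su, Boyd and Candès [1] (the case $\gamma = 0$, $\alpha = 3$) on the convex test function $g(x) = x^2/2$, which is an assumption made only for illustration. For this $g$, the equation $\ddot{x} + \frac{3}{t}\dot{x} + x = 0$ with $x(0) = 1$, $\dot{x}(0) = 0$ has the solution $2J_1(t)/t$, evaluated here through its power series.

```python
import math

def nesterov_iterates(grad, x0, s, n_steps):
    # x_k = y_{k-1} - s * grad(y_{k-1}),  y_k = x_k + (k-1)/(k+2) * (x_k - x_{k-1})
    xs = [x0]
    x_prev, y = x0, x0
    for k in range(1, n_steps + 1):
        x = y - s * grad(y)
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        xs.append(x)
        x_prev = x
    return xs

def ode_solution(t, terms=30):
    # Power series of 2*J_1(t)/t, which solves x'' + (3/t) x' + x = 0,
    # x(0) = 1, x'(0) = 0.
    total = 0.0
    for m in range(terms):
        total += ((-1) ** m * (t / 2.0) ** (2 * m)
                  / (math.factorial(m) * math.factorial(m + 1)))
    return total

s = 1e-4                       # small step size, sqrt(s) = 0.01
n = 200                        # iterate index, so t = n * sqrt(s) = 2
xs = nesterov_iterates(lambda x: x, 1.0, s, n)
t = n * math.sqrt(s)
gap = abs(xs[n] - ode_solution(t))
print(t, xs[n], ode_solution(t), gap)
```

The gap between the iterate and the continuous trajectory is expected to shrink as $s$ decreases, which is the informal content of the limiting argument.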

Existence and uniqueness of the trajectory
We consider on the finite-dimensional space $\mathbb{R}^n$ the Euclidean topology. If $x \in \mathbb{R}^n$ is a local minimizer of $g$, then $\nabla g(x) = 0$. We denote by $\mathrm{crit}(g) = \{x \in \mathbb{R}^n : \nabla g(x) = 0\}$ the set of critical points of $g$. In the asymptotic analysis of the dynamical system (2), we consider strong global solutions.

Definition 2.1:
We say that $x : [t_0, +\infty) \to \mathbb{R}^n$ is a strong global solution of (2), if the following properties are satisfied: (i) $x, \dot{x} : [t_0, +\infty) \to \mathbb{R}^n$ are locally absolutely continuous, in other words, absolutely continuous on each interval $[t_0, T]$ for $t_0 < T < +\infty$;
Recall that a function $x : [t_0, +\infty) \to \mathbb{R}^n$ is absolutely continuous on an interval $[t_0, T]$, if there exists an integrable function $y : [t_0, T] \to \mathbb{R}^n$ such that $x(t) = x(t_0) + \int_{t_0}^{t} y(s)\,ds$ for all $t \in [t_0, T]$. It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative $\dot{x} = y$ by the integration formula above. On the other hand, if $x : [t_0, T] \to \mathbb{R}^n$ (where $T > t_0$) is absolutely continuous and $B : \mathbb{R}^n \to \mathbb{R}^n$ is $L$-Lipschitz continuous (where $L \ge 0$), then the function $B \circ x$ is absolutely continuous, too. Moreover, $B \circ x$ is almost everywhere differentiable and the inequality $\left\|\frac{d}{dt}B(x(t))\right\| \le L\|\dot{x}(t)\|$ holds for almost every $t \ge t_0$ [see [14,15]].
We prove existence and uniqueness of a strong global solution of (2) by making use of the Cauchy-Lipschitz-Picard Theorem for absolutely continuous trajectories [see for example [16, Proposition 6.2.1], [17, Theorem 54]]. The key argument is that one can rewrite (2) as a particular first-order dynamical system in a suitably chosen product space [see also [18]].
Theorem 2.1: For every pair of starting points $u_0, v_0 \in \mathbb{R}^n$ there exists a unique strong global solution of the dynamical system (2).
Proof: By making use of the notation $X(t) = (x(t), \dot{x}(t))$, the system (2) can be rewritten as a first-order dynamical system: $\dot{X}(t) = F(t, X(t))$ (6) where F : Obviously, the Lipschitz constant function t → L(t) := 1 + 2L 2 The Cauchy-Lipschitz-Picard Theorem guarantees existence and uniqueness of the trajectory of the first-order dynamical system (6) and thus of the second-order dynamical system (2).
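The first-order reformulation lends itself to a short numerical sketch. It assumes that (2) reads $\ddot{x}(t) + (\gamma + \alpha/t)\dot{x}(t) + \nabla g(x(t)) = 0$, so that $F(t, u, v) = (v, -(\gamma + \alpha/t)v - \nabla g(u))$. The test objective, the parameter values and the classical Runge-Kutta integrator are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def grad_g(x):
    # Derivative of the hypothetical nonconvex objective g(x) = x^2/2 + 3 sin^2(x).
    return x + 3.0 * math.sin(2.0 * x)

def F(t, X, gamma, alpha):
    # Right-hand side of the first-order system for X = (x, x').
    u, v = X
    return (v, -(gamma + alpha / t) * v - grad_g(u))

def rk4_trajectory(u0, v0, t0, t_end, dt, gamma, alpha):
    # Classical fourth-order Runge-Kutta integration from t0 to t_end.
    t, X = t0, (u0, v0)
    for _ in range(int(round((t_end - t0) / dt))):
        k1 = F(t, X, gamma, alpha)
        k2 = F(t + dt / 2, (X[0] + dt / 2 * k1[0], X[1] + dt / 2 * k1[1]), gamma, alpha)
        k3 = F(t + dt / 2, (X[0] + dt / 2 * k2[0], X[1] + dt / 2 * k2[1]), gamma, alpha)
        k4 = F(t + dt, (X[0] + dt * k3[0], X[1] + dt * k3[1]), gamma, alpha)
        X = (X[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             X[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        t += dt
    return X

# Start at u0 = 2.5 with zero velocity at t0 = 1 and integrate up to t = 61.
x_end, v_end = rk4_trajectory(2.5, 0.0, 1.0, 61.0, 0.01, gamma=1.0, alpha=3.0)
print(x_end, v_end, grad_g(x_end))
```

In this run the trajectory settles at a point where the gradient and the velocity are numerically zero, in line with the convergence to a critical point studied in the next section.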
The next result shows that the acceleration of the trajectory generated by (2) is also locally absolutely continuous on [t 0 , +∞).

Proposition 2.1:
For the starting points $u_0, v_0 \in \mathbb{R}^n$, let $x$ be the unique strong global solution of (2). Then $\ddot{x}$ is locally absolutely continuous on $[t_0, +\infty)$, hence the third-order derivative $x^{(3)}$ exists almost everywhere on $[t_0, +\infty)$. where Let $\varepsilon > 0$. Since the functions $\dot{x}(\cdot)$, $t \mapsto \alpha/t$ and $x(\cdot)$ are absolutely continuous on $[t_0, T]$, there exists $\eta > 0$ such that for any finite family of intervals is absolutely continuous on $[t_0, T]$, which shows that $\ddot{x}$ is absolutely continuous on $[t_0, T]$. This proves that $\ddot{x}$ is locally absolutely continuous on $[t_0, +\infty)$, which means that the third-order derivative $x^{(3)}$ exists almost everywhere on $[t_0, +\infty)$.
The following result provides an estimate for the third-order derivative of the strong global solution of the dynamical system (2) in terms of its first- and second-order derivatives.

Lemma 2.1:
For the starting points $u_0, v_0 \in \mathbb{R}^n$, let $x$ be the unique strong global solution of (2). Then, for almost every $t \in [t_0, +\infty)$, it holds
Proof: Let $t \in [t_0, +\infty)$ be such that $\dot{X}(t) = F(t, X(t))$. We have for almost every $h > 0$ that Hence, By taking the limit as $h \to 0$, we obtain Since

Remark 2.1: For
we have that for almost every t ∈ [t 0 , +∞).

Convergence of trajectories
In this section, we study the convergence of the trajectory of the dynamical system (2). We denote by $\omega(x) := \{\bar{x} \in \mathbb{R}^n : \exists\, t_k \to +\infty \text{ such that } x(t_k) \to \bar{x} \text{ as } k \to +\infty\}$ the set of limit points of the trajectory $x$.
Before proving a first result in this sense, we recall two technical lemmas which will play an essential role in the asymptotic analysis.
Lemma 3.1: Suppose that $F : [0, +\infty) \to \mathbb{R}$ is locally absolutely continuous and bounded below and that there exists $G \in L^1([0, +\infty))$ such that for almost every $t \in [0, +\infty)$ $\frac{d}{dt}F(t) \le G(t)$. Then there exists $\lim_{t \to +\infty} F(t) \in \mathbb{R}$.

Theorem 3.1:
Assume that $g$ is bounded from below and, for $u_0, v_0 \in \mathbb{R}^n$, let $x$ be the unique strong global solution of the dynamical system (2). Then the following statements are true: (ii) there exists $\beta > 0$ such that the limit $\lim_{t \to +\infty} g(\beta\dot{x}(t) + x(t))$ exists and is finite;
Proof: Choose $0 < \beta < \min\left(2/L_g,\ \left(\sqrt{L_g^2 + 2\gamma L_g} - L_g\right)/L_g\right)$. By using the $L_g$-Lipschitz continuity of $\nabla g$, for almost every $t \in [t_0, +\infty)$ it holds Taking into account that we obtain for almost every $t \in [t_0, +\infty)$ Since $0 < \beta < \min\left(2/L_g,\ \left(\sqrt{L_g^2 + 2\gamma L_g} - L_g\right)/L_g\right)$, there exists $t_1 > 0$ such that for every $t \ge t_1$ it holds For simplicity, we denote Furthermore, by the $L_g$-Lipschitz property of $\nabla g$, it holds ∇g ∈ L 2 ([t 1
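The admissible range for $\beta$ in the proof can be evaluated directly. The helper below is a hypothetical illustration and assumes that the upper bound reads $\min\left(2/L_g, (\sqrt{L_g^2 + 2\gamma L_g} - L_g)/L_g\right)$, with the radical restored from the flattened formula.

```python
import math

def beta_upper_bound(L_g, gamma):
    # Strict upper bound for beta; assumes L_g > 0 (Lipschitz constant of grad g)
    # and gamma > 0 (constant part of the damping).
    if L_g <= 0 or gamma <= 0:
        raise ValueError("L_g and gamma must be positive")
    root_term = (math.sqrt(L_g ** 2 + 2.0 * gamma * L_g) - L_g) / L_g
    return min(2.0 / L_g, root_term)

# With L_g = 1 and gamma = 1.5 the second term is (sqrt(4) - 1) / 1 = 1.
print(beta_upper_bound(1.0, 1.5))
# With a large gamma the first term 2 / L_g becomes the binding one.
print(beta_upper_bound(1.0, 12.0))
```

Any $\beta$ strictly between $0$ and the returned value is admissible under this reading.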
In the following result, we use the distance function to a set, defined for $A \subseteq \mathbb{R}^n$ as $\mathrm{dist}(x, A) = \inf_{y \in A} \|x - y\|$ for all $x \in \mathbb{R}^n$. The following result is a direct consequence of Theorem 3.1.

Lemma 3.3:
Assume that $g$ is bounded from below and, for $u_0, v_0 \in \mathbb{R}^n$, let $x$ be the unique strong global solution of the dynamical system (2). Define Let $0 < \beta < \min\left(2/L_g,\ \left(\sqrt{L_g^2 + 2\gamma L_g} - L_g\right)/L_g\right)$ and $t_1 > 0$ be such that for every $t \ge t_1$ the inequalities (9) hold. For every $t \in [t_1, +\infty)$, define Then the following statements are true:
Let $(x_n)_{n \ge 1} \subseteq \omega(x)$ and assume that $\lim_{n \to +\infty} x_n = x^*$. We show that $x^* \in \omega(x)$. Obviously, for every $n \ge 1$, there exists a sequence $t^n_k \to +\infty$, $k \to +\infty$, such that Let $\varepsilon > 0$. Since $\lim_{n \to +\infty} x_n = x^*$, there exists $N(\varepsilon) \in \mathbb{N}$ such that for every $n \ge N(\varepsilon)$ it holds Let $n \ge 1$ be fixed. Since $\lim_{k \to +\infty} x(t^n_k) = x_n$, there exists $k(n, \varepsilon) \in \mathbb{N}$ such that for every $k \ge k(n, \varepsilon)$ it holds Let $k_n \ge k(n, \varepsilon)$ be such that $t^n_{k_n} > n$. Obviously $t^n_{k_n} \to +\infty$ as $n \to +\infty$ and for every $n \ge N(\varepsilon)$ Hence, (viii) follows from the definition of the set $\omega(u, v)$.
Remark 3.1: Combining (iii) and (iv) in Lemma 3.3, it follows that for every $\bar{x} \in \omega(x)$ and $t_k \to +\infty$ such that
The convergence of the trajectory generated by the dynamical system (2) will be shown in the framework of functions satisfying the Kurdyka-Łojasiewicz property. For $\eta \in (0, +\infty]$, we denote by $\Theta_\eta$ the class of concave and continuous functions $\varphi : [0, \eta) \to [0, +\infty)$ such that $\varphi(0) = 0$, $\varphi$ is continuously differentiable on $(0, \eta)$, continuous at $0$ and $\varphi'(s) > 0$ for all $s \in (0, \eta)$.

Definition 3.1 (Kurdyka-Łojasiewicz property):
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. We say that $f$ satisfies the Kurdyka-Łojasiewicz (KL) property at $\bar{x} \in \mathbb{R}^n$ if there exist $\eta \in (0, +\infty]$, a neighbourhood $U$ of $\bar{x}$ and a function $\varphi \in \Theta_\eta$ such that for all $x$ in the intersection $U \cap \{x \in \mathbb{R}^n : f(\bar{x}) < f(x) < f(\bar{x}) + \eta\}$ the following inequality holds $\varphi'(f(x) - f(\bar{x}))\,\|\nabla f(x)\| \ge 1$. If $f$ satisfies the KL property at each point in $\mathbb{R}^n$, then $f$ is called a KL function.
The origins of this notion go back to the pioneering work of Łojasiewicz [19], where it is proved that for a real-analytic function $f : \mathbb{R}^n \to \mathbb{R}$ and a critical point $\bar{x} \in \mathbb{R}^n$ (that is $\nabla f(\bar{x}) = 0$), there exists $\theta \in [1/2, 1)$ such that the function $|f - f(\bar{x})|^{\theta}\,\|\nabla f\|^{-1}$ is bounded around $\bar{x}$. This corresponds to the situation when $\varphi(s) = C(1 - \theta)^{-1} s^{1-\theta}$. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [20] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [21][22][23][24].
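A small worked instance may help to read the KL inequality. The sketch below takes the hypothetical example $f(x) = x^2$ at the critical point $\bar{x} = 0$ together with the desingularizing function $\varphi(s) = C(1-\theta)^{-1}s^{1-\theta}$ for $\theta = 1/2$ and $C = 1/2$, so that $\varphi'(s) = C s^{-\theta}$. In this case $\varphi'(f(x) - f(\bar{x}))\,|f'(x)| = 1$ for every $x \neq 0$, so the inequality holds with equality. The function names are illustrative only.

```python
def phi_prime(s, C, theta):
    # Derivative of phi(s) = C / (1 - theta) * s ** (1 - theta), for s > 0.
    return C * s ** (-theta)

def kl_quantity(f, grad_f, x, x_bar, C, theta):
    # Left-hand side of the KL inequality: phi'(f(x) - f(x_bar)) * |grad f(x)|.
    return phi_prime(f(x) - f(x_bar), C, theta) * abs(grad_f(x))

f = lambda x: x * x
grad_f = lambda x: 2.0 * x

# Sample points approaching the critical point x_bar = 0.
points = (0.5, 0.1, 0.01, 1e-3)
values = [kl_quantity(f, grad_f, x, 0.0, C=0.5, theta=0.5) for x in points]
print(values)  # each entry equals 1 up to floating-point rounding
```

Choosing a smaller constant $C$ would make the quantity drop below $1$, which shows that the constant in $\varphi$ matters as well as the exponent.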
To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [21][22][23][24][25][26][27] and the references therein for more details regarding all the classes mentioned above and illustrating examples.
An important role in our convergence analysis will be played by the following uniformized KL property given in [27,Lemma 6].

Lemma 3.4:
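Let $\Omega \subseteq \mathbb{R}^n$ be a compact set and let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. Assume that $f$ is constant on $\Omega$ and $f$ satisfies the KL property at each point of $\Omega$. Then there exist $\varepsilon, \eta > 0$ and $\varphi \in \Theta_\eta$ such that for all $\bar{x} \in \Omega$ and for all $x$ in the intersection $\{x \in \mathbb{R}^n : \mathrm{dist}(x, \Omega) < \varepsilon\} \cap \{x \in \mathbb{R}^n : f(\bar{x}) < f(x) < f(\bar{x}) + \eta\}$ the following inequality holds $\varphi'(f(x) - f(\bar{x}))\,\|\nabla f(x)\| \ge 1$.
The following theorem is the main result of the paper and concerns the global asymptotic convergence of the trajectory generated by (2).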

Theorem 3.2:
Assume that $g$ is bounded from below and, for $u_0, v_0 \in \mathbb{R}^n$, let $x$ be the unique strong global solution of (2). Suppose that $H$ is a KL function and $x$ is bounded. Then the following statements are true:
Proof: Let $0 < \beta < \min\left(2/L_g,\ \left(\sqrt{L_g^2 + 2\gamma L_g} - L_g\right)/L_g\right)$ and $t_1 > 0$ be such that for every $t \ge t_1$ the inequalities (9) hold. Furthermore, we will use the notations made in Lemma 3.3, according to which we can choose $(\bar{x}, \bar{x}) \in \omega(u, v)$. It holds $H(\bar{x}, \bar{x})$.
On the other hand, Since $A < 0$ and $B(t) < 0$ for every $t \ge t_1$, the latter inequality can hold only if $\dot{x}(t) = \ddot{x}(t) = 0$ for almost every t ≥ t.

By integrating on the interval $[T, \hat{T}]$, for $\hat{T} > T$, we obtain
Since $\varphi$ is bounded from below, $A < 0$, $B(s) < 0$ and $p(s) > 0$ for every $s \ge T$ and $\hat{T}$ was arbitrarily chosen, we obtain that which leads to and further to Since $p$ is bounded from above on $[t_0, +\infty)$ and we obtain that $\dot{x}, \ddot{x} \in L^1([t_0, +\infty), \mathbb{R}^n)$.

Remark 3.3:
Since the class of semi-algebraic functions is closed under addition [see, for example, [27]] and $(x, y) \mapsto \frac{1}{2}\|x - y\|^2$ is semi-algebraic, the conclusion of the previous theorem holds if, instead of asking that $H$ is a KL function, we ask that $g$ is semi-algebraic.

Remark 3.4:
Assume that $g$ is coercive, that is $\lim_{\|u\| \to +\infty} g(u) = +\infty$. For $u_0, v_0 \in \mathbb{R}^n$, let $x \in C^2([0, +\infty), \mathbb{R}^n)$ be the unique global solution of (2). Then $x$ is bounded.
Indeed, notice that $g$ is bounded from below, being a continuous and coercive function [see [28]]. From (10), it follows that $\beta\dot{x}(T) + x(T)$ is contained, for every $T \ge t_1$, in a lower level set of $g$, which is bounded. According to Theorem 3.1, $\beta\dot{x}(t) \to 0$ as $t \to +\infty$, which implies that $x$ is bounded.

Convergence rates
In this section, we will assume that the regularized function $H$ satisfies the Łojasiewicz property, which, as noted in the previous section, corresponds to a particular choice of the desingularizing function $\varphi$ [see [19,22,25]].
Definition 4.1: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. The function $f$ is said to fulfil the Łojasiewicz property, if for every $\bar{x} \in \mathrm{crit}\, f$ there exist $K, \varepsilon > 0$ and $\theta \in (0, 1)$ such that $|f(x) - f(\bar{x})|^{\theta} \le K\|\nabla f(x)\|$ for every $x$ fulfilling $\|x - \bar{x}\| < \varepsilon$. The number $\theta$ is called the Łojasiewicz exponent of $f$ at the critical point $\bar{x}$.
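The role of the exponent can be seen on a flat critical point. The sketch below uses the hypothetical example $f(x) = x^4$ at $\bar{x} = 0$ and evaluates the ratio $|f(x) - f(\bar{x})|^{\theta} / |f'(x)|$, whose boundedness near $\bar{x}$ is what the Łojasiewicz property asks for. With $\theta = 3/4$ the ratio equals $1/4$ for every $x \neq 0$, whereas with $\theta = 1/2$ it behaves like $1/(4|x|)$ and blows up as $x \to 0$.

```python
def lojasiewicz_ratio(x, theta):
    # |f(x) - f(0)|**theta / |f'(x)| for the illustrative choice f(x) = x**4.
    f_val = x ** 4
    grad = 4.0 * x ** 3
    return abs(f_val) ** theta / abs(grad)

# Sample points approaching the critical point 0.
xs = [10.0 ** (-k) for k in range(1, 6)]

ratios_theta_34 = [lojasiewicz_ratio(x, 0.75) for x in xs]  # stays at 1/4
ratios_theta_12 = [lojasiewicz_ratio(x, 0.5) for x in xs]   # grows like 1/(4x)

print(ratios_theta_34)
print(ratios_theta_12)
```

In this example the flatter the function is around the critical point, the larger the exponent has to be.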
In the following theorem, we provide convergence rates for the trajectory generated by (2), its velocity and acceleration in terms of the Łojasiewicz exponent of H [see, also, [22,25]].