Proximal-gradient algorithms for fractional programming

In this paper, we propose two proximal-gradient algorithms for fractional programming problems in real Hilbert spaces, where the numerator is a proper, convex and lower semicontinuous function and the denominator is a smooth function, either concave or convex. In the iterative schemes, we perform a proximal step with respect to the nonsmooth numerator and a gradient step with respect to the smooth denominator. The algorithm in the case of a concave denominator has the particularity that it generates sequences which approach both the (global) optimal solution set and the optimal objective value of the underlying fractional programming problem. In the case of a convex denominator, the numerical scheme approaches the set of critical points of the objective function, provided the latter satisfies the Kurdyka-Łojasiewicz property.


Introduction and preliminaries
Consider the fractional programming problem (1) of minimizing f (x)/g(x) over x ∈ S, where S is a nonempty subset of a real Hilbert space H, the function f is nonnegative and the function g is positive on S. One of the classical methods for handling (1) is Dinkelbach's procedure (see [1,2]), which relates it to the optimization problem (2). If (1) has an optimal solution x̄ ∈ S, then this is also an optimal solution to (2) and the optimal objective value of the latter is equal to zero. Vice versa, if (2) has x̄ ∈ S as an optimal solution and its optimal objective value is equal to zero, then x̄ is an optimal solution to (1), too. This shows that finding an optimal solution to (1) can be approached by algorithms which solve (2). However, one drawback of this procedure is that it applies only in the very restrictive case when the optimal objective value of (1) is known. One can find in the literature (see [1-5]) an iterative scheme which, in the attempt to overcome this drawback in finite-dimensional spaces, requires solving in each iteration k ≥ 0 the optimization problem (3), while θ k is updated by θ k+1 := f (x k )/g(x k ), where x k is an optimal solution of (3). However, solving in each iteration an optimization problem of type (3) can be as expensive and difficult as solving the fractional programming problem (1) itself.
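As an illustration of this classical scheme, the following sketch runs the Dinkelbach-type iteration on a hypothetical one-dimensional instance; the functions, the grid model of S and the use of grid search as the inner solver are our assumptions, purely for illustration:

```python
import numpy as np

# Illustrative instance (our assumption, not from the paper):
# minimize f(x)/g(x) = ((x - 2)^2 + 1)/(x + 1) over S = [0, 4].
f_ex = lambda x: (x - 2.0) ** 2 + 1.0
g_ex = lambda x: x + 1.0
grid = np.linspace(0.0, 4.0, 200001)   # brute-force grid model of the set S

def dinkelbach(f, g, S, theta=0.0, max_iter=100, tol=1e-12):
    """At iteration k, solve min_{x in S} f(x) - theta_k * g(x)
    (a problem of type (3), here by grid search) and update
    theta_{k+1} = f(x_k)/g(x_k)."""
    for _ in range(max_iter):
        x = S[np.argmin(f(S) - theta * g(S))]
        new_theta = f(x) / g(x)
        if abs(new_theta - theta) <= tol:
            break
        theta = new_theta
    return x, theta

x_star, theta_star = dinkelbach(f_ex, g_ex, grid)
# For this instance the optimal ratio is attained at x = sqrt(10) - 1,
# with optimal value 2*(sqrt(10) - 3).
```

The sketch also shows the drawback discussed above: each outer iteration requires a full global minimization of x ↦ f(x) − θ k g(x) over S.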
The aim of this note is to propose an alternative to this approach. Namely, we formulate two iterative schemes for solving (1), where f : H → R ∪ {+∞} is proper, convex and lower semicontinuous and g : H → R is differentiable with Lipschitz continuous gradient and either concave or convex. Instead of solving (3) in each iteration, the proposed iterative methods perform a gradient step with respect to g and a proximal step with respect to f . In this way, the functions f and g are processed separately in each iteration. A further advantage of the algorithm investigated in the case when g is concave is that it generates sequences that concomitantly approach the set of optimal solutions and the optimal objective value of (1). The second numerical scheme, proposed in the case when g is convex, has the particularity that it approaches the set of critical points of the objective function of (1), provided the latter satisfies the Kurdyka-Łojasiewicz property.
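The proximal step on the nonsmooth numerator is cheap whenever the proximal map of f has a closed form. As a minimal illustration (our example, not taken from the paper), for f (x) = |x| the proximal map is the soft-thresholding operator:

```python
import numpy as np

def prox_abs(eta, y):
    """Proximal map of eta * |.|, i.e. the minimizer of
    |x| + (1/(2*eta)) * (x - y)^2: the soft-thresholding operator."""
    return np.sign(y) * np.maximum(np.abs(y) - eta, 0.0)
```

For |y| beyond the threshold eta the map shrinks y toward 0 by eta; inside [−eta, eta] it returns 0, reflecting the nondifferentiability of |·| at the origin.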
For the notations used in this paper we refer the reader to [6-9]. Let H be a real Hilbert space with inner product ⟨·, ·⟩ and associated norm ‖·‖ = √⟨·, ·⟩. The symbols ⇀ and → denote weak and strong convergence, respectively.
For a function f : H → R ∪ {+∞} we denote by dom f := {x ∈ H : f (x) < +∞} its effective domain and say that f is proper if dom f ≠ ∅. Let S ⊆ H be a nonempty set. The indicator function of S, δ S : H → R ∪ {+∞}, is the function which takes the value 0 on S and +∞ otherwise.
An efficient tool for proving weak convergence of a sequence in Hilbert spaces (without a priori knowledge of its limit) is the Opial Lemma, which we recall in the following. Lemma 1: [Opial] Let C be a nonempty subset of H and (x k ) k≥0 be a sequence in H such that the following two conditions hold: (a) for every x ∈ C, lim k→+∞ ‖x k − x‖ exists; (b) every weak sequential cluster point of (x k ) k≥0 is in C. Then (x k ) k≥0 converges weakly to an element in C.
When proving the first part of the Opial Lemma, one usually tries to show that for every x ∈ C the sequence (‖x k − x‖) k≥0 fulfils a Fejér-type inequality. In this sense, the following result is very useful. Lemma 2: Let (a k ) k≥0 , (b k ) k≥0 and (ε k ) k≥0 be nonnegative real sequences such that Σ k≥0 ε k < +∞ and a k+1 + b k ≤ a k + ε k for every k ≥ 0. Then lim k→+∞ a k exists and Σ k≥0 b k < +∞.
The following summability result will be useful in Section 2.2. Lemma 3: Let (a k ) k≥0 and (ε k ) k≥0 be nonnegative real sequences such that Σ k≥0 ε k < +∞ and a k+1 ≤ a · a k + ε k for every k ≥ 0, where a ∈ R, a < 1. Then Σ k≥0 a k < +∞.
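A quick numerical sanity check of Lemma 3 (the choices a = 1/2, ε k = 2^{-k} and equality in the recursion are our toy assumptions): the closed form is then a k = (1 + 2k)/2^k, whose series converges to 6.

```python
# Toy check of Lemma 3 with a = 1/2 and eps_k = 2**(-k) (summable),
# taking equality in a_{k+1} = a * a_k + eps_k with a_0 = 1.
# The closed form is a_k = (1 + 2k) / 2**k, so sum(a_k) converges (to 6).
a_k, partial_sum = 1.0, 0.0
for k in range(200):
    partial_sum += a_k
    a_k = 0.5 * a_k + 2.0 ** (-k)
```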
Finally, the descent lemma, which we recall next, is a helpful tool in the convergence analysis of the algorithms proposed in this manuscript. Lemma 4: [Descent lemma] Let g : H → R be differentiable with L-Lipschitz continuous gradient. Then for all x, y ∈ H it holds g(y) ≤ g(x) + ⟨∇g(x), y − x⟩ + (L/2)‖y − x‖².

Two proximal-gradient algorithms
In this section, we propose two proximal-gradient algorithms for solving (1) and investigate their convergence properties. We treat the situations when g is either a convex or a concave function separately.

Concave denominator
The problem that we investigate throughout this subsection has the following formulation. Problem 5: We are interested in solving the fractional programming problem min x∈S f (x)/g(x), where H is a real Hilbert space, S is a nonempty, convex and closed subset of H with S ∩ dom f ≠ ∅, and the following conditions hold: f : H → R ∪ {+∞} is proper, convex, lower semicontinuous and nonnegative, and g : H → R is concave, differentiable with Lipschitz continuous gradient and positive on S. To this aim we propose the following algorithm.

Algorithm 6:
Initialization: Choose x 0 ∈ S ∩ dom f and set θ 1 := f (x 0 )/g(x 0 ). For every k ≥ 1, compute x k by a proximal step with respect to f + δ S combined with a gradient step with respect to θ k g, and set θ k+1 := f (x k )/g(x k ). We are now in a position to present the convergence statement for this algorithm. To this end, we assume that the algorithm does not stop after finitely many iterations. Theorem 7: In the setting of Problem 5, consider the sequences generated by Algorithm 6. The following statements hold: (i) the sequence (θ k ) k≥1 is nonincreasing and converges to the optimal objective value θ̄ of Problem 5, with the convergence rate stated in (5); (ii) assume that inf x∈S g(x) > 0. Then the sequence (x k ) k≥0 converges weakly to an element in C, the set of optimal solutions of Problem 5.

Proof:
According to the first-order optimality conditions, we have (7). A direct consequence of the definition of the convex subdifferential is the inequality (8). Invoking the concavity of g and using that θ k ≥ 0, and combining (7) and (8), we obtain (9). Lemma 4 applied to the function −g yields a further estimate. Taking into account the relation f (x k ) = θ k+1 g(x k ) and the way η k is defined, we obtain for every x ∈ S and k ≥ 1 the inequality (10). This further implies that (θ k ) k≥1 is a nonincreasing sequence, hence convergent, since it is bounded from below by 0. Consider now an arbitrary x̄ ∈ C and take x := x̄ in (10). We derive (12), which yields the inequality (13). Since (θ k ) k≥1 is bounded from below by 0, the sequence on the right-hand side of inequality (13) belongs to ℓ 1 . We derive from Lemma 2 that (14) and (15) hold. Coming back to (12) and using θ k ≥ θ k+1 , we obtain an estimate whose right-hand side, by (14), (15) and the convergence of the sequence (θ k ) k≥1 , converges to 0 as k → +∞. Invoking also the fact that (θ k ) k≥1 is bounded from below by θ̄ , we conclude that lim k→+∞ θ k = θ̄ . Let us prove now the convergence rate result stated in (5). Let x̄ ∈ C and n ≥ 1 be arbitrary.
Noticing the telescoping sum in the left-hand side of the previous inequality, we obtain (17). Summing up the inequalities in (16) for k from 1 to n + 1, we obtain (18). Summing up the inequalities (17) and (18) and discarding the nonnegative terms on the right-hand side, we derive an estimate which, noticing that θ 1 ≥ θ̄ , implies (5) after rearranging the terms. (ii) For the remainder of the proof we assume that inf x∈S g(x) > 0. In this situation, lim k→+∞ θ k = θ̄ > 0 and, by (14) and (15), lim k→+∞ ‖x k − x‖ exists for every x ∈ C. Thus the first condition in the Opial Lemma is fulfilled. Consider now a subsequence (x k l ) l≥0 of (x k ) k≥0 that weakly converges to x̄ as l → +∞. Since for every l ≥ 1 we have (19), which holds due to the concavity of g and θ̄ > 0, and since (x k l ) l≥0 is bounded, we conclude that (21) holds. Since (x k l ) l≥0 weakly converges to x̄ as l → +∞, from (21) and the fact that the graph of the convex subdifferential of a proper, convex and lower semicontinuous function is sequentially closed with respect to the weak-norm topology (see [7, Proposition 20.33]), we derive (22). The definition of the convex subdifferential yields a further inequality from which, by choosing y ∈ C and using relation (22), we obtain that x̄ ∈ C. Thus the second condition in the Opial Lemma is also fulfilled. The conclusion follows now from Lemma 1.
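As a sanity check of the analysis above, the following sketch runs Algorithm 6 on a hypothetical one-dimensional instance of Problem 5; the choices f (x) = |x| + 1, g(x) = 5 − x²/2, S = [−2, 2] and the constant step size η k = 0.5 are ours, and the paper's rule for choosing η k is not reproduced. For this f and S the proximal step has a closed form (soft-thresholding followed by clipping, valid for one-dimensional convex objectives over an interval), and θ k decreases to the optimal value 1/5, attained at x = 0:

```python
import numpy as np

def soft(y, eta):
    # proximal map of eta * |.|
    return np.sign(y) * max(abs(y) - eta, 0.0)

f = lambda x: abs(x) + 1.0          # proper, convex, lsc, nonnegative
g = lambda x: 5.0 - 0.5 * x * x     # concave, positive on S = [-2, 2]
grad_g = lambda x: -x

eta = 0.5                           # constant step size (illustrative choice)
x = 2.0                             # x_0 in S ∩ dom f
theta = f(x) / g(x)                 # theta_1 = f(x_0)/g(x_0)
for _ in range(50):
    y = x + eta * theta * grad_g(x)              # gradient step w.r.t. theta_k * g
    x = float(np.clip(soft(y, eta), -2.0, 2.0))  # proximal step w.r.t. f + delta_S
    theta = f(x) / g(x)                          # theta_{k+1} = f(x_k)/g(x_k)
```

In agreement with Theorem 7, the sequence (θ k ) is nonincreasing and reaches the optimal objective value after a few iterations in this toy example.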

Convex denominator
In this subsection we consider the case when g is a convex function.

Problem 8:
We are interested in solving the fractional programming problem min x∈S f (x)/g(x), where H is a real Hilbert space, S is a nonempty, convex and closed subset of H with S ∩ dom f ≠ ∅, and the following conditions hold: f : H → R ∪ {+∞} is proper, convex, lower semicontinuous and nonnegative, and g : H → R is convex, differentiable with Lipschitz continuous gradient and positive on S. The algorithm we propose in this context has the following formulation. Algorithm 9: Initialization: Choose x 0 ∈ S ∩ dom f and set θ 1 := f (x 0 )/g(x 0 ). For every k ≥ 1, compute x k by a proximal step with respect to f + δ S combined with a gradient step with respect to θ k g, and set θ k+1 := f (x k )/g(x k ). The proof of the first result in this subsection reveals the fact that when g is convex one cannot expect convergence of the whole sequence (x k ) k≥0 . Furthermore, if the sequence does converge, then its limit is not necessarily an optimal solution of (23), but a critical point of the objective function (f + δ S )/g in the sense of the limiting subdifferential. In order to explain this notion, we need some prerequisites of nonsmooth analysis.
For the following generalized subdifferential notions and their basic properties, we refer to [11,12]. Let h : H → R ∪ {+∞} be a proper and lower semicontinuous function. If x ∈ dom h, we consider the Fréchet (viscosity) subdifferential of h at x as being the set ∂̂h(x) := {v ∈ H : lim inf y→x (h(y) − h(x) − ⟨v, y − x⟩)/‖y − x‖ ≥ 0}. For x ∉ dom h we set ∂̂h(x) := ∅. The limiting (Mordukhovich) subdifferential is defined at x ∈ dom h by ∂ L h(x) := {v ∈ H : there exist sequences x n → x and v n ∈ ∂̂h(x n ) such that h(x n ) → h(x) and v n ⇀ v as n → +∞}. Notice that in case h is convex, these two subdifferential notions coincide with the convex subdifferential, thus ∂̂h(x) = ∂ L h(x) = ∂h(x) for all x ∈ H. The Fermat rule reads in this nonsmooth setting: if x ∈ H is a local minimizer of h, then 0 ∈ ∂ L h(x). An element x ∈ dom h fulfilling this inclusion is called a critical point of the function h. The set of all critical points of h is denoted by crit (h).
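A standard one-dimensional example illustrates the gap between the two notions: for h(x) = −|x|, the Fréchet subdifferential at 0 is empty, while the limiting subdifferential collects the limits of nearby Fréchet subgradients,

```latex
h(x) = -\lvert x\rvert:\qquad
\hat{\partial} h(x) = \{-1\}\ (x>0),\quad
\hat{\partial} h(x) = \{+1\}\ (x<0),\quad
\hat{\partial} h(0) = \emptyset,\quad
\partial_L h(0) = \{-1,+1\}.
```

In particular, 0 ∈ ∂ L h(0) fails here, consistent with 0 being a local maximizer rather than a local minimizer of h.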
The convergence of Algorithm 9 is stated in the following theorem. Theorem 10: In the setting of Problem 8, consider the sequences generated by Algorithm 9 such that the additional condition lim inf k→+∞ η k > 0 is satisfied. The following statements hold: Proof: As already seen in the proof of Theorem 7, we have (24) and (25). By choosing x := x k−1 in (25) and combining the resulting estimate with the descent inequality for g, we obtain (26). (i) From (26) we obtain that (θ k ) k≥1 is nonincreasing, hence convergent, since it is bounded from below by 0. Moreover, from (26) we obtain that lim k→+∞ ‖x k − x k−1 ‖ = 0. (ii) Without loss of generality, we assume that x k → x̄ as k → +∞. Since S is closed, we have x̄ ∈ S. By choosing x := x̄ in (25), we obtain an inequality whose right-hand side, by (i), converges to 0 as k → +∞. Hence, lim sup k→+∞ f (x k ) ≤ f (x̄). Since f is lower semicontinuous, the reverse inequality is also true, thus lim k→+∞ f (x k ) = f (x̄). Furthermore, due to the continuity of g, we have lim k→+∞ g(x k ) = g(x̄). Let us denote by θ̄ the limit of the sequence (θ k ) k≥1 . Passing to the limit as k → +∞ in the relation which defines θ k+1 in Algorithm 9, we obtain θ̄ = f (x̄)/g(x̄). By using again the closedness property of the graph of the convex subdifferential, from (24) and (i) we obtain (27). In this situation f + δ S is Lipschitz continuous around x̄ (see [7, Theorem 8.29]). From (27) and (28), by making use of [11, Corollary 1.111(i)], we obtain that x̄ is a critical point of (f + δ S )/g.
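The behaviour described in Theorem 10 can be observed on a hypothetical smooth instance (all choices below are ours): for f (x) = (x + 1)², g(x) = x² + 1 and S = H = R, the objective (x + 1)²/(x² + 1) is nonconvex with critical points x = −1 (global minimum, value 0) and x = 1 (maximum, value 2), and the iteration drives θ k down to the critical value 0:

```python
f = lambda x: (x + 1.0) ** 2        # convex, nonnegative
g = lambda x: x * x + 1.0           # convex, positive; grad 2x is 2-Lipschitz
grad_g = lambda x: 2.0 * x

def prox_f(eta, y):
    # argmin_x (x + 1)^2 + (x - y)^2 / (2 * eta), in closed form
    return (y - 2.0 * eta) / (1.0 + 2.0 * eta)

eta = 0.1                            # so that eta * theta_1 = 0.1 < 1/L = 0.5
x = 0.0                              # x_0
theta = f(x) / g(x)                  # theta_1 = 1
for _ in range(400):
    y = x + eta * theta * grad_g(x)  # gradient step w.r.t. theta_k * g
    x = prox_f(eta, y)               # proximal step w.r.t. f (here S = H)
    theta = f(x) / g(x)              # theta_{k+1} = f(x_k)/g(x_k)
# x approaches the critical point -1 and theta_k decreases to 0
```

Starting instead exactly at the other critical point x = 1 would leave an idealized continuous-time trajectory stationary, which illustrates why only convergence to critical points, not to global minimizers, can be guaranteed in the convex-denominator case.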

Remark 11:
(a) The main ingredient in the proof of the second statement of the above theorem is the rule for the limiting subdifferential of the product (or quotient) of locally Lipschitz continuous functions. We notice that similar rules are valid also for the Clarke subdifferential (see [13, Exercise 10.21]). (b) Whenever (f + δ S )/g is a convex function, we obtain in the hypotheses of the above theorem that x̄ is a global optimal solution of (23) and lim k→+∞ θ k = min x∈S f (x)/g(x). In the remainder of this subsection, we address the question whether one can guarantee the convergence of the whole sequence (x k ) k≥0 generated by Algorithm 9. We will see that this is ensured whenever the objective function of (23) satisfies the Kurdyka-Łojasiewicz property. To this end we recall some notations and definitions related to the latter.

Definition 12 (Kurdyka-Łojasiewicz property):
Let h : H → R ∪ {+∞} be a proper and lower semicontinuous function. We say that h satisfies the Kurdyka-Łojasiewicz (KL) property at x̄ ∈ dom ∂ L h = {x ∈ H : ∂ L h(x) ≠ ∅} if there exist η ∈ (0, +∞], a neighbourhood U of x̄ and a continuous concave function ϕ : [0, η) → [0, +∞) with ϕ(0) = 0, continuously differentiable on (0, η) and satisfying ϕ′(s) > 0 for all s ∈ (0, η) (we write ϕ ∈ Φ η ), such that for all x in the intersection U ∩ {x ∈ H : h(x̄) < h(x) < h(x̄) + η} the inequality ϕ′(h(x) − h(x̄)) dist(0, ∂ L h(x)) ≥ 1 holds. If h satisfies the KL property at each point in dom ∂ L h, then h is called a KL function. The origins of this notion go back to the pioneering work of Łojasiewicz [16], where it is proved that for a real-analytic function h : R n → R and a critical point x̄ (that is, ∇h(x̄) = 0), there exists θ ∈ [1/2, 1) such that the function |h − h(x̄)| θ ‖∇h‖ −1 is bounded around x̄. This corresponds to the situation when ϕ(s) = Cs 1−θ , where C > 0. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [17] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [14,18-20].
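As a concrete illustration of this desingularization (a standard textbook example, not taken from the paper): for h(x) = x² at its critical point 0, the choice ϕ(s) = s^{1/2} yields the KL inequality with equality,

```latex
h(x) = x^2,\quad \bar{x} = 0,\quad \varphi(s) = s^{1/2}:\qquad
\varphi'\bigl(h(x) - h(\bar{x})\bigr)\,\lVert \nabla h(x)\rVert
  = \frac{1}{2\sqrt{x^2}}\,\lvert 2x\rvert = 1 \ \ge\ 1
  \quad (x \neq 0),
```

while the flatter function h(x) = x⁴ requires the stronger re-parametrization ϕ(s) = s^{1/4}, corresponding to the Łojasiewicz exponent θ = 3/4.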
One of the remarkable properties of KL functions is their ubiquity in applications, according to [15]. To this class of functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [14,15,18-22] and the references therein for more details regarding KL functions and illustrating examples.
An important role in our convergence analysis will be played by the following uniformized KL property given in [15, Lemma 6]. Lemma 13: Let Ω ⊆ H be a compact and connected set and let h : H → R ∪ {+∞} be a proper and lower semicontinuous function. Assume that h is constant on Ω and satisfies the KL property at each point of Ω. Then there exist ε, η > 0 and ϕ ∈ Φ η such that for all x̄ ∈ Ω and for all x in the intersection {x ∈ H : dist(x, Ω) < ε} ∩ {x ∈ H : h(x̄) < h(x) < h(x̄) + η} the inequality ϕ′(h(x) − h(x̄)) dist(0, ∂ L h(x)) ≥ 1 holds. The techniques used below are well-known in the community dealing with algorithms for optimization problems involving functions with the Kurdyka-Łojasiewicz property (see [15,21,23,24]). We show that this approach can be used also for fractional programming problems.
In the following, we denote by ω((x k ) k≥0 ) the set of cluster points of the sequence (x k ) k≥0 . The first statement in the next result is a direct consequence of Theorem 10, while the other statements can be proved similarly to [15, Lemma 5], where it is noticed that (b) and (c) are generic for sequences satisfying lim k→+∞ (x k − x k−1 ) = 0. Lemma 14: In the setting of Problem 8, let H be finite dimensional and consider the sequences generated by Algorithm 9 such that the additional condition lim inf k→+∞ η k > 0 is satisfied. Assume that (x k ) k≥0 is bounded. The following statements hold:

Remark 15:
Suppose that the lower level sets of (f + δ S )/g are bounded. Then the sequence (x k ) k≥0 generated by Algorithm 9 is bounded. Indeed, since (θ k ) k≥1 is nonincreasing, the sequence (x k ) k≥0 is contained in a lower level set of (f + δ S )/g, which is bounded by assumption. We give now the main result concerning the convergence of the whole sequence (x k ) k≥0 . Theorem 16: In the setting of Problem 8, let H be finite dimensional, ∇g be L-Lipschitz continuous, and consider the sequences generated by Algorithm 9 under the additional conditions lim inf k→+∞ η k > 0, η 1 θ 1 < 1/L and (η k ) k≥1 nonincreasing. Assume that (f + δ S )/g is a KL function. Moreover, suppose that (x k ) k≥0 is bounded and there exists k 0 ≥ 0 such that x k ∈ int (dom f ∩ S) for all k ≥ k 0 . Then the following statements are true:

Proof:
(a) Consider the sequences generated by Algorithm 9. According to Lemma 14 we can choose an element x̄ ∈ ω((x k ) k≥0 ). By Theorem 10(ii), we have x̄ ∈ dom f ∩ S and lim k→+∞ θ k = f (x̄)/g(x̄). We treat separately the following two cases. (I) There exists k̄ ≥ 1 such that θ k+1 = θ k for every k ≥ k̄. By using (26), we deduce that the sequence (x k ) k≥k̄ is constant. From here the conclusion follows automatically.
Altogether, from (35) we obtain a suitable estimate for every k ≥ max{k̄, k 0 }, and from here a recursive inequality for the sequence (‖x k+1 − x k ‖) k≥0 . Further, we observe that (∇g(x k )) k≥0 is bounded and lim sup k→+∞ g(x k ) < +∞, so g(x k+1 ) ≤ M for some M > 0; moreover, lim inf k→+∞ g(x k ) > 0, which follows from the continuity of g, the fact that (x k ) k≥0 is bounded and Theorem 10(ii). Thus there exist positive constants C 1 , C 2 and an index k̃ ≥ 0 such that the assumptions of Lemma 3 are satisfied. The conclusion follows from Lemma 3 by noticing that (θ k ) k≥1 and ϕ are bounded from below. (b) It follows from (a) that (x k ) k≥0 is a Cauchy sequence, hence convergent. The conclusion follows from Theorem 10.

Future work
We point out some open questions related to the fractional programming problem under investigation, to be pursued in future work: (1) Is it possible to evaluate in each iteration the functions f and δ S separately, which would mean that the set S is addressed in the algorithm by means of its projection operator? (2) How can one incorporate in Algorithm 6 extrapolation terms in the sense of Nesterov in order to improve its speed of convergence? (3) Can one also consider other situations, for instance when f is smooth and g is nonsmooth, or even the more general case where both functions are nonsmooth?

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The work of the second author was supported by FWF (Austrian Science Fund), Lise Meitner Programme, project [grant number M 1682-N25].