Using Nemirovski’s Mirror-Prox method as basic procedure in Chubanov’s method for solving homogeneous feasibility problems

We introduce a new variant of Chubanov's method for solving linear homogeneous systems with positive variables. In the Basic Procedure we use a recently introduced cut in combination with Nemirovski's Mirror-Prox method. We show that generating the cut requires at most O(n^3) time, just as Chubanov's cut. In an earlier paper it was shown that the new cuts are at least as sharp as those of Chubanov. Our Modified Main Algorithm is in essence the same as Chubanov's Main Algorithm, except that it uses the new Basic Procedure as a subroutine. The new method has O(n^{3.5} log_2(1/δ_A)) time complexity, where δ_A is a suitably defined condition number. As we show, a simplified version of the new Basic Procedure competes well with the Smooth Perceptron Scheme of Peña and Soheili and, when combined with Rescaling, also with two commercial codes for linear optimization.


Introduction
We deal with the (primal) problem

find x ∈ R^n subject to Ax = 0, x > 0, (1)

where A is an integer (or rational) matrix of size m × n and rank(A) = m. The dual problem is

find w ∈ R^m subject to A^T w ≥ 0, A^T w ≠ 0. (2)

Exactly one of the problems (1) and (2) is feasible. Since (1) is homogeneous, its solutions may be scaled so that they satisfy

Ax = 0, 0 ≤ x ≤ 1. (3)

A key ingredient in Chubanov's algorithm is the so-called Basic Procedure (BP). As a result of the BP one gets either (i) a feasible solution of (1), or (ii) a feasible solution for the dual problem (2) of (1), or (iii) a cut for the feasible region of (3).
The cut in (iii) has the form x_k ≤ 1/2, for some index k, valid for all feasible solutions of (3). If this happens, the cut is used to rescale the matrix A. The rescaling happens in Chubanov's Main Algorithm (MA). The MA maintains a vector d such that x ≤ d for all x satisfying (3). Initially we take d = 1. In case (iii) the MA multiplies d_k by 1/2 and replaces A by AD, where D = diag(d). Then the MA sends the rescaled matrix to the BP. This is repeated until the BP returns (i) or (ii). If the BP yields (ii) then (1) is infeasible.
Following [9] we use the condition number δ_A, defined by

δ_A := max { x_1 x_2 · · · x_n : Ax = 0, 0 ≤ x ≤ 1 }.

Obviously, 0 ≤ δ_A ≤ 1. Moreover, since x ≤ d for all feasible solutions of (3), we have δ_A ≤ d_1 d_2 · · · d_n. The number of iterations of the MA can now easily be expressed in δ_A, because in case (iii) the MA reduces at least one of the entries of d by a factor 2. Hence, after k iterations of the MA we have δ_A ≤ d_1 d_2 · · · d_n ≤ 2^{−k}. As a consequence, log_2(1/δ_A) provides an upper bound for the number of iterations of the MA. The BP in [4] needs at most 4n^3 iterations per call and O(n) time per iteration, in total O(n^4) time per call. So the overall time complexity becomes O(n^4 log_2(1/δ_A)).
In [11,12] some improvements of Chubanov's method and its analysis were presented. One improvement was the introduction of a new type of cut. It was shown in [12] that the new cuts are at least as tight as the cuts used by Chubanov. As just mentioned, a Chubanov cut is generated in at most 4n^3 iterations, where each iteration requires O(n) time. The new cuts are also generated in O(n^3) iterations, but this is much harder to prove. It was already claimed in a lemma in [12], but unfortunately the proof of that lemma is flawed.
The first result of this paper is in Section 4, where we prove a slightly modified version of [12, Lemma 4.2]. It implies that when the new cuts are used in Chubanov's BP, at most n^3 iterations are needed to generate such a cut.
It may be useful to point out that Chubanov's BP is in essence nothing other than an algorithm that is by itself able to solve the problem. In fact it is closely related to the well-known Von Neumann algorithm, having exactly the same convergence rate as that algorithm [5]. The idea underlying Chubanov's method is that such a method can be improved by using information that becomes available during the execution of the algorithm to improve the condition of the underlying matrix A, by rescaling the columns of this matrix. The second result of this paper is that Nemirovski's Mirror-Prox method [8] can be used in this way. It generates a cut in only 2n√n iterations, with each iteration requiring O(n^2) time. So each call of the new BP needs only O(n^{3.5}) time. As a result we obtain that the MA, which is in essence the same as in [4], finds a solution of either (1) or (2) in O(n^{3.5} log_2(1/δ_A)) time. The outline of the paper is as follows. In Section 2 we introduce the notions of inducing primal and inducing dual feasibility, for any vector y ∈ R^n. We recall in Section 3 how cuts were obtained in [12], as well as some properties of these cuts. The main result in Section 4 is Proposition 4.1; it guarantees that Chubanov's BP generates the new cuts in n^3 iterations.
As a preparation for Section 6, where we present the simple version of Nemirovski's Mirror-Prox method that we use in our BP, we derive in Section 5 the saddle point problem that we want to solve. In Section 7 we present the new BP and its analysis, and in Section 8 the MA. Finally, Section 9 contains some computational results. We compare the method presented in this paper with the smooth perceptron BP presented in [9] and also with the linear optimization codes in two commercial packages: Gurobi [14] and Mosek [1,15]; these packages are freely available for academic use. We conclude with some comments in Section 10.

Preliminaries
As usual, R denotes the set of real numbers and R_+ the set of nonnegative real numbers. The all-one vector in R^n is denoted as 1 and the n × n identity matrix as I_n. The null space of the matrix A is denoted as L, and L^⊥ denotes the row space of A. So

L = { x ∈ R^n : Ax = 0 },  L^⊥ = { A^T w : w ∈ R^m }.

Since rank(A) = m, the inverse of AA^T exists. Hence the orthogonal projections P_A and Q_A of R^n onto L and L^⊥ are, respectively, given by

P_A = I_n − A^T (AA^T)^{−1} A,  Q_A = A^T (AA^T)^{−1} A.

For any y ∈ R^n we use the notation

y_L := P_A y,  y_{L⊥} := Q_A y.

So y_L and y_{L⊥} are the orthogonal components of y in the spaces L and L^⊥, respectively:

y = y_L + y_{L⊥},  y_L^T y_{L⊥} = 0.

Lemma 2.1. Let y ∈ R^n. If y_L > 0 then y_L solves the primal problem (1), and if 0 ≠ y_{L⊥} ≥ 0 then y_{L⊥} gives rise to a solution of the dual problem (2) in O(n^3) time.
Proof: Since y_L is the projection of y onto the null space of A we have Ay_L = 0. Hence the first statement in the lemma is obvious. The second statement follows by noting that y_{L⊥} ∈ L^⊥ implies y_{L⊥} = A^T w for some w, thus yielding a solution w of (2). Since A has full row rank, w is uniquely determined by y_{L⊥} and can be computed from y_{L⊥} in O(n^3) time.
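As an illustration of Lemma 2.1 and the projections above, the following sketch (Python with hypothetical data; the paper's own experiments use Matlab) decomposes a vector y and checks which case of the lemma applies.

```python
import numpy as np

# Illustrative 2x4 matrix of full row rank (hypothetical data).
A = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0,  1.0, -1.0, 0.0]])
m, n = A.shape

# Orthogonal projections onto L^perp (row space) and L (null space).
Q_A = A.T @ np.linalg.solve(A @ A.T, A)
P_A = np.eye(n) - Q_A

y = np.ones(n) / n          # a point of the unit simplex
y_L = P_A @ y               # component in L
y_Lperp = Q_A @ y           # component in L^perp

# The two components are orthogonal and sum to y.
assert np.allclose(y_L + y_Lperp, y)
assert abs(y_L @ y_Lperp) < 1e-12

# If y_L > 0 then, by Lemma 2.1, y_L itself solves Ax = 0, x > 0.
if np.all(y_L > 0):
    assert np.allclose(A @ y_L, 0)
```

For this particular A the vector y = 1/n already lies in the null space, so the primal case of the lemma applies.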
Because of Lemma 2.1 it becomes natural to say that a vector y induces primal feasibility if y_L > 0 and that y induces dual feasibility if 0 ≠ y_{L⊥} ≥ 0. If y does not yield a solution of the primal or the dual problem, it is modified in Chubanov's BP until it induces primal or dual feasibility, or can be used to generate a cut. In the next section we discuss how cuts can be obtained from a vector y, as proposed in [12].

Cuts and cutting vectors
While Chubanov used the vector y_L to construct cuts for (3), we showed in [12] that by using the vector y_{L⊥} one gets cuts that are at least as tight as the cuts used by Chubanov. Next we recall how this goes.
We introduce the following notation. Let v = y_{L⊥}. The vector that arises from v by replacing all its negative entries by zero is denoted as v_+.

Lemma 3.1. Let x be feasible for (3) and v ∈ L^⊥. Then every nonzero entry v_k of v gives rise to an upper bound for x_k, according to

x_k ≤ σ_k(v) := 1^T (−v/v_k)_+ . (5)

Proof: Since x ∈ L and v ∈ L^⊥ we have v^T x = 0, i.e. −v_k x_k = Σ_{i≠k} v_i x_i. First let v_k < 0. After dividing both sides by −v_k, and using 0 ≤ x ≤ 1, we obtain that x_k is bounded from above by the sum of the positive entries in the vector −v/v_k. If v_k > 0 a similar argument yields the same upper bound for x_k. This proves the lemma.
If x is feasible for (3) then we already know that x k ≤ 1, for each k. Therefore we say that the cut (5) is void if σ k (v) ≥ 1. Of course, we are only interested in the nonvoid cases. The following lemma is in the same spirit as Corollary 2.3 in [12], but stronger.
Proof: First consider the case where v k < 0. In that case σ k (v) < 1 holds if and only if 1 T v + < −v k . Therefore, we may write This proves the lemma.
Below it is always assumed that v = y_{L⊥} for some y. We define

σ(v) := min { σ_k(v) : 1 ≤ k ≤ n, v_k ≠ 0 }. (6)

Lemma 3.3. If σ(v) = 0 then (1) is infeasible.

Proof: Let v be as in the lemma and σ(v) = 0. Then σ_k(v) = 0 for some k, by (6). By Lemma 3.1 this implies x_k = 0 for every x feasible for (3). Since every solution of (1) can be scaled to a feasible solution of (3) with all entries positive, this means that (1) is infeasible, proving the lemma.
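For concreteness, the quantities σ_k(v) and σ(v) just defined can be computed as in the following sketch (the vector v below is hypothetical; in the method itself v = y_{L⊥}).

```python
import numpy as np

def sigma_k(v, k):
    """Upper bound sigma_k(v) for x_k: sum of the positive entries of -v/v_k."""
    r = -v / v[k]
    return r[r > 0].sum()

def sigma(v):
    """sigma(v): minimum of sigma_k(v) over the nonzero entries of v."""
    return min(sigma_k(v, k) for k in range(len(v)) if v[k] != 0)

# Hypothetical vector with entries of both signs (so sigma(v) > 0 is possible).
v = np.array([0.2, -1.5, 0.1])
s = sigma(v)     # attained at k = 1: (0.2 + 0.1) / 1.5 = 0.2
```

Since s < 1/2 here, the index attaining the minimum would yield a proper cut x_k ≤ s.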
We call y a cutting vector if σ(v) < 1, and a proper cutting vector if σ(v) < 1/2. If y induces primal feasibility then y_L > 0, and since y_L = P_A y_L the nonnegative vector y_L then induces primal feasibility as well. We may therefore restrict our search for a cutting vector to nonnegative vectors y. Due to homogeneity we may further assume 1^T y = 1. We conclude this section with a slight modification of a result from [12] that provides a sufficient condition for y being a cutting vector.

Lemma 3.4. Let y ≥ 0 with 1^T y = 1. If

n^3 ‖y_L‖^2 ≤ 1, (7)

then y is a proper cutting vector.
As mentioned in the Introduction, the proof of the original lemma in [12] is wrong. In the next section we provide a proof of Lemma 3.4. As is known, each iteration of Chubanov's BP increases 1/‖y_L‖^2 by at least 1 [4,11]. Therefore, when Chubanov's BP is equipped with the new cuts, after at most n^3 iterations (7) certainly holds. It follows that n^3 is an upper bound for the number of iterations, despite the fact that the new cut is usually tighter than Chubanov's cut and never less tight [11, Section 2.2].

Sufficient condition for cutting vectors
In this section L denotes a fixed linear subspace of R^n. Moreover, the vector y will be such that y ≥ 0 and 1^T y = 1, and z and v are defined by z = y_L and v = y_{L⊥}. Note that if y is not a proper cutting vector, then Lemma 3.4 states that 1 < n^3 ‖z‖^2. More generally, we have the following result.

Proposition 4.1. If σ(v) ≥ 1/2 then 1/n^3 < ‖z‖^2 ≤ 1/n.
For the proof of this proposition we need a couple of lemmas. We have

z = y_L, v = y_{L⊥}, y = z + v, 1^T y = 1, y ≥ 0, (8)

with σ(v) ≥ 1/2. The latter implies n ≥ 2, because σ(v) > 0 can hold only if v has entries with different signs. In order to derive a lower bound for ‖z‖ we consider the problem

min { ‖z‖ : y = z + βv, 1^T y = 1, y ≥ 0, z^T v = 0 }, y, z ∈ R^n, β ∈ R, (9)

with the vector v fixed. We introduced an additional variable β because if β = 1, as in (8), then problem (9) may be infeasible. This can be understood by noting that if z and v form an orthogonal decomposition of y, as in (8), then ‖v‖ ≤ ‖y‖. Since 1^T y = 1 and y ≥ 0 we have ‖y‖ ≤ 1. So, if ‖v‖ > 1, the system (8) will be infeasible. The proof below of Proposition 4.1 depends on the fact that we can solve problem (9) analytically. We start with two simple lemmas. In order to proceed we need the dual problem of (9), which is given by

max { α : λ^T v = 0, λ ≥ α1, ‖λ‖ ≤ 1 }, λ ∈ R^n, α ∈ R. (10)

This problem is strictly feasible (take λ = 0, α = −1). The dual objective value is bounded above (α < 1, because if α ≥ 1 then λ ≥ 1, whence ‖λ‖ ≥ √n). These two properties (i.e. strict feasibility and boundedness) imply that (9) is solvable (i.e. has an optimal solution) and that the optimal values of (9) and (10) coincide [2, Thm. 2.4.1]. In the same way it follows that the dual problem is solvable. Hence, there exist optimal solutions (y, z, β) and (λ, α) of (9) and (10), respectively, and ‖z‖ = α. Moreover, by Lemma 4.2, ‖z‖ > 0, whence also α > 0. From now on we always assume that the triple (y, z, β) is primal optimal and the pair (λ, α) dual optimal.
Due to the Cauchy-Schwarz inequality and the feasibility conditions we may write

α = α 1^T y ≤ λ^T y = λ^T z + β λ^T v = λ^T z ≤ ‖λ‖ ‖z‖ ≤ ‖z‖. (11)

Since ‖z‖ = α, the three inequalities in (11) hold with equality. Thus we obtain

‖λ‖ ‖z‖ = ‖z‖, λ^T z = ‖λ‖ ‖z‖, λ^T y = α 1^T y. (12)

Since z ≠ 0, by Lemma 4.2, the first equality in (12) implies ‖λ‖ = 1 and then the second equality implies λ = z/‖z‖, whence we obtain

z = ‖z‖ λ = α λ. (13)

We derive from ‖λ‖ = 1 and the Cauchy-Schwarz inequality that

1^T λ ≤ √n ‖λ‖ = √n. (14)

Lemma 4.3. One has n ‖z‖^2 ≤ 1, with equality if and only if 1^T v = 0.

Proof: Since λ ≥ α1 we have 1^T λ ≥ nα. Combining this with (14), we may write nα ≤ 1^T λ ≤ √n. Therefore, α√n ≤ 1. Since ‖z‖ = α, the first statement in the lemma follows. Next we deal with the second statement. It can be restated as: nα^2 = 1 if and only if 1^T v = 0. If nα^2 = 1 then equality holds throughout in nα ≤ 1^T λ ≤ √n, whence λ = α1. Since λ is dual feasible, we have λ^T v = 0, and hence also 1^T v = 0. On the other hand, if 1^T v = 0 then y = z + βv and 1^T y = 1 imply 1^T z = 1. This implies ‖1‖ ‖z‖ ≥ 1, whence n‖z‖^2 ≥ 1. Due to the first statement in the lemma and α = ‖z‖ we obtain nα^2 = 1, thereby completing the proof of the lemma.
The inequality n‖z‖^2 ≤ 1 in Lemma 4.3 is equivalent to the inequality at the right in Proposition 4.1. The proof of the left inequality in Proposition 4.1 requires in general much more work, but not if 1^T v = 0. Then Lemma 4.3 yields n‖z‖^2 = 1. Since n > 1 this implies n^3‖z‖^2 = n^2 · n‖z‖^2 = n^2 > 1. So, for the rest of the proof we may assume 1^T v ≠ 0.
In the sequel, I denotes the subset of the index set {1, 2, . . . , n} defined by

I := { i : y_i > 0 },

and J its complement. The restriction of v ∈ R^n to the coordinates in I is denoted as v_I. Using these notations we have the following lemma.

Lemma 4.4. Let v be such that σ(v) ≥ 1/2 and 1^T v ≠ 0. Then

α^2 = ‖v_J‖^2 / ( |I| ‖v_J‖^2 + (1_I^T v_I)^2 ), β = 1_I^T v_I / ( |I| ‖v_J‖^2 + (1_I^T v_I)^2 ). (17)

Proof: We already derived from the first two equalities in (12) that ‖λ‖ = 1 and, in (13), z = αλ. Since y ≥ 0 and λ − α1 ≥ 0, the third equality in (12) implies y_i(λ_i − α) = 0, for each i. Therefore, due to the definition of I, λ_i = α for each i ∈ I. So we have

λ_I = α 1_I, λ_J ≥ α 1_J,

where the inequality is due to λ ≥ α1. From y_J = 0 we deduce z_J = −β v_J. Also using z = αλ we get z_I = α λ_I = α^2 1_I. Partitioning the vectors v, y, z and λ according to the partition (I, J), we get the following expressions:

y_I = α^2 1_I + β v_I, y_J = 0, z_I = α^2 1_I, z_J = −β v_J, λ_I = α 1_I, λ_J = −(β/α) v_J. (18)

Now the primal feasibility conditions 1^T y = 1 and z^T v = 0 yield two linear relations between α^2 and β, as follows:

|I| α^2 + (1_I^T v_I) β = 1, −(1_I^T v_I) α^2 + ‖v_J‖^2 β = 0.

The determinant of the matrix of coefficients of this linear system of equations equals

|I| ‖v_J‖^2 + (1_I^T v_I)^2.

This expression is positive, because otherwise we would have 1_I^T v_I = 0 and v_J = 0, which would imply 1^T v = 0, a contradiction with the hypothesis in the lemma. Hence the solution of the above system is unique. It is given by (17). This completes the proof.
Obviously, we can compute α and β from (17) if we know the 'optimal' partition (I, J) of the index set, since α is positive. After this the optimal solution immediately follows from (18). In order to proceed we need to know more about this partition. For that purpose the next lemma is important.
Lemma 4.5. If 1^T v ≠ 0 then β 1^T v > 0; in particular, β and 1^T v have the same sign.

Proof: Taking the inner product with 1 at both sides of y = z + βv we may write

β 1^T v = 1^T y − 1^T z = 1 − α 1^T λ ≥ 1 − α√n > 0,

where the first inequality is due to (14), and α√n < 1, by Lemma 4.3. This suffices for the proof of the lemma.
If we replace v by −v, and β by −β, then (y, z, β) and (λ, α) stay primal and dual optimal, respectively. We may therefore assume that 1^T v is nonnegative, without loss of generality. Since we only need to deal with the case where 1^T v ≠ 0, we assume in the sequel that 1^T v > 0. Because of Lemma 4.5 we then have β > 0. It remains to take into account the feasibility conditions y_I > 0 and λ_J ≥ α 1_J. By (18) these conditions require α^2 1_I + β v_I > 0 and −(β/α) v_J ≥ α 1_J. Since α > 0 and β > 0, these inequalities can be reformulated as follows:

v_I > −(α^2/β) 1_I, v_J ≤ −(α^2/β) 1_J. (19)

This makes clear that the entries of v in v_I are strictly larger than those in v_J. Moreover, the entries in v_J are negative, and they are separated from the entries in v_I by the (negative!) number −α^2/β. Due to Lemma 4.4 we have α^2/β = ‖v_J‖^2 / (1_I^T v_I), and hence

this leads to the inequalities

v_I > −( ‖v_J‖^2 / (1_I^T v_I) ) 1_I, v_J ≤ −( ‖v_J‖^2 / (1_I^T v_I) ) 1_J. (20)
At this stage it becomes natural to order the entries of v in nonincreasing order. Since v has entries of different signs, there must exist a (unique) index p such that, after the ordering of v,

v_1 ≥ · · · ≥ v_p > 0 ≥ v_{p+1} ≥ · · · ≥ v_n.

Moreover, since v_J < 0, we have for some q ≥ p:

I = {1, . . . , q}, J = {q + 1, . . . , n}. (21)

Next we show not only that (21) determines q uniquely, and hence also I and J, but also that q can be found in O(n) time. For that purpose we define, for each k (1 ≤ k ≤ n), index sets I_k := {1, . . . , k} and J_k := {k + 1, . . . , n}, and numbers ω_k, according to

ω_k := ( Σ_{i=k+1}^n v_i^2 ) / ( Σ_{i=1}^k v_i ).

With this definition, and since by (20) the entries of v_I and v_J are separated by −ω_q = −α^2/β, (21) can be restated as

ω_q + v_q > 0 and ω_q + v_{q+1} ≤ 0. (22)

Observe that the vector ω is nonnegative and ω_n = 0. Moreover, the running sums Σ_{i=1}^k v_i and Σ_{i=k+1}^n v_i^2 can be computed for all k from v in O(n) time. As a consequence, the vector ω can be computed in O(n) time. The following lemma determines the index q uniquely.

Lemma 4.6. q is the first index k such that ω_k + v_{k+1} ≤ 0.

Proof: For k < n one may easily verify that

ω_{k+1} − ω_k = −v_{k+1} (ω_k + v_{k+1}) / ( Σ_{i=1}^{k+1} v_i ), (24)
ω_{k+1} − ω_k = −v_{k+1} (ω_{k+1} + v_{k+1}) / ( Σ_{i=1}^{k} v_i ). (25)

Because of (22) we have ω_q + v_q > 0 and ω_q + v_{q+1} ≤ 0. Hence (25) implies ω_q > ω_{q−1} and (24) implies ω_{q+1} ≤ ω_q. Thus it becomes clear that ω is increasing at k = q − 1 and nonincreasing at k = q. Now suppose that ω_k is nonincreasing at some index k ≥ p. Then ω_{k+1} ≤ ω_k. Due to (25) we then have ω_{k+1} + v_{k+1} ≤ 0. Since v_{k+2} ≤ v_{k+1} it follows that ω_{k+1} + v_{k+2} ≤ 0, which means that ω_{k+2} ≤ ω_{k+1}, by (24). So, if ω_k is nonincreasing at k ≥ p, then ω_k remains nonincreasing if k increases. As we proved that ω_k is nonincreasing at k = q, it follows that ω_k is nonincreasing at all k ≥ q. If ω_k were nonincreasing at some k < q, it would yield the contradiction that ω_k is nonincreasing at k = q − 1. Hence ω_k must be increasing (strictly!) at all k such that p ≤ k < q. Since ω_k is nonincreasing at all k ≥ q, the lemma follows.

In the numerical example accompanying this lemma (its table is not reproduced here) one has p = 5, and for k ≥ p the largest value of ω_k occurs at k = 8, so q = 8. In agreement with (24) and (25), the table demonstrates that ω_{k+1} + v_{k+1} and ω_k + v_{k+1} have the same sign, for each k, and that q is the first index for which these expressions are negative.
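The O(n) search for q described above can be sketched as follows, assuming (as the running sums in the text suggest) that ω_k is the quotient of the suffix sum of squares and the prefix sum of v; the data below are hypothetical.

```python
import numpy as np

def find_q(v):
    """Return the index q of Lemma 4.6 (1-based), assuming v is already
    sorted in nonincreasing order, has entries of both signs, and 1^T v > 0.

    Uses omega_k = (sum_{i>k} v_i^2) / (sum_{i<=k} v_i), built in O(n)
    from prefix sums and suffix sums of squares.
    """
    n = len(v)
    prefix = np.cumsum(v)                      # prefix[k-1] = sum_{i<=k} v_i
    suffix_sq = np.cumsum(v[::-1] ** 2)[::-1]  # suffix_sq[k-1] = sum_{i>=k} v_i^2
    omega = np.zeros(n)                        # omega_n = 0 (empty suffix)
    omega[:-1] = suffix_sq[1:] / prefix[:-1]   # omega_k for k = 1, ..., n-1
    for j in range(1, n):
        # q is the first index j with omega_j + v_{j+1} <= 0.
        if omega[j - 1] + v[j] <= 0:
            return j
    return n

# Hypothetical sorted vector with entries of both signs and 1^T v > 0.
v = np.array([0.9, 0.5, -0.3, -0.6])
q = find_q(v)   # omega = (0.70/0.9, 0.45/1.4, 0.36/1.1, 0), so q = 3
```

One can check by hand that ω_3 + v_3 ≈ 0.027 > 0 and ω_3 + v_4 ≈ −0.273 ≤ 0, in accordance with (22).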
Since |I| = q this is equivalent to which can be written as By Lemma 4.6 we have ω_q ≥ ω_p > 0. Since 1^T v_+ ≥ 1_I^T v_I > 0, the lemma follows.
Now we are ready to show that n^3 α^2 > 1, which will complete the proof of Proposition 4.1. According to Lemma 4.8 we have We therefore may write where we also used σ_+(v) ≥ σ(v) and that (n − p)p^2 is maximal if p = 2n/3. Since σ(v) ≥ 1/2 and q ≤ n we thus obtain The last expression is smaller than n^3 if and only if 11n^2 > 27, and this certainly holds, because n ≥ 2. Hence the proof of Proposition 4.1 is complete.

A bilinear saddle point problem
In this section we derive a simple saddle point problem that arises when searching for a cutting vector. In the next section we describe Nemirovski's Mirror-Prox method for solving that saddle point problem. According to Proposition 4.1, y is a cutting vector if it satisfies y ≥ 0, 1^T y = 1 and ‖y_L‖ ≤ 1/(n√n), where L denotes the null space of A. Since y_L = P_A y, we therefore consider the (primal) problem

min { ‖P_A y‖ : y ≥ 0, 1^T y = 1 }. (26)
This is again a second-order cone problem, just as the problem considered in the previous section. Its dual problem is given by

max { α : P_A u ≥ α1, ‖u‖ ≤ 1 }. (27)

Now let y* and u* denote optimal solutions of (26) and (27), respectively. Then the optimal value of α is given by α* = min(P_A u*).

Lemma 5.1. Problem (1) is feasible if and only if the common optimal value of (26) and (27) is positive.
Proof: Note that if (u, α) is feasible for (27) and α positive then P A u is a positive vector in the null space of A. So, in that case x = P A u is a solution of (1). On the other hand, if no such pair (u, α) with α > 0 exists, then the optimal value of (27) equals 0. Since the problems (26) and (27) have bounded feasible regions, they have optimal solutions and their optimal values are equal. So, problem (26) has optimal value 0 as well and this value is attained. Hence there exists a nonzero nonnegative vector y such that P A y = 0. The latter means that 0 = y ∈ L ⊥ and y ≥ 0. By Lemma 2.1 this implies that (2) has a solution.
Since (2) has a solution if and only if (1) has no solution, the lemma follows.
For a given u the best value of α in (27) equals the smallest entry of P_A u. We therefore assume below that α always satisfies α = min(P_A u). Then the feasible regions of (26) and (27) are, respectively, the unit simplex Δ and the unit ball B, defined by

Δ = { y ∈ R^n : 1^T y = 1, y ≥ 0 }, B = { u ∈ R^n : ‖u‖ ≤ 1 }.
If y ∈ Δ and u ∈ B then we may write

min(P_A u) ≤ y^T P_A u ≤ ‖P_A y‖ ‖u‖ ≤ ‖P_A y‖. (28)

Putting y = y* in (28) it follows that y*^T P_A u ≤ ‖P_A y*‖ for each u ∈ B. Similarly, putting u = u* in (28) we get y^T P_A u* ≥ min(P_A u*) for each y ∈ Δ. At optimality the primal and dual objective values are equal. So we have min(P_A u*) = ‖P_A y*‖ = y*^T P_A u*. Thus we obtain

y^T P_A u* ≥ y*^T P_A u* ≥ y*^T P_A u, y ∈ Δ, u ∈ B. (29)

This reveals that (y*, u*) is a saddle point for the bilinear function y^T P_A u. Thus we have reduced the solution of (26) and (27) to the computation of a saddle point of y^T P_A u.
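The chain (28), and hence the nonnegativity of the duality gap, is easy to verify numerically; the following sketch uses random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
n = A.shape[1]
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)   # projection onto null(A)

# Random y in the unit simplex Delta and u in the unit ball B.
y = rng.random(n); y /= y.sum()
u = rng.standard_normal(n); u /= max(1.0, float(np.linalg.norm(u)))

lower = np.min(P @ u)            # min(P_A u)
middle = y @ P @ u               # y^T P_A u
upper = np.linalg.norm(P @ y)    # ||P_A y||

# The chain (28): min(P_A u) <= y^T P_A u <= ||P_A y||,
# so the duality gap upper - lower is nonnegative for every such pair.
assert lower <= middle + 1e-12 and middle <= upper + 1e-12
```

The first inequality holds because y is a convex combination of the entries of P_A u, the second by Cauchy-Schwarz and ‖u‖ ≤ 1.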

Definition of the method
To simplify notation, from now on we denote P_A simply as P. Following [8], we define the vector field F(z), where z = (y, u) ∈ Δ × B, by

F(z) = F(y, u) := (Pu; −Py).

The Mirror-Prox method can now be stated as follows. It is initialized with some z_1 = (y_1, u_1) ∈ Δ × B and uses the following update in each iteration:

ẑ_k = argmin_{z ∈ Δ×B} { γ F(z_k)^T z + ½ ‖z − z_k‖^2 }, (30)
z_{k+1} = argmin_{z ∈ Δ×B} { γ F(ẑ_k)^T z + ½ ‖z − z_k‖^2 }, (31)

where γ is a fixed positive 'step size', with γ ∈ (0, 1]. Below we present the analysis of this method. Not surprisingly, due to the linearity in y and u the analysis is easier than in [8], but the resulting iteration bound is the same.

Analysis of the method
Lemma 6.1. For all z = (y, u) and z̄ = (ȳ, ū) in Δ × B one has
(i) F(z)^T z̄ + F(z̄)^T z = 0;
(ii) F(z)^T (z̄ − z) = F(z̄)^T (z̄ − z);
(iii) ‖F(z) − F(z̄)‖ ≤ ‖z − z̄‖.

Proof: Since F(z)^T z̄ = (Pu)^T ȳ − (Py)^T ū and F(z̄)^T z = (Pū)^T y − (Pȳ)^T u, and P is symmetric, these two expressions cancel, proving (i). Taking z̄ = z in (i) we get F(z)^T z = 0 for each z. As a consequence we have

F(z)^T (z̄ − z) = F(z)^T z̄ = −F(z̄)^T z = F(z̄)^T (z̄ − z),

proving (ii). Since P is an orthogonal projection matrix, we may write

‖F(z) − F(z̄)‖^2 = ‖P(u − ū)‖^2 + ‖P(y − ȳ)‖^2 ≤ ‖u − ū‖^2 + ‖y − ȳ‖^2 = ‖z − z̄‖^2,

proving (iii).
To measure the distance between z and z̄ we introduce the notation

δ(z, z̄) := ½ ‖z − z̄‖^2. (33)

Lemma 6.2. For any vectors u, v and w in R^{2n} one has (F(u) − F(v))^T (u − w) ≤ δ(u, v) + δ(u, w).

Proof: By the Cauchy-Schwarz inequality and Lemma 6.1 we get

(F(u) − F(v))^T (u − w) ≤ ‖F(u) − F(v)‖ ‖u − w‖ ≤ ‖u − v‖ ‖u − w‖.

The last expression does not exceed ½‖u − v‖^2 + ½‖u − w‖^2. Hence, due to definition (33), the lemma follows.
For any three vectors u, v and w of the same dimension we have the trivial identity ‖u − w‖^2 = ‖u − v‖^2 + 2(u − v)^T(v − w) + ‖v − w‖^2. Evaluation of the right-hand side yields the so-called three-point relation of Bregman:

δ(u, w) = δ(u, v) + δ(v, w) + (u − v)^T (v − w). (34)

Lemma 6.3. One has the following three inequalities:
(i) γ (F(ẑ_k) − F(z_k))^T (ẑ_k − z_{k+1}) ≤ δ(ẑ_k, z_k) + δ(ẑ_k, z_{k+1});
(ii) γ F(z_k)^T (ẑ_k − z_{k+1}) ≤ δ(z_k, z_{k+1}) − δ(z_k, ẑ_k) − δ(ẑ_k, z_{k+1});
(iii) γ F(ẑ_k)^T (z_{k+1} − z) ≤ δ(z_k, z) − δ(z_k, z_{k+1}) − δ(z_{k+1}, z), for every z ∈ Δ × B.

Proof: Since γ ≤ 1, (i) follows immediately from Lemma 6.2. The gradient with respect to z of the objective in (30) is γF(z_k) + z − z_k; hence optimality of ẑ_k gives (γF(z_k) + ẑ_k − z_k)^T (z − ẑ_k) ≥ 0 for all z ∈ Δ × B. Using (34) this gives

γ F(z_k)^T (ẑ_k − z) ≤ (ẑ_k − z_k)^T (z − ẑ_k) = δ(z_k, z) − δ(z_k, ẑ_k) − δ(ẑ_k, z).

Letting z = z_{k+1}, we get (ii). Finally, the gradient with respect to z of γ F(ẑ_k)^T z + ½‖z − z_k‖^2 yields, by optimality of z_{k+1}, (γF(ẑ_k) + z_{k+1} − z_k)^T (z − z_{k+1}) ≥ 0, which is equivalent to

γ F(ẑ_k)^T (z_{k+1} − z) ≤ (z_{k+1} − z_k)^T (z − z_{k+1}).

By applying Bregman's formula, we get (iii).
By adding the three inequalities in Lemma 6.3 we get the following lemma, without further proof (cf. [6, Theorem 18.2]).

Lemma 6.4. For every z ∈ Δ × B one has γ F(ẑ_k)^T (ẑ_k − z) ≤ δ(z_k, z) − δ(z_{k+1}, z).
For K = 1, 2, . . . we define

z̃_K = (ỹ_K, ũ_K) := (1/K) Σ_{k=1}^K ẑ_k.

So z̃_K is simply the average of all iterates ẑ_k with 1 ≤ k ≤ K. We denote the duality gap at any pair (y, u) with y ∈ Δ and u ∈ B as

gap(y, u) := ‖P y‖ − min(P u).
The next theorem is the main result in this section. It says that the duality gap atz K is inversely proportional to the iteration number K.
Theorem 6.5. By taking y_1 = 1/n and u_1 = 0 we obtain

gap(ỹ_K, ũ_K) ≤ 1/(Kγ).

Proof: Recall that F(z)^T z = 0 for every z ∈ Δ × B. Using this we may write

F(ẑ_k)^T (ẑ_k − z) = −F(ẑ_k)^T z = u^T P ŷ_k − y^T P û_k, z = (y, u) ∈ Δ × B.

Hence we obtain from Lemma 6.4 that, for each k = 1, 2, . . . , K,

γ ( u^T P ŷ_k − y^T P û_k ) ≤ δ(z_k, z) − δ(z_{k+1}, z).

By taking the sum of these inequalities for k = 1, . . . , K, and then dividing by Kγ, we get

(1/K) Σ_{k=1}^K ( u^T P ŷ_k − y^T P û_k ) ≤ δ(z_1, z)/(Kγ).

Since y_1 = 1/n and u_1 = 0 we have

δ(z_1, z) = ½ ‖y − (1/n) 1‖^2 + ½ ‖u‖^2 ≤ ½ (1 − 1/n) + ½ ≤ 1.

Thus we obtain

(1/K) Σ_{k=1}^K ( u^T P ŷ_k − y^T P û_k ) ≤ 1/(Kγ).

Using that y^T P u is linear in both y and u we may write the expression at the left as follows:

u^T P ỹ_K − y^T P ũ_K.

We take u = P ỹ_K / ‖P ỹ_K‖ and y = e_i, where i is the entry for which P ũ_K is minimal. Then the expression at the left takes the value gap(ỹ_K, ũ_K). Hence the inequality in the theorem follows.

The Mirror-Prox basic procedure
We are now ready to describe the new BP, called the Mirror-Prox BP (abbr. MPBP). Each iteration of the MPBP requires the computation of ẑ_k and z_{k+1}, as given by (30) and (31). We claim that this can be done in O(n^2) time. To understand this, let z = (y; u), z_k = (y_k; u_k) and ẑ_k = (ŷ_k; û_k). Since F(z_k) = (Pu_k; −Py_k) we may rewrite (30) as follows:

ẑ_k = argmin_{z ∈ Δ×B} { γ (Pu_k)^T y − γ (Py_k)^T u + ½ ‖z − z_k‖^2 }. (36)

Obviously the computation of ẑ_k can be separated into the computation of its components ŷ_k and û_k, as follows:

ŷ_k = argmin_{y ∈ Δ} { γ y^T P u_k + ½ ‖y − y_k‖^2 }, (37)
û_k = argmin_{u ∈ B} { −γ u^T P y_k + ½ ‖u − u_k‖^2 }. (38)

In a similar way problem (31) can be split into the simpler problems

y_{k+1} = argmin_{y ∈ Δ} { γ y^T P û_k + ½ ‖y − y_k‖^2 }, (39)
u_{k+1} = argmin_{u ∈ B} { −γ u^T P ŷ_k + ½ ‖u − u_k‖^2 }. (40)

So, given a pair (y_k, u_k), the solution of these four problems yields the pair (y_{k+1}, u_{k+1}).
Each of the four problems can be solved in O(n^2) time. Let us demonstrate this for problem (37), which computes ŷ_k. We first need to compute the objective vector Pu_k. This already requires O(n^2) time. After this, the resulting minimization problem can be solved in only O(n) time. This is easy to understand for (38) and (40); for (37) and (39) it can be realized by using the approach of [3]. The MPBP is presented in Algorithm 1. In this algorithm ṽ = ỹ_{L⊥} = ỹ − P_A ỹ. As usual, we use an iteration counter k as subscript of all relevant vectors. The input is the matrix P_A and vectors y ∈ Δ and u ∈ B. The MPBP is initialized with ỹ = y_1 = y and ũ = u_1 = u. At the end of the k-th iteration ỹ denotes the average of the iterates ŷ_1 to ŷ_k, and similarly for ũ.
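Problems (38) and (40) amount to a Euclidean projection onto the ball B, and (37) and (39) to a projection onto the simplex Δ (a sorting-based procedure, in the spirit of [3]). The following Python sketch (illustrative only, not the paper's Matlab implementation) runs the resulting iteration, including the averaging of the ŷ_k and û_k, with γ = ½:

```python
import numpy as np

def proj_simplex(x):
    """Euclidean projection onto Delta = {y : y >= 0, 1^T y = 1} via sorting."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(x) + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

def proj_ball(x):
    """Euclidean projection onto the unit ball B."""
    return x / max(1.0, float(np.linalg.norm(x)))

def mirror_prox(P, K, gamma=0.5):
    """K Mirror-Prox iterations (30)-(31) for F(y, u) = (Pu; -Py),
    starting from y_1 = 1/n, u_1 = 0; returns the averaged iterates."""
    n = P.shape[0]
    y, u = np.ones(n) / n, np.zeros(n)
    y_avg, u_avg = np.zeros(n), np.zeros(n)
    for _ in range(K):
        # extrapolation step (30): field evaluated at (y, u)
        yh = proj_simplex(y - gamma * (P @ u))
        uh = proj_ball(u + gamma * (P @ y))
        # update step (31): field evaluated at the extrapolated point
        y, u = proj_simplex(y - gamma * (P @ uh)), proj_ball(u + gamma * (P @ yh))
        y_avg += yh; u_avg += uh
    return y_avg / K, u_avg / K

# Hypothetical primal-feasible instance: x = 1 solves Ax = 0, x > 0.
A = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 1.0, -1.0, 0.0]])
P = np.eye(4) - A.T @ np.linalg.solve(A @ A.T, A)
y_t, u_t = mirror_prox(P, K=200)
gap = np.linalg.norm(P @ y_t) - np.min(P @ u_t)   # shrinks like O(1/K)
```

Each iteration is dominated by the four matrix-vector products with P, in line with the O(n^2) cost per iteration claimed above.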
At the start of the MPBP we first check (in line 4) if ũ induces primal feasibility, i.e. if α = min(P_A ũ) > 0. If so, we put case = 1, which means that a feasible solution of problem (1) has been found. Otherwise, we compute σ(ṽ). If σ(ṽ) = 0, problem (1) is infeasible, by Lemma 3.3. This is expressed by putting case = 2. Otherwise σ(ṽ) > 0. If σ(ṽ) ≤ 0.5 then we have found a proper cutting vector, and we put case = 3. We then also determine all other proper cuts (in line 10).
In the remaining case we have σ(ṽ) > 0.5, and then Nemirovski's Mirror-Prox method is used to compute new values for ŷ_k, û_k, y_{k+1} and u_{k+1}. These values are the solutions of easy-to-solve minimization problems. Then ỹ is updated in such a way that it becomes the average of the iterates ŷ_1, ŷ_2, . . . , ŷ_k, and similarly for ũ. After this we enter the while loop again.
As we show in the next lemma, the BP stops after at most 2n√n iterations, and then case ∈ {1, 2, 3}. In each of these three cases the BP returns the value of the variable case and the current values of ỹ and ũ to the MA, as well as the set J of indices for which proper cuts have been found and the values of the upper bounds for the related entries in x.

Lemma 7.1. If γ = ½ then the MPBP stops after at most 2n√n iterations.

Proof: After the k-th iteration we have, by Theorem 6.5,

‖P ỹ‖ − min(P ũ) = gap(ỹ, ũ) ≤ 1/(kγ).

Since P ũ is not primal feasible, min(P ũ) ≤ 0. Hence we obtain

‖P ỹ‖ ≤ 1/(kγ).

According to Lemma 3.4 the vector ỹ is cutting if ‖P ỹ‖ < 1/(n√n). This certainly holds if

1/(kγ) < 1/(n√n).

Since γ = ½, it follows that if k ≥ 2n√n the MPBP will have stopped. Hence the lemma follows.

Theorem 7.2. Each execution of the MPBP needs at most O(n^{3.5}) arithmetic operations.
Proof: We established before that each iteration of the MPBP requires O(n^2) time. The number of iterations being at most 2n√n, the theorem follows.
The above theorem makes clear that we have achieved an improvement over the original BPs in [4] and [12], which require O(n^4) arithmetic operations.

Modified Main Algorithm
In order to solve (1) one needs to call the MPBP several times from another algorithm, named the Modified Main Algorithm (MMA), which is described in Algorithm 2. It is a simplified version of the MMA in [12]. In [12] we used a property that is due to Khachiyan [7], namely that in the feasible case there exists a positive number κ such that the positive entries in a basic feasible solution of (3) are larger than or equal to κ. The need for this number was a drawback, because it is not known in advance. In the new MMA we avoid the use of this number.

The new MMA is initialized with d = 1 and case = 3, and with y and u as the centres of Δ and of B, respectively. At the start of the while loop the projection matrix P_A onto the null space of A is computed. Then the MPBP is called with P_A, y and u as input. The MPBP returns to the MMA with the quadruple (y, u, J, case) as output. If case = 1 the BP has found a u such that x = DP_A u is feasible for problem (1); if case = 2 it halts with x = 0, indicating that (1) has no positive solution. Finally, if the MPBP returns case = 3, it has found proper cuts, indexed by the set J. Then the entries in d_J are divided by 2 and we rescale A, y and u as indicated in lines 11-14. After this we enter the while loop again. Then the MPBP will be called again, and so on.
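The rescaling bookkeeping of lines 11-14 can be sketched as follows (a hypothetical Python fragment, not the paper's Matlab code): given the index set J of proper cuts, the entries d_J are halved and the columns of the original A are rescaled by diag(d), which also halves the upper bound d_1 · · · d_n on δ_A.

```python
import numpy as np

def rescale(A0, d, J):
    """One case-3 step of the MMA: halve d_j for j in J and rescale the
    columns of the ORIGINAL matrix A0 by the cumulative factors in d."""
    d = d.copy()
    for j in J:
        d[j] /= 2.0            # the cut x_j <= d_j/2 tightens the box bound
    return A0 * d, d           # A0 @ diag(d): column-wise scaling

A0 = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])
d = np.ones(3)
A1, d = rescale(A0, d, J=[1])
A2, d = rescale(A0, d, J=[0, 2])
bound = np.prod(d)             # upper bound for delta_A: 2^{-3} = 0.125
```

After k cut indices have been halved in total, the bound d_1 · · · d_n equals 2^{-k}, which is the source of the log_2(1/δ_A) factor in the overall complexity.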

Numerical comparisons
Like our MMA, the so-called Enhanced Projection and Rescaling Algorithm (EPRA) in [9] is designed to solve problem (1). Just as our MMA uses the MPBP as its BP, the EPRA uses the Smooth Perceptron Scheme (SPS) as its BP. As shown in [9], the SPS is far more efficient than the three other candidates in [9]: the Perceptron Scheme, the Von Neumann Scheme and the Von Neumann Scheme with Away Steps. Therefore, following [9], we only considered the SPS as a serious candidate for the BP in the EPRA.
Both the MPBP and the SPS establish feasibility of the primal or the dual problem, or they generate a 'proper' cut, where the meaning of 'proper' depends on a chosen threshold value. Below we denote this threshold value as τ. Recall that in our MPBP (Algorithm 1) we took τ = 1/2, but any other number in the interval (0, 1] would have been appropriate. A nice feature in [9] is that the authors found a way to construct random matrices A with a prescribed value of δ_A [9, Proposition 3.1]. This generator will also be used in our experiments.
As in [9] we start by a computational comparison of the two competing BP's in the next subsection, whereas in Subsection 9.2 we compare our MMA and the main algorithm EPRA in [9]. In both sections we use an appropriate variant of the Matlab code that was used for the experiments in [9] and that has been made publicly available at the website http://www.andrew.cmu.edu/user/jfp/epra.html.
By using this code it was easy to make the desired comparisons. We simply ignored the lines related to the three BPs in [9] that differ from the SPS and replaced them by the MPBP for the experiments in Section 9.1. A similar approach made it relatively easy to set up the comparison in Section 9.2.
One more remark should be made. Though the MPBP as presented in Algorithm 1 gave CPU times that compete with those of the SPS, significantly better results were obtained by modifying lines 16-17. Recall that line 16 forces ỹ to be the average of the iterates ŷ_1, ŷ_2, . . . , ŷ_k. Instead of this we simply used ỹ = y_{k+1}. Similarly, in our implementation we replaced the command in line 17 by ũ = u_{k+1}.

Comparison of MPBP and SPS
As in [9], for the comparison of the MPBP and the SPS we used six tables. Besides the parameters τ and δ_A we use N to denote the number of instances per size and, as in [9], an iteration limit, which we denote as K. Per table these parameters are as given in Table 2. Except for the values of τ in Tables 3 and 4, the values are the same as in [9]. These two tables served in [9] to demonstrate the effect of the iteration limit K. In the case where K is finite the limit is active only in Tables 5 and 6. Therefore, we used in Tables 3 and 4 the same parameters as in Tables 5 and 6, respectively, but without iteration limit.
In each of the six tables below we considered matrices A of four different sizes, as indicated in the first two columns of each table. For each size, N different matrices A were generated. After the computation of the projection matrix P_A both the SPS and the MPBP were called, with input P_A, the centre u of B and the centre y of Δ. The tables show the averages of, respectively, the number of iterations and the CPU time in seconds. The last two columns show the 'rate of success', which is the ratio of successful runs for the specific problem size; a run is considered successful if the iteration limit K was not the reason for stopping the run. Thus we see that in Tables 3, 4, 7 and 8 all runs were successful. But in Table 5, line 2, two runs of the SPS and one run of the MPBP were not successful, and in Table 6 many runs failed to establish feasibility or infeasibility, or to generate a cut. Obviously, low values of the success rate are due to the small value of τ in Tables 5 and 6. Note that in the extreme case where we would put τ = 0, any BP would stop its execution only after having solved the problem, i.e. only after having found a solution or having found a proof of infeasibility of (1).
The six tables in this section permit us to conclude that in most cases the MPBP is more efficient than the SPS. In only two cases (in Table 8) the SPS behaves slightly better than the MPBP, namely when δ_A = 0.001 and m ≪ n. It is an interesting question whether this can be explained.

Comparison of MMA with EPRA, Gurobi and Mosek
In this section we compare our Main Algorithm (MMA) with the Enhanced Projection and Rescaling Algorithm (EPRA) in [9] and also with the commercial packages Gurobi and Mosek. For that purpose we used the publicly available Matlab code that was used to generate Table 7 in [9]; this table contains for eight different sizes the averages of the number of iterations of the EPRA, the number of iterations of SPS and the CPU time for N randomly generated matrices A. We modified this code in such a way that every matrix A was also sent to our MMA and to Gurobi and Mosek. For Gurobi and Mosek we used the parameter setting as given in Table 9. Tables 10 and 11 show the results.  In Table 10 the matrices A are generated by the matrix generator in the EPRA code. This implies that there exists a positive vector in the null space of A. In other words, problem (1) is always feasible in Table 10. Table 11 differs only in the way the matrices A are generated; we allow the resulting problem (1) to be infeasible, by simply using the Matlab command A = rand(m,n)-0.5. This difference is indicated by writing 'x > 0' in the caption of Table 10 and 'x free' in the caption of Table 11.
The first two columns in both tables contain the eight sizes that were considered; these are the same as in [9]. The next two columns give the average numbers of iterations of the EPRA and the MMA, respectively, whereas the subsequent two columns give the averages of the total number of iterations of the SPS and the MPBP, respectively. The next four columns give the average CPU times (in seconds) for both cases and also for Gurobi and Mosek. Finally, the last column presents the percentage of the N instances that turned out to be feasible. By the construction of the matrices A, this percentage is always 100 in Table 10.
Finally, we discuss the values of the parameters τ, δ A and N. The matrix generator in [9] has the property that the matrix A is always such that δ A is given and positive. This implies that problem (1) is always feasible, as stated before. In practice, deciding on the feasibility or infeasibility of a given problem is one of the main tasks of a solution method. In the infeasible case we have δ A = 0. In order to cover this important case in our experiments, we also included Table 11. By their construction, the generated matrices had, on average, an equal division of the signs of their entries. It may therefore not be surprising that for n ≈ 2m this leads, roughly speaking, to an equal division of feasible and infeasible instances, as we see in lines 1, 2, 5 and 8 of the table. The other lines give rise to an interesting probabilistic question: should we have expected that each of the 1000 instances was feasible if n ≥ 5m and infeasible if n ≤ 1.25m?
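This question can be explored empirically: by homogeneity, (1) is feasible if and only if there is an x with Ax = 0 and every component at least 1, which is a plain LP. A sketch of such a feasibility check (our own illustration using scipy, not part of the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A):
    """Decide whether Ax = 0, x > 0 has a solution.

    The feasible set of (1) is a cone, so a positive solution exists
    iff some x with all components >= 1 satisfies Ax = 0: a plain LP.
    """
    m, n = A.shape
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=np.zeros(m),
                  bounds=[(1, None)] * n, method="highs")
    return res.status == 0   # 0 = solved (feasible), 2 = infeasible

# A = [1, -1] admits x = (1, 1) > 0; A = [1, 1] admits no positive x.
assert is_feasible(np.array([[1.0, -1.0]]))
assert not is_feasible(np.array([[1.0, 1.0]]))
```

Running this check over many random matrices A = rand(m,n)-0.5 for varying ratios n/m would give an empirical answer to the question above.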

Conclusions
The results in Section 9.1 reveal that the Mirror-Prox Basic Procedure (MPBP) introduced in this paper competes very well with the Smooth Perceptron Scheme (SPS) of [9]. The comparison in Section 9.2 confirms these results for the numerical behaviour of the Enhanced Projection and Rescaling Algorithm (EPRA), which uses the SPS as a subroutine, and the Modified Main Algorithm (MMA), which uses the MPBP as a subroutine. The barrier codes for linear optimization in the commercial packages Gurobi and Mosek do very well in this comparison. It should be mentioned that this became true only after changing the default parameter settings as indicated in Table 9. In all considered cases Mosek beats Gurobi, but it is quite surprising that the straightforward implementations of the EPRA and the MMA compete very well with both. In six of the eight cases the MMA is faster than Mosek.
A new challenge also arises, because the implementation of the MMA that we used differs from the method that we analysed theoretically. Can one prove that the implemented method also requires only polynomial time?
Two other points are worth noting. First, after its first call in the MMA, the MPBP is initialized with the vectors y and u generated in the previous iteration of the MMA, rather than starting every time from y = 1/n and u = 0. Second, Table 11 suggests that when generating random matrices, the ratio n/m strongly affects the feasibility of the problem.

Notes
He is (or was) a member of the editorial board of several journals, among them the SIAM Journal on Optimization. He was secretary/treasurer of the SIAM Activity Group on Optimization. He supervised a large number of research projects, among them the Dutch nationwide NWO project High Performance Methods for Mathematical Optimization and the project Discrete Mathematics and Optimization of the (Dutch) Stieltjes Institute. He was also involved in a large project, 'Optimal safety of the dikes in the Netherlands', financed by the Dutch government, which received the 2013 Franz Edelman Award of INFORMS. He shared the 2011 Khachiyan Prize of the INFORMS Optimization Society with Jean-Philippe Vial (University of Geneva).