Minimal divergence for border rank-2 tensor approximation

ABSTRACT Any representation of a tensor $v$ of rank $r$ as a sum of elementary tensors involves at least $r$ terms. In addition, a 'border rank' is defined: $\underline{\mathrm{rank}}(v) = r$ holds if $r$ is the minimum integer such that $v$ is a limit of rank-$r$ tensors. Usually, the set of rank-$r$ tensors is not closed, i.e. tensors with $\underline{\mathrm{rank}}(v) < \mathrm{rank}(v)$ may exist. It is easy to see that in such a case the representation of rank-$r$ tensors $v_n$ contains diverging elementary tensors as $v_n$ approaches $v$. In a first part, we recall results about the uniform strength of the divergence in the case of general nonclosed tensor formats (restricted to finite dimensions). The second part discusses the $r$-term format for infinite-dimensional tensor spaces. It is shown that the general situation is very similar to the behaviour of finite-dimensional model spaces. The third part contains the main result: it is proved that in the case of $\underline{\mathrm{rank}}(v) = 2 < \mathrm{rank}(v)$ the divergence strength is $\varepsilon^{-1/2}$, i.e. if $\|v - v_n\| \le \varepsilon$, the parameters of $v_n$ increase at least proportionally to $\varepsilon^{-1/2}$.


Introduction
The essential tool for the numerical treatment of tensors is an appropriate sparse representation, i.e. the elements of the huge tensor spaces are represented by parameters of moderate size. There are several representations (also called 'formats') with different properties. One of these properties is whether they are closed or not. As explained below, nonclosed formats can lead to an unfavourable instability, known from numerical differentiation as the cancellation effect.
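We refer to Greub [1] concerning the formal definition of the algebraic tensor space
$$V = \bigotimes_{j=1}^d V_j, \qquad d \ge 3, \tag{1}$$
involving vector spaces $V_j$ over the field $K \in \{\mathbb R, \mathbb C\}$. In the model case of $V_j = K^{n_j}$, tensors in $V$ can be considered as arrays of components $v[i_1, \ldots, i_d] \in K$ ($1 \le i_j \le n_j$). Vectors $v^{(j)} \in K^{n_j}$ with entries $v^{(j)}[i_j]$ define the elementary tensor
$$v = \bigotimes_{j=1}^d v^{(j)}, \qquad v[i_1, \ldots, i_d] = \prod_{j=1}^d v^{(j)}[i_j]. \tag{2}$$
Every tensor $v \in V$ can be written as
$$v = \sum_{\nu=1}^r \bigotimes_{j=1}^d v_\nu^{(j)} \tag{3}$$
with some (finite) $r$. Among all possible representations of $v$ by (3) there is a minimal integer $r$, which is called the rank of $v$ and denoted by $\mathrm{rank}(v)$. The inequality $d \ge 3$ in (1) excludes the matrix case $d = 2$. This is important since matrices do not lead to the phenomena described below.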
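The array picture of (2) and (3) can be made concrete with a short NumPy sketch. It is our own illustration, not code from the paper; the helper names `elementary` and `r_term` are hypothetical. The elementary tensor is built as an iterated outer product, and a tensor in the form (3) as a sum of $r$ such terms. For $d = 2$ the rank is simply a matrix rank, whereas no comparably simple characterization exists for $d \ge 3$.

```python
import numpy as np

def elementary(*factors):
    """Elementary tensor v^(1) ⊗ ... ⊗ v^(d), i.e. v[i1,...,id] = prod_j v^(j)[ij]."""
    t = np.array(1.0)
    for f in factors:
        t = np.multiply.outer(t, f)
    return t

def r_term(terms):
    """Tensor sum_{nu=1}^r ⊗_j v_nu^(j); `terms` is a list of r lists of d factor vectors."""
    return sum(elementary(*factors) for factors in terms)

rng = np.random.default_rng(0)
n = (2, 3, 4)                                                     # n_1, n_2, n_3, so d = 3
terms = [[rng.standard_normal(nj) for nj in n] for _ in range(2)]  # r = 2
v = r_term(terms)
print(v.shape)                                                    # (2, 3, 4)

# Component check against v[i,j,k] = sum_nu a_nu[i] b_nu[j] c_nu[k]
explicit = sum(np.einsum("i,j,k->ijk", a, b, c) for a, b, c in terms)
print(np.allclose(v, explicit))                                   # True

# Matrix case d = 2: the rank of a 2-term sum is a matrix rank (2 for generic factors)
M = r_term([[rng.standard_normal(3), rng.standard_normal(4)] for _ in range(2)])
print(np.linalg.matrix_rank(M))
```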
Often, the dimension $\prod_{j=1}^d n_j$ is far too large for an explicit treatment. Instead, subsets $F \subset V$ involving a smaller number of data are used. These subsets are called 'tensor formats' or 'tensor representations'. There are important tensor formats $F$ which are not closed (this fact was first observed by Bini et al. [2]). To be concrete, we present two examples of nonclosed formats.
Another source of nonclosed representations are graph-based formats involving a graph that is not a tree (and therefore contains a cycle). Such representations are often used in physics (cf. [5]). In general, these formats are not closed (cf. Landsberg [6, Theorem 14.1.2.2]). The simplest example corresponds to a cycle and is called the 'cyclic matrix product representation'. Let $V_j = K^{n_j}$ and fix integers $\rho_j \ge 2$. Then matrices $A^{(j)}[i_j] \in K^{\rho_{j-1} \times \rho_j}$ ($1 \le i_j \le n_j$, $1 \le j \le d$, with $\rho_0 := \rho_d$) represent the tensor $v$ with the components $v[i_1, \ldots, i_d] = \operatorname{trace}\big(A^{(1)}[i_1]\, A^{(2)}[i_2] \cdots A^{(d)}[i_d]\big)$. A concrete example of a tensor not belonging to this format but to its closure is given in [3, Theorem 12.11, p. 469]. For more examples see Czapliński et al. [7].
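For illustration, here is a direct (deliberately naive) evaluation of the cyclic matrix product representation. It assumes the indexing convention $A^{(j)}[i_j] \in K^{\rho_{j-1} \times \rho_j}$ with $\rho_0 = \rho_d$. The function name `cyclic_mps_tensor` and the storage layout `cores[j][i]` are our own choices. Closing the cycle corresponds to taking the trace, and with all $\rho_j = 1$ the trace of a product of scalars reproduces an elementary tensor.

```python
import numpy as np
from itertools import product

def cyclic_mps_tensor(cores):
    """Full tensor of a cyclic matrix product representation.

    cores[j] has shape (n_j, rho_{j-1}, rho_j) with rho_0 = rho_d, so that
    v[i_1, ..., i_d] = trace(A1[i_1] @ A2[i_2] @ ... @ Ad[i_d]).
    """
    shape = tuple(core.shape[0] for core in cores)
    v = np.empty(shape, dtype=np.result_type(*cores))
    for idx in product(*(range(n) for n in shape)):
        m = np.eye(cores[0].shape[1])
        for core, i in zip(cores, idx):
            m = m @ core[i]          # accumulate the matrix product along the cycle
        v[idx] = np.trace(m)         # closing the cycle = taking the trace
    return v

rng = np.random.default_rng(1)
n, rho = (2, 3, 2), (2, 3, 2)        # rho = (rho_1, rho_2, rho_3), and rho_0 = rho_3
cores = [rng.standard_normal((n[j], rho[j - 1], rho[j])) for j in range(3)]
v = cyclic_mps_tensor(cores)

# The same tensor via a single contraction around the cycle a -> b -> c -> a
print(np.allclose(v, np.einsum("iab,jbc,kca->ijk", *cores)))   # True
```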
In Section 2, we give a survey of the results about the unstable behaviour of nonclosed formats. The first example from above exhibits divergence of the representing parameters like $\varepsilon^{-1/2}$. Possible questions are: Do other tensors exist with weaker divergence? Can the divergence be arbitrarily weak? Are there uniform estimates?
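The divergence rate $\varepsilon^{-1/2}$ can be reproduced numerically. The following NumPy sketch assumes (our assumption, since the tensor (4) is not shown in this excerpt) the standard example $w = \sum_{j=1}^d a \otimes \cdots \otimes b \otimes \cdots \otimes a$ with $b$ in the $j$-th position, $a = (1,0)^{\mathsf T}$, $b = (0,1)^{\mathsf T}$, $d = 3$. This $w$ is approximated by the centred divided difference $v(h) = \frac{1}{2h}\big[(a+hb)^{\otimes d} - (a-hb)^{\otimes d}\big] \in R_2$. As parameter size we take $\|e_+\| + \|e_-\|$ of the two elementary terms, which is also an assumption about the norm. Since $v(h) - w = h^2\, b \otimes b \otimes b$, the error is $\varepsilon = h^2$ while the parameters grow like $h^{-1} = \varepsilon^{-1/2}$.

```python
import numpy as np

def elementary(*factors):
    """Elementary tensor v^(1) ⊗ ... ⊗ v^(d) as an iterated outer product."""
    t = np.array(1.0)
    for f in factors:
        t = np.multiply.outer(t, f)
    return t

d = 3
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
# Assumed example: w = sum_j a ⊗ ... ⊗ b ⊗ ... ⊗ a with b in position j
w = sum(elementary(*[b if k == j else a for k in range(d)]) for j in range(d))

def divided_difference_terms(h):
    """The two elementary terms of v(h) = [(a+hb)^{⊗d} - (a-hb)^{⊗d}] / (2h), so v(h) ∈ R_2."""
    e_plus = elementary(*[a + h * b] * d) / (2 * h)
    e_minus = -elementary(*[a - h * b] * d) / (2 * h)
    return e_plus, e_minus

results = []
for h in [1e-1, 1e-2, 1e-3]:
    e_plus, e_minus = divided_difference_terms(h)
    eps = np.linalg.norm(e_plus + e_minus - w)               # error ||v(h) - w|| (= h^2 for d = 3)
    size = np.linalg.norm(e_plus) + np.linalg.norm(e_minus)  # parameter size (~ 1/h)
    results.append((h, eps, size))
    print(f"h={h:.0e}  eps={eps:.2e}  size={size:.2e}  size*sqrt(eps)={size * np.sqrt(eps):.4f}")
```

The last column equals $(1+h^2)^{3/2}$ and tends to 1, i.e. the parameter size behaves like $\varepsilon^{-1/2}$.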
The analysis in Section 2 is restricted to the finite-dimensional case, since otherwise the compactness arguments do not apply. The infinite-dimensional case $\dim V_j = \infty$ of the $r$-term format $R_r$ in a general Banach tensor space will be considered in Section 3. It will turn out that the behaviour is essentially the same as in the finite-dimensional case. This justifies restricting the study to the model spaces $V = \bigotimes_{j=1}^d K^{n_j}$. In the case of the format $R_2$, the relevant model space is $\bigotimes_{j=1}^d K^2$. In Section 4 we prove that the divergence strength $\varepsilon^{-1/2}$ is the minimal one, i.e. the approximation of any tensor of border rank 2 but rank $> 2$ requires parameters diverging at least like $c\,\varepsilon^{-1/2}$. Concerning the possible dependence of the constant $c$ on the tensor we refer to Section 2.

General divergence behaviour of nonclosed formats
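The following statements are described in detail and proved in [8] and [3, §§ 9.5.3–9.5.6, p. 312ff]. We consider a format satisfying the following simple conditions. It is described by a continuous mapping
$$\rho : D \to V \tag{5}$$
with $0 \in D$ and $\rho(0) = 0$, where $P$ is a normed vector space and $D \subset P$ a closed subset (usually $D = P$).¹ The format is defined by $F := \rho(D)$. A two-sided cone condition is required: $v \in F$ implies $\lambda v \in F$ for all $\lambda \in K$. In order to apply compactness arguments for bounded sets, we require $\dim(P) < \infty$ and $\dim V < \infty$. The norms on $P$ and $V$ are both denoted by $\|\cdot\|$. Finally, $F$ is assumed to be nonclosed, i.e. there is a set $B$ of 'border tensors', disjoint from $F$, such that
$$\overline{F} = F \cup B, \qquad F \cap B = \emptyset. \tag{6}$$
Since, in general, the representation of $v \in F$ is not unique, we introduce the minimal² bound of the corresponding parameter by
$$\sigma(v) := \inf\{\|p\| : p \in D,\ \rho(p) = v\}. \tag{7}$$
A natural task is to approximate some $w \in B$ by tensors $v \in F$ with $\|w - v\| < \varepsilon$. The smallest parameter size is given by
$$\delta(w, \varepsilon) := \inf\{\sigma(v) : v \in F,\ \|w - v\| < \varepsilon\}. \tag{8}$$
If $w \in B$, the weakly monotone divergence $\delta(w, \varepsilon) \nearrow \infty$ holds as $\varepsilon \searrow 0$. The proof is based on the following lemma (cf. [4, Proposition 4.8]).

Lemma 2.1:
Let $v_i \in F$ converge to $w$ and assume that the parameter sizes $\sigma(v_i)$ are bounded. Then $w \in F$.

The proof uses a convergent subsequence $p_i \to p^*$ of parameters with $\rho(p_i) = v_i$ (since $D$ is closed, $p^* \in D$), so that $w = \lim \rho(p_i) = \rho(p^*)$ proves $w \in F$. Correspondingly, $\delta(w, \varepsilon) \to \infty$ follows by an indirect proof. This argument ensures the existence of the diverging quantity $\delta(w, \varepsilon)$, but does not describe the strength of divergence quantitatively. In particular, the divergence might be different for different $w \in B$.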
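Aiming at a uniform statement, the strongest formulation would be a lower bound
$$\delta(w, \varepsilon) \ge c\,\delta_1(\varepsilon) \quad \text{for all } w \in B \tag{9}$$
with a constant $c > 0$ independent of $w$ and a function $\delta_1$ with $\delta_1(\varepsilon) \to \infty$ as $\varepsilon \to 0$. Replacing $c$ in (9) by the factor $\operatorname{dist}(w, \partial B)$, we obtain the following generally valid statement.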

Theorem 2.2:
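There is a function $\delta_1$ with $\delta_1(\varepsilon) \to \infty$ as $\varepsilon \to 0$ such that $\delta(w, \varepsilon) \ge \operatorname{dist}(w, \partial B)\,\delta_1(\varepsilon)$ for all $w \in B$.
This inequality ensures the existence of a minimal divergence strength $\delta_1$, but the indirect proof does not describe the concrete nature of $\delta_1$.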
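Since $\partial B \subset F$ and $F \cap B = \emptyset$, closedness of $\partial B$ implies $\operatorname{dist}(w, \partial B) > 0$ for all $w \in B$. In fact, for $F = R_2$ the set $\partial B$ is closed.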

Infinite dimensions
Now the vector spaces $V_j$ in (1) may be infinite-dimensional. We restrict our considerations to $F = R_r$. Note that the cyclic matrix product format cannot be extended to $\dim V_j = \infty$, since it would require infinitely many matrices.
We recall that $v \in R_r$ corresponds to a representation $v = \sum_{i=1}^r e_i$ with elementary tensors $e_i = \bigotimes_{j=1}^d v_i^{(j)}$. We may define the mapping $\rho$ in (5) by $\rho : p = (e_i)_{i=1,\ldots,r} \mapsto v = \sum_{i=1}^r e_i$ with elementary tensors $e_i$ (cf. (2); this mapping is mentioned in Endnote 1). This is a natural choice since $\rho$ is linear. The norm of $p$ is chosen as in (10). The next statements mention another format: the 'tensor subspace representation' $T_{\mathbf r}$ (also called Tucker representation). Let $\mathbf r := (r_1, \ldots, r_d)$ be a $d$-tuple of integers. Then $T_{\mathbf r}$ consists of all tensors of the form $v \in \bigotimes_{j=1}^d U_j$ with subspaces $U_j \subset V_j$ subject to $\dim U_j \le r_j$.
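In contrast to the finite-dimensional case, the norm of the topological tensor space plays an important role. We require two properties: (a) The tensor product must be a continuous map from $V_1 \times \cdots \times V_d$ into $V$, i.e. $\|\bigotimes_{j=1}^d v^{(j)}\| \le C \prod_{j=1}^d \|v^{(j)}\|_j$, where $\|\cdot\|_j$ is the norm on $V_j$, while $\|\cdot\|$ is the norm on $V$. An equivalent statement is $\|\cdot\| \lesssim \|\cdot\|_\wedge$, where $\|\cdot\|_\wedge$ is the projective crossnorm. (b) The same continuity holds for the dual spaces: $\|\bigotimes_{j=1}^d \varphi^{(j)}\|^* \le C \prod_{j=1}^d \|\varphi^{(j)}\|_j^*$ for $\varphi^{(j)} \in V_j^*$, where $\|\cdot\|^*$ and $\|\cdot\|_j^*$ are the related dual norms. An equivalent statement is $\|\cdot\| \gtrsim \|\cdot\|_\vee$, where $\|\cdot\|_\vee$ is the injective crossnorm. Note that $\|\cdot\|_\vee$ is the weakest possible reasonable crossnorm. Under these conditions the representation $T_{\mathbf r}$ is closed.³ This property leads to an estimate of the rank by means of the border rank (denoted by $\underline{\mathrm{rank}}$).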
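Conditions (a) and (b) can be checked numerically in the simplest special case. For $d = 2$ and Euclidean norms on $V_1 = K^m$ and $V_2 = K^n$, the injective crossnorm is the spectral norm and the projective crossnorm is the nuclear norm (the sum of the singular values). The sketch below is an illustration restricted to this case. It checks that the Euclidean (Frobenius) norm lies between the two, so it satisfies (a) and (b) with $C = 1$, and that all three norms agree on elementary tensors.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 5))                  # a tensor in K^4 ⊗ K^5 (d = 2)

injective = np.linalg.norm(A, 2)                 # spectral norm = largest singular value
euclidean = np.linalg.norm(A, "fro")             # Hilbert (Frobenius) norm
projective = np.linalg.norm(A, "nuc")            # nuclear norm = sum of singular values
print(injective <= euclidean <= projective)      # True

# Crossnorm property: on an elementary tensor x ⊗ y all three equal |x| * |y|
x, y = rng.standard_normal(4), rng.standard_normal(5)
E = np.outer(x, y)
target = np.linalg.norm(x) * np.linalg.norm(y)
print(all(np.isclose(np.linalg.norm(E, o), target) for o in (2, "fro", "nuc")))   # True
```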

Remark 3.2:
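A tensor with $\underline{\mathrm{rank}}(v) = r$ belongs to $T_{\mathbf r}$ with $\mathbf r := (r, \ldots, r)$. In particular, we have⁴ $\mathrm{rank}(v) \le r^{d-1}$. Let $w$ be the limit of a sequence $\{v_i\} \subset R_r$. There are unique minimal subspaces $U_j^{\min}(w) \subset V_j$ with $r_j := \dim U_j^{\min}(w) \le r$ and $w \in U(w) := \bigotimes_{j=1}^d U_j^{\min}(w)$. We introduce the model space $U_{\mathrm{mod}} := \bigotimes_{j=1}^d K^{r_j}$ equipped with the Euclidean norm $\|\cdot\|_{\mathrm{mod}}$. Note that Section 2 applies to $U_{\mathrm{mod}}$ and the format $F = R_r$. In the following, we shall show that the divergence behaviour of $\delta(w, \varepsilon)$ is equal to the divergence behaviour in the model space $U_{\mathrm{mod}}$ up to constants for which explicit bounds can be given.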
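In the finite-dimensional case the minimal subspace $U_j^{\min}(w)$ is the range of the $j$-th matricization of $w$ (the matrix whose columns are the fibres of $w$ in direction $j$). Hence $r_j$ is a matrix rank. The sketch below illustrates this. The helper name `minimal_subspaces` and the relative rank tolerance are our own assumptions. It embeds the tensor $w = a \otimes a \otimes b + a \otimes b \otimes a + b \otimes a \otimes a$ with random $a, b \in \mathbb R^5$ in $(\mathbb R^5)^{\otimes 3}$ and recovers $r_j = 2$, i.e. the model space $\bigotimes_{j=1}^3 \mathbb R^2$.

```python
import numpy as np

def minimal_subspaces(w, tol=1e-10):
    """Orthonormal bases of U_j^min(w) = range of the j-th matricization of w."""
    bases = []
    for j in range(w.ndim):
        M = np.moveaxis(w, j, 0).reshape(w.shape[j], -1)   # columns = fibres in direction j
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        bases.append(U[:, s > tol * s[0]])                  # keep non-negligible singular directions
    return bases

def outer3(x, y, z):
    return np.einsum("i,j,k->ijk", x, y, z)

rng = np.random.default_rng(3)
a, b = rng.standard_normal(5), rng.standard_normal(5)
w = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)    # w in (R^5)^{⊗3}

bases = minimal_subspaces(w)
print([B.shape[1] for B in bases])                         # [2, 2, 2] = (r_1, r_2, r_3)

# w lies in U(w) = U_1 ⊗ U_2 ⊗ U_3: projecting onto U(w) leaves it unchanged
P = [B @ B.T for B in bases]
print(np.allclose(np.einsum("ia,jb,kc,abc->ijk", *P, w), w))   # True
```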
Next we need the following result whose proof is postponed to the end of this section.

Lemma 3.3:
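There are projections⁵ $P_j$ from $V_j$ onto $U_j^{\min}(w)$ with $\|P_j\| \le r_j$. The product $P := \bigotimes_{j=1}^d P_j$ is a bounded projection from $V$ onto $U(w)$ and maps $R_r$ into $R_r$.

$U(w)$ and $U_{\mathrm{mod}}$ are isomorphic since $\dim(U_j^{\min}(w)) = \dim(K^{r_j})$. Hence, the norms (the restriction of $\|\cdot\|$ to $U(w)$ and $\|\cdot\|_{\mathrm{mod}}$) are equivalent. However, the constants in the equivalence inequalities are still undetermined.

Proposition 3.4:
Let $\delta_U(w, \varepsilon)$ denote the quantity (8) with $F$ replaced by $F \cap U(w)$:
$$\delta_U(w, \varepsilon) := \inf\{\sigma(u) : u \in F \cap U(w),\ \|w - u\| < \varepsilon\}. \tag{14}$$
Then $\delta_U(w, \varepsilon)$ and $\delta(w, \varepsilon)$ satisfy the inequalities
$$\delta(w, \varepsilon) \le \delta_U(w, \varepsilon), \qquad \delta(w, \varepsilon) \ge \delta_U(w, \|P\|\varepsilon)/\|P\|.$$

Proof: The first inequality holds since the infimum in (14) is taken over a subset of the set in (8). For the second one, let $v \in F$ with $\|w - v\| < \varepsilon$ and set $\hat u := Pv \in F \cap U(w)$. Since $Pw = w$, we have $\|w - \hat u\| = \|P(w - v)\| < \|P\|\varepsilon$. Hence, $\hat u$ appears in the right-hand side of (14) for $\varepsilon$ replaced by $\varepsilon\|P\|$ and leads to the parameter size $\sigma(\hat u) \le \|P\|\,\sigma(v)$ (cf. (10)). This proves $\sigma(v) \ge \sigma(\hat u)/\|P\| \ge \delta_U(w, \varepsilon\|P\|)/\|P\|$. Forming the infimum over all $v \in F$ with $\|w - v\| < \varepsilon$, we obtain the second inequality.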
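Choose bases $\{b_i^{(j)} : 1 \le i \le r_j\}$ of $U_j^{\min}(w)$ and let $\varphi_j : U_j^{\min}(w) \to K^{r_j}$ be the isomorphism with $\varphi_j(b_i^{(j)}) = e_i^{(j)}$, where $e_i^{(j)}$ are the standard unit vectors of $K^{r_j}$. Let $F$ and $B$ correspond to $U(w) \subset V$, while $F_{\mathrm{mod}}$ and $B_{\mathrm{mod}}$ correspond to the model space $U_{\mathrm{mod}}$.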

Remark 3.6:
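The isomorphism $\Phi := \bigotimes_{j=1}^d \varphi_j : U(w) \to U_{\mathrm{mod}}$ maps $F$ into $F_{\mathrm{mod}}$ and $B$ into $B_{\mathrm{mod}}$. Explicit bounds hold for the norms of $\Phi$, $\Phi^{-1}$, $\varphi_j$ and $\varphi_j^{-1}$.
Proof: (a) $\Phi : F \to F_{\mathrm{mod}}$ and $\Phi : B \to B_{\mathrm{mod}}$ follow from the fact that isomorphisms do not change the rank and the border rank.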
proves the second inequality.
(d) The estimates of $\varphi_j$ and $\varphi_j^{-1}$ follow along the same lines.

Let $x := \Phi(w)$ be the image of $w$ under $\Phi = \bigotimes_{j=1}^d \varphi_j$. The next result shows that the divergence behaviour of $w$ and $x$ is equal up to the controlled constants. The proof follows along the same lines as for Proposition 3.4.

Proposition 3.7:
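The divergence quantities of $w$ in $U(w)$ and of $x = \Phi(w)$ in $U_{\mathrm{mod}}$, where $\Phi = \bigotimes_{j=1}^d \varphi_j$, are equivalent up to the constants defined in Remark 3.6.
Concerning the inequality in Theorem 2.2, we remark that $\Phi$ maps $\partial B$ onto $\partial B_{\mathrm{mod}}$.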
Now we give the proof of Lemma 3.3.

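Proof: Let $\{b_i^{(j)} : 1 \le i \le r_j\}$ be a basis of $U_j^{\min}(w)$. The bound $\|P_j\| \le r_j$ is immediate. Obviously, the image $Px = \bigotimes_{j=1}^d P_j x^{(j)}$ of an elementary tensor $x = \bigotimes_{j=1}^d x^{(j)}$ is again elementary. Therefore, $Px$ with $x \in R_r$ is a sum of at most $r$ elementary tensors.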
Summation over all terms yields the bound for $\|P\|$.

Finally, we add two remarks illustrating the 'finite-dimensional nature' of algebraic tensors. The first property is also mentioned by Fernández and Unzueta [9]. We recall that the topological tensor space is the completion of the algebraic tensor space $V_{\mathrm{alg}} := \bigotimes_{j=1}^d V_j$ with respect to some norm.
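The structural step in the proof of Lemma 3.3 can be checked numerically: the image of a sum of $r$ elementary tensors under $P = \bigotimes_j P_j$ is again a sum of at most $r$ elementary tensors, obtained by applying $P_j$ to the factors. The sketch below uses orthogonal projections onto random two-dimensional subspaces as stand-ins for $P_j$. This choice and the helper names are assumptions for illustration only; in a general Banach tensor space the $P_j$ need not be orthogonal.

```python
import numpy as np

def elementary(*factors):
    t = np.array(1.0)
    for f in factors:
        t = np.multiply.outer(t, f)
    return t

def apply_mode(t, A, j):
    """Apply the matrix A to the j-th index of t, i.e. (I ⊗ ... ⊗ A ⊗ ... ⊗ I) t."""
    return np.moveaxis(np.tensordot(A, t, axes=(1, j)), 0, j)

rng = np.random.default_rng(4)
n, d, r = 5, 3, 2
factors = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r)]   # x ∈ R_r
x = sum(elementary(*fs) for fs in factors)

# Hypothetical P_j: orthogonal projections onto random 2-dimensional subspaces of R^n
projections = []
for _ in range(d):
    basis = np.linalg.qr(rng.standard_normal((n, 2)))[0]   # orthonormal columns
    projections.append(basis @ basis.T)

# P x computed on the full tensor, one index at a time
Px = x
for j, Pj in enumerate(projections):
    Px = apply_mode(Px, Pj, j)

# P x computed term by term: each elementary tensor is mapped to an elementary tensor
Px_terms = sum(elementary(*(Pj @ f for Pj, f in zip(projections, fs))) for fs in factors)
print(np.allclose(Px, Px_terms))   # True: P x is again a sum of r elementary tensors
```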

Remark 3.9:
The closure of $R_r \subset V$ is independent of the norm of $V$.
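Proof: Let $V_{\mathrm I}$ and $V_{\mathrm{II}}$ be two topological tensor spaces with respect to two different norms $\|\cdot\|_{\mathrm I}$ and $\|\cdot\|_{\mathrm{II}}$. Let $w \in V_{\mathrm I}$ be the limit of a sequence $v_i \in R_r \subset V_{\mathrm{alg}} \subset V_{\mathrm I}$. As seen above, the projected sequence $\hat v_i = Pv_i$ satisfies $\hat v_i \in U(w) \cap R_r$ and $\hat v_i \to w$ with respect to the norm $\|\cdot\|_{\mathrm I}$. Since the restrictions of the norms $\|\cdot\|_{\mathrm I}$ and $\|\cdot\|_{\mathrm{II}}$ to the finite-dimensional subspace $U(w) \subset V_{\mathrm I} \cap V_{\mathrm{II}}$ are equivalent, $\hat v_i \to w$ also holds with respect to $\|\cdot\|_{\mathrm{II}}$. Hence, $w$ belongs to the closure of $R_r$ in both spaces.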
For the solution of optimization problems, it is desirable that the infimum of a cost function $\mathcal F(v)$, $v \in F$, is also a minimum. Assuming a reflexive Banach space $V$, the minimum is attained at a weak limit $w$ of some sequence $v_i \in F$. If $F$ is weakly closed (cf. Endnote 3), the minimizer $w$ belongs to $F$. For infinite-dimensional spaces, weak convergence and strong (standard) convergence must be distinguished. In finite dimensions both kinds of convergence coincide.

Remark 3.10:
If $w$ is a weak limit of $v_i \in R_r$, it is also a strong limit of (possibly other) $\hat v_i \in R_r$. Hence, the weak closure and the standard closure of $R_r$ coincide.

Proof:
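Assume $v_i \in R_r$ and $v_i \rightharpoonup w$ with $\mathrm{rank}(w) < \infty$. Note that $w \in U(w)$ (cf. (13)). Let $P$ be a bounded projection of $V$ onto $U(w)$ (cf. Lemma 3.3). Then also $\hat v_i := Pv_i \in U(w)$ satisfies $\hat v_i \rightharpoonup w$, and $\hat v_i \in R_r$ by Lemma 3.3. Inside the finite-dimensional subspace $U(w)$, however, weak convergence implies strong convergence, i.e. $\hat v_i \to w$. For the proof of $\hat v_i \rightharpoonup w$, let $\varphi \in V^*$ be a functional. Then $\varphi \circ P \in V^*$ and $Pw = w$ imply $\varphi(\hat v_i) = (\varphi \circ P)(v_i) \to (\varphi \circ P)(w) = \varphi(w)$.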

Result
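As seen in Section 3, it suffices to study the behaviour in the model spaces $\bigotimes_{j=1}^d K^{r_j}$. In the case of $F = R_2$ we have $r_j \le 2$, and the model space $\bigotimes_{j=1}^d K^2$ is endowed with the Euclidean norm. The set $B$ from (6) consists of all tensors $w \in V$ with $\underline{\mathrm{rank}}(w) = 2$ but $\mathrm{rank}(w) > 2$. The example of the tensor in (4) and its approximation by centred divided differences shows that $\delta(w, \varepsilon) \le c\,\varepsilon^{-1/2}$ (notation: $\delta(w, \varepsilon) \lesssim \varepsilon^{-1/2}$). Below we prove the opposite inequality $\delta(w, \varepsilon) \gtrsim \varepsilon^{-1/2}$. This proves $\delta(w, \varepsilon) \sim \varepsilon^{-1/2}$. The same equivalence holds for $\delta_1(\varepsilon)$ in Theorem 2.2.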
For the proof of $\delta(w, \varepsilon) \gtrsim \varepsilon^{-1/2}$, we show that a limit $w = \lim v_i$ of a sequence with weaker divergence must belong to $F$. The following result can be considered as a stronger form of Lemma 2.1.

Choice of subsequences
Let $v_\mu \in R_2$ in (16) be represented in the form $v_\mu = p_{1,\mu} + p_{2,\mu}$ with elementary tensors $p_{i,\mu} = \bigotimes_{j=1}^d v_{i,\mu}^{(j)}$, i.e. by the parameters $p_\mu = (p_{1,\mu}, p_{2,\mu})$. According to Remark 3.1, we use an equal scaling of the factors $v_{i,\mu}^{(j)}$. If the sequence of the Euclidean norms $\|p_\mu\|$ (cf. (10)) is bounded, the assertion $w \in R_2$ follows from Lemma 2.1. Hence, we may assume that there is a subsequence⁶ with $\|p_\mu\| \to \infty$. One of the quantities $\|p_{1,\mu}\|$ and $\|p_{2,\mu}\|$ must be unbounded. W.l.o.g. let $\limsup_\mu \|p_{1,\mu}\| = \infty$. Passing to a subsequence, we get $\|p_{1,\mu}\| \to \infty$. The scaled quantities $x_{i,\mu}^{(j)}$ are bounded (cf. (17)). We choose a subsequence such that the $x_{i,\mu}^{(j)}$ converge. After these preparations we obtain the representation (18) of $v_\mu$.

Equations (17) and (18) show that $x_{1,\mu}^{(j)}$ and $x_{2,\mu}^{(j)}$ tend to a common limit $x^{(j)} \in K^2$. There are orthogonal transformations $Q_j : K^2 \to K^2$ with $Q_j x^{(j)} = (1,0)^{\mathsf T}$. The statement of the theorem does not change when we apply the product $Q := \bigotimes_{j=1}^d Q_j$ and replace $v_\mu$ by $Qv_\mu$ (norms are unchanged, and $w \in R_2$ if and only if $Qw \in R_2$). In the following, we assume that $x_{1,\mu}^{(j)}$ and $x_{2,\mu}^{(j)}$ are perturbations of $(1,0)^{\mathsf T}$ tending to $(1,0)^{\mathsf T}$. The above representation of $v_\mu$ corresponds to the parameters $p_\mu$ with $\|p_\mu\| = 2\delta_\mu(1 + o(1))$. In the following, we omit the index $\mu$ to simplify the notation.
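The normalization by orthogonal $Q_j$ can be sketched for $K = \mathbb R$. For a nonzero $x \in \mathbb R^2$, a Givens rotation maps $x$ onto $\|x\|\,(1,0)^{\mathsf T}$. The product $Q = \bigotimes_j Q_j$ acts index by index, preserves the Euclidean norm of any $v \in \bigotimes_{j=1}^d \mathbb R^2$, and maps elementary tensors to elementary tensors, so rank and border rank are unchanged. The vectors below are random placeholders, not the limits $x^{(j)}$ from the proof, and the helper names are hypothetical.

```python
import numpy as np

def givens_to_e1(x):
    """Orthogonal 2x2 matrix Q with Q @ x = |x| * (1, 0)^T."""
    c, s = x / np.linalg.norm(x)
    return np.array([[c, s], [-s, c]])

def apply_kron(v, mats):
    """Apply Q = Q_1 ⊗ ... ⊗ Q_d to a tensor v, one index at a time."""
    for j, Qj in enumerate(mats):
        v = np.moveaxis(np.tensordot(Qj, v, axes=(1, j)), 0, j)
    return v

rng = np.random.default_rng(5)
d = 4
limits = [rng.standard_normal(2) for _ in range(d)]      # placeholders for the vectors x^(j)
Q = [givens_to_e1(x) for x in limits]

print(np.allclose(Q[0] @ limits[0], [np.linalg.norm(limits[0]), 0.0]))   # True
v = rng.standard_normal((2,) * d)                        # arbitrary tensor in ⊗_{j=1}^d R^2
Qv = apply_kron(v, Q)
print(np.isclose(np.linalg.norm(Qv), np.linalg.norm(v)))  # True: the Euclidean norm is unchanged
```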
In the first step, we restrict the proof to the real field K = R. The complex case will be discussed in Section 4.10.

Notations
The tensor $v \in V = \bigotimes_{j=1}^d K^2$ has $2^d$ components $v[i_1, \ldots, i_d]$ with $i_j \in \{1, 2\}$.

Distinction of cases
The asymptotic behaviour of $v$ is known, but we need to know how the involved quantities $\beta_j$, $\beta'_j$ behave. In the following, the index $j \in \{1, \ldots, d\}$ is fixed. If $\beta_j = \beta'_j = 0$ for almost all members of the sequence, the limit $w$ must be a multiple of $\bigotimes_{j=1}^d (1,0)^{\mathsf T}$. Otherwise, we restrict to a subsequence with $|\beta_j| + |\beta'_j| > 0$. Consider the quotient $q_j := |\beta'_j/\beta_j|$ (set $q_j = \infty$ if $\beta_j = 0$). If $\{q_j\}$ has the accumulation point $0$, we can extract a subsequence with $q_j \to 0$, leading to the first statement in case (21): $\beta'_j = o(\beta_j)$. Otherwise, if $\{q_j\}$ has the improper accumulation point $\infty$, the first statement in case (22) applies to a suitable subsequence: $\beta_j = o(\beta'_j)$. In the remaining case, there exists an accumulation point $q \in (0, \infty)$. Choose a subsequence with $q_j \to q$. The subsequence can be selected in such a way that all members $\beta_j$ of the sequence have the same sign and all $\beta'_j$ have the same sign. In this case, $\beta_j \sim \beta'_j$ holds and we can distinguish two subcases: $\beta_j\beta'_j \ge 0$ is added to (21) and (22), whereas $\beta_j\beta'_j < 0$ defines the last case (23). Hence, for each $j \in \{1, \ldots, d\}$, one of the conditions (21), (22), (23) applies. Conditions (21) and (22) may apply simultaneously.

Remark 4.2:
The idea of the following proof is to show that the tensor $w$ from (16) has at most three nonzero components $w$, $w_i$, $w_j$ ($i \ne j$).
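W.l.o.g. we may assume $i = 1$, $j = 2$. Then, with the unit vectors $e_1 = (1,0)^{\mathsf T}$ and $e_2 = (0,1)^{\mathsf T}$, the tensor is of the form
$$w = \big(w[1,1,\ldots,1]\, e_1 + w[2,1,\ldots,1]\, e_2\big) \otimes e_1 \otimes \cdots \otimes e_1 + w[1,2,1,\ldots,1]\; e_1 \otimes e_2 \otimes e_1 \otimes \cdots \otimes e_1 .$$
This proves $\mathrm{rank}(w) \le 2$ and yields $w \in F = R_2$.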
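The rank bound can be verified directly. Assume that the three possibly nonzero components sit at the positions $(1,1,\ldots,1)$, $(2,1,\ldots,1)$ and $(1,2,1,\ldots,1)$. The sketch then writes $w$ explicitly as a sum of two elementary tensors, $(\omega_0 e_1 + \omega_1 e_2) \otimes e_1 \otimes \cdots \otimes e_1 + \omega_2\, e_1 \otimes e_2 \otimes e_1 \otimes \cdots \otimes e_1$. The values $\omega_0, \omega_1, \omega_2$ are arbitrary test numbers, and NumPy indices are 0-based.

```python
import numpy as np

def elementary(*factors):
    t = np.array(1.0)
    for f in factors:
        t = np.multiply.outer(t, f)
    return t

d = 4
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
omega0, omega1, omega2 = 0.7, -1.3, 2.1       # arbitrary values of the three components

w = np.zeros((2,) * d)
w[(0,) * d] = omega0                          # component at (1, 1, ..., 1)
w[(1,) + (0,) * (d - 1)] = omega1             # component at (2, 1, ..., 1)
w[(0, 1) + (0,) * (d - 2)] = omega2           # component at (1, 2, 1, ..., 1)

t1 = elementary(omega0 * e1 + omega1 * e2, *[e1] * (d - 1))
t2 = omega2 * elementary(e1, e2, *[e1] * (d - 2))
print(np.allclose(w, t1 + t2))                # True: w is a sum of two elementary tensors
```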

Case A: condition (21)
The component $v_j$ is of the form (25) (cf. (20)). Under assumption (21), $\delta\beta_j$ is the leading term, i.e. $v_j \sim \delta\beta_j$.

Remark 4.3:
Assume that (21) holds for j in (25). Then there are two cases (a) and (b).

Lemma 4.6:
It is impossible that (21) and $w_j \ne 0$ hold for two different indices $j_1$ and $j_2$, while (22) and $w_{j_3} \ne 0$ hold for a third index $j_3$ ($j_1 \ne j_3 \ne j_2$).

Case D: condition (23)
Now we assume that condition (23) holds for all $j \in \{1, \ldots, d\}$. Because of $\beta_j \sim \beta'_j$ and their opposite signs, the terms in (25) may cancel. It is therefore not possible to estimate $\delta\beta_j$ or $\delta\beta'_j$ by means of $v_j$. For this reason, we introduce the notation
$$\beta'_j = -\beta_j(1 + \alpha_j) \tag{30}$$
with a bounded $\alpha_j$. Note that $\alpha_j \to 0$ may occur. Inserting (30) into (25), we obtain (31).
Proof: (a) Since $\beta_j v \to 0$, the term $(1 + \eta)\delta\beta_j\alpha_j \to w_j$ in (31) must be bounded.
Assuming condition (23) for all $j \in \{1, \ldots, d\}$, we conclude from Lemma 4.9 that at most two components $w_j$ may be nonzero, while all higher-order components vanish because of Remark 4.7(b). Hence, Remark 4.2 proves $w \in R_2$.
Next we discuss the mixed situation when indices with condition (21) or (22) as well as indices with condition (23) are present.

Case E: mixed situation
(note that both terms have the same sign, so that no cancellation occurs). $\delta\beta_j \sim 1$ from (26) implies

Let condition (21) or (22) hold for $j \in D_1 \ne \emptyset$, while (23) holds for $j \in D_2 := \{1, \ldots, d\} \setminus D_1 \ne \emptyset$. For all $k \in D_2$ we have $w_k = 0$, as stated in Lemma 4.10. Among $D_1$ there can be at most two indices with $w_j \ne 0$ (cf. Lemmata 4.5 and 4.6). Remark 4.7(b) states that $w_{j,k} = w_{j,k,\ldots} = 0$ for higher-order components with $j, k \in D_2$. Otherwise, one of the indices must belong to $D_1$. Then $w_{j,k} = w_{j,k,\ldots} = 0$ follows from Conclusion 4.4. This proves the following result, so that again Remark 4.2 can be applied.

Remark 4.11:
Also in the mixed case, $w_j \ne 0$ occurs for at most two indices $j$, while all higher-order components vanish: $w_{j,k} = w_{j,k,\ldots} = 0$.

The complex case
Now we discuss the modifications for the complex tensor space $\bigotimes_{j=1}^d \mathbb C^2$.
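The condition $\beta_j\beta'_j \ge 0$ in (21) and (22) has to be replaced by a condition on the complex signs $\beta_j/|\beta_j|$ and $\beta'_j/|\beta'_j|$: we have to avoid that in the limit the complex signs of $\beta_j$ and $\beta'_j$ are opposite. Then the same conclusions follow as in the real case. The inequality $\beta_j\beta'_j < 0$ in (23) becomes the condition that the complex signs of $\beta_j$ and $\beta'_j$ are opposite in the limit. Again we set $\beta'_j = -\beta_j(1 + \alpha_j)$ with bounded $\alpha_j$. For the proof of Lemma 4.9 we used the fact that at least two of the three quantities $\alpha_1, \alpha_2, \alpha_3$ have the same sign, so that, e.g., $|\alpha_1 + \alpha_2| = |\alpha_1| + |\alpha_2|$. Now we use that among three complex values $\alpha_1, \alpha_2, \alpha_3$ there are at least two, say $\alpha_1$ and $\alpha_2$, with $|\alpha_1 + \alpha_2| \ge \frac12(|\alpha_1| + |\alpha_2|)$. Indeed, the arguments of two of them differ by at most $2\pi/3$, so that $|\alpha_1 + \alpha_2|^2 \ge |\alpha_1|^2 + |\alpha_2|^2 - |\alpha_1||\alpha_2| \ge \frac14(|\alpha_1| + |\alpha_2|)^2$. This leads to the same results.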
With these modifications the previous proof can be repeated for the complex case.
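The complex pigeonhole bound $|\alpha_k + \alpha_l| \ge \frac12(|\alpha_k| + |\alpha_l|)$ for the best pair among three complex numbers can be spot-checked numerically. The NumPy sketch below samples random triples; this is a plausibility check only, not a proof. It also shows that the cube roots of unity attain the constant $\frac12$, so the constant cannot be improved. The helper name `best_pair_ratio` is ours.

```python
import numpy as np

PAIRS = [(0, 1), (0, 2), (1, 2)]

def best_pair_ratio(alphas):
    """max over pairs k < l of |a_k + a_l| / (|a_k| + |a_l|) for three nonzero complex numbers."""
    return max(abs(alphas[k] + alphas[l]) / (abs(alphas[k]) + abs(alphas[l])) for k, l in PAIRS)

rng = np.random.default_rng(6)
N = 100_000
z = rng.standard_normal((N, 3)) + 1j * rng.standard_normal((N, 3))
ratios = np.stack(
    [np.abs(z[:, k] + z[:, l]) / (np.abs(z[:, k]) + np.abs(z[:, l])) for k, l in PAIRS], axis=1
)
best = ratios.max(axis=1)          # ratio of the best pair for each random triple
print(best.min() >= 0.5)           # True

# Extremal case: the cube roots of unity give exactly 1/2 for every pair
roots = np.exp(2j * np.pi * np.arange(3) / 3)
print(np.isclose(best_pair_ratio(roots), 0.5))   # True
```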

Notes
1. In the case of $F = R_r$, we may choose $D = P$ as the product space $(V_1 \times \cdots \times V_d)^r$ containing the vectors $v_\nu^{(j)}$ from (3). Alternatively, $D$ may be the subset of $P = V^r$ containing all $r$-tuples of elementary tensors. Note that the set of elementary tensors is closed.
2. The right-hand side in (7) can be replaced by $\min\{\ldots\}$.
3. It is even 'weakly closed', i.e. if $v_i \in T_{\mathbf r}$ has a weak limit, this limit belongs to $T_{\mathbf r}$.
4. The estimate can be improved using better bounds of the maximal rank (cf. [3, § 3.2.6.5, p. 71f]).
5. The estimates can be improved if $\|\cdot\|$ is a uniform crossnorm (cf. [3, § 4.2.8, p. 133]): $\|P_j\| \le \sqrt{r_j}$ and $\|P\| \le \sqrt{\dim(U(w))}$ (cf. [3, Theorem 4.16, p. 103]).
6. A subsequence of $\{p_\mu : \mu \in \mathbb N\}$ can be understood as $\{p_\mu : \mu \in N'\}$ with some subset $N' \subset \mathbb N$ of infinite cardinality. This allows us to use the unchanged notation $p_\mu$. Choosing a further subsequence, we replace $N'$ by another infinite subset $N'' \subset N'$.
7. $\varepsilon := \|v - w\|$ in (19) implies $|v_{\ldots} - w_{\ldots}| \le \varepsilon$ for all components.
8. Convergence modulo $2\pi$.

Disclosure statement
No potential conflict of interest was reported by the author(s).