Testable uniqueness conditions for empirical assessment of undersampling levels in total variation-regularized x-ray CT

We study recoverability in fan-beam computed tomography (CT) with sparsity and total variation priors: how many underdetermined linear measurements suffice for recovering images of given sparsity? Results from compressed sensing (CS) establish such conditions for, e.g., random measurements, but not for CT. Recoverability is typically tested by checking whether a computed solution recovers the original. This approach cannot guarantee solution uniqueness and the recoverability decision therefore depends on the optimization algorithm. We propose new computational methods to test recoverability by verifying solution uniqueness conditions. Using both reconstruction and uniqueness testing we empirically study the number of CT measurements sufficient for recovery on new classes of sparse test images. We demonstrate an average-case relation between sparsity and sufficient sampling and observe a sharp phase transition as known from CS, but never established for CT. In addition to assessing recoverability more reliably, we show that uniqueness tests are often the faster option.


Introduction
Regularization methods for tomographic reconstruction that exploit sparsity have recently been a focus of intense research. Motivated by the theory of compressed sensing (CS), [1,2] many papers propose to use sparse or total variation (TV) regularization to compute tomographic reconstructions from underdetermined measurements. [3][4][5] One promising goal is unchanged or even improved reconstruction quality from a significantly reduced sampling effort, thereby lowering the necessary radiation dose in medical computed tomography (CT) and the scanning time in, e.g., materials science and non-destructive testing.
Compressed sensing offers methodologies to predict under what circumstances it is possible to compute exact reconstructions from underdetermined linear measurements. Usually, these conditions depend both on the measurement matrix and on the signal class that is considered. Roughly speaking, a standard result reads as follows: all vectors that are sparse enough can be reconstructed exactly from underdetermined measurements with a random matrix (e.g. all entries independent and identically Gaussian distributed) by computing the solution of the linear system that has the smallest ℓ1-norm. [6] There are also results [7] that state exact recovery conditions for TV regularization. [8] An overview of CS recovery guarantees can be found in [9]. It is generally acknowledged, however, that existing guarantees either do not apply to, or give extremely pessimistic bounds in, deterministic sampling contexts. [10] In particular for CT, [11][12][13] describe the lack of general guarantees, while [11,12] derive preliminary average-case results for certain restricted special geometries known as discrete geometry; however, these results do not cover regular sampling patterns in CT, such as parallel-beam and fan-beam geometries.
In our recent work, [13,14] we have been interested in establishing conditions on sparsity and sampling levels sufficient for image recovery with regular CT sampling patterns. In particular, [14] suggests a link between gradient sparsity and sufficient sampling for accurate TV-reconstruction. In [13], we carried out empirical studies of the average sufficient number of CT fan-beam projection views for ℓ1-recovery as a function of image sparsity. Using a phase diagram similar to the Donoho-Tanner [15] phase diagram, we showed that ℓ1-recovery often admits sharp phase transitions as the sampling level is increased, and that the critical sampling level increases with the number of image non-zeros.
The present work considerably expands on the results of [13] by addressing two limitations. First, while ℓ1-regularization is useful for CT, TV-regularization is often a more successful sparsity prior for CT, because many objects have a piecewise constant appearance. The present work extends the study to recovery using anisotropic and isotropic TV-regularization and, as a step towards this, proposes ways to generate images of a desired gradient sparsity. Second, both ℓ1- and TV-reconstruction are potentially subject to non-unique solutions. The approach of [13] considers an image to be uniquely recoverable if numerical solution of the relevant optimization problem recovers the original image, but does not consider whether the computed solution is unique. In the present work, we derive uniqueness tests that can be used to computationally verify ℓ1 and TV solution uniqueness. We then compute phase diagrams using both reconstruction and uniqueness tests to verify the ℓ1 recovery results from [13], and for anisotropic and isotropic versions of TV. In all cases, we observe a pronounced average-case relation between sparsity and the sufficient sampling level for recovery, as well as a sharp phase transition. We also compare the reconstruction and uniqueness-test approaches in terms of computing time.
The paper is organized as follows. Section 2 describes the CT imaging model and the ℓ1-norm and TV-regularization problems under study. Section 3 describes necessary and sufficient conditions for solution uniqueness, while Section 4 presents our numerical implementations of reconstruction and uniqueness tests. Section 5 describes how to generate test images with desired image or gradient sparsity. Section 6 presents our results, establishing empirically a relation between sparsity and the average sufficient sampling level for recovery. Finally, Section 7 discusses the results and concludes the paper.

Sparse image reconstruction methods for computed tomography

2.1. Imaging model
Imaging by CT exploits that X-rays are attenuated when passing through matter. The attenuation depends on the material traversed by the X-ray, as described by the so-called linear attenuation coefficient μ, with denser materials generally attenuating more. The intensities I_0 and I of an X-ray before and after passing through an object with linear attenuation coefficient μ(s), as a function of the spatial coordinate s, can be modelled by Lambert-Beer's law, see, e.g. [16], which in a rearranged form reads

−log(I / I_0) = ∫_L μ(s) ds,   (1)

where ∫_L μ(s) ds denotes the line integral along the line L describing the X-ray path. By discretizing the object into n pixels, and the data by assuming that m individual X-rays of infinitesimal width are used, a fully discretized imaging model can be written as b = Ax. Here, x is a vector of length n of all pixel values stacked. A is an m-by-n matrix whose (i, j)th element A_ij equals the path length of the ith ray through pixel j, such that Σ_j A_ij x_j approximates the line integral in (1) for ray i. Each ray only intersects a small number of pixels (for a square √n-by-√n array, of the order of √n), causing the remaining A_ij values to be zero, see Figure 1 (right). b is a vector of length m with the log-transformed data, i.e. b_i = −log(I_i / I_0) for rays i = 1, . . . , m.
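For illustration, the discretized model and the log transform can be sketched in Python as follows (the path lengths and attenuation values are made up for this toy example):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 2-pixel toy object and two rays: entry A[i, j] is the path
# length (cm) of ray i through pixel j, forming a sparse system matrix A.
A = csr_matrix(np.array([[1.0, 0.5],
                         [0.0, 2.0]]))
x = np.array([0.2, 0.1])     # attenuation coefficients mu (1/cm)

# Forward model: line integrals b = A x
b = A @ x

# Equivalently, from measured intensities: b_i = -log(I_i / I_0)
I0 = 1.0
I = I0 * np.exp(-b)          # simulated detector readings
b_from_intensities = -np.log(I / I0)
assert np.allclose(b, b_from_intensities)
```

The sparse storage mirrors the structure discussed above: each ray touches only a few pixels, so most entries of A are zero.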
In the present work, we consider a two-dimensional fan-beam geometry with equiangular projection views acquired from 360° around the image, see Figure 1 (left). Due to rotational symmetry, we consider the image to be the largest inscribed disc within a square N_side-by-N_side pixel array, hence consisting of n ≈ (π/4)·N_side² pixels. The source-to-rotation-centre distance is set to 2N_side and the detector has the shape of a circular arc centred at the source and consists of 2N_side detector elements. The number of projection views is denoted N_v and the fan angle is set to 28.07° so that precisely the inscribed disc is covered. The total number of linear measurements is m = N_v · 2N_side and the m-by-n system matrix A is computed using the function fanbeamtomo in the MATLAB package AIR Tools. [17] Note that the matrix A is both structured and sparse and hence fundamentally different from matrices typically considered in CS, e.g. fully dense matrices with independent, identically distributed Gaussian entries. The fraction of non-zeros in the m × n CT system matrix A is approximately O(1/N_side), since each of the m rays intersects O(N_side) pixels. In this paper, we only discuss undersampling by taking few projection views, i.e. small N_v, while the number of rays per projection remains constant, since this is the most straightforward way to undersample in practice using existing CT scanners. However, other undersampling strategies could be treated analogously.

Figure 1. Example of CT scanning geometry (left) and corresponding system matrix A (right). The square side length is N_side = 8 pixels, giving n = 52 pixels in the disc-shaped domain. A total of N_v = 3 equiangular projection views, each of 16 rays, are shown, i.e. in total m = 48 rays. The system matrix is sparse; zero-valued entries are shown light gray (right) and increasing path lengths are shown increasingly dark. White columns are to be masked out because the corresponding pixels lie outside the disc-shaped domain (12 white pixels, left).
Taking only few CT measurements amounts to having an underdetermined linear system Ax = b. The non-trivial nullspace of A means that a given data vector b does not specify a unique solution, which is clearly unacceptable. In the very undersampled scenario, standard CT reconstruction methods such as filtered back-projection and the algebraic reconstruction technique produce highly noisy and artifact-containing reconstructions. [18] This motivates the use of regularization, which is the topic of the following section.

Sparse regularization
In the context of an underdetermined system b = Ax, A ∈ R^{m×n}, m < n, one obtains a whole affine space of solutions. One selects a single solution from this space by considering a regularization functional R and computing the minimum-R solution, i.e. the solution of

min_x R(x)  subject to  Ax = b.

In the present work, we study ℓ1-norm (L1), anisotropic TV (ATV), and isotropic TV (ITV) regularization. First, the ℓ1-norm, defined as

L1:  R(x) = ||x||_1 = Σ_i |x_i|,

enforces sparsity of the minimizer. ℓ1-norm minimization forms one backbone of compressed sensing, and the problem of computing the minimum-ℓ1-norm solution of an underdetermined system is also known as Basis Pursuit. [19] Second, anisotropic TV can be written using a set of vectors d_i, i = 1, . . . , N (of the same length n as x) and the matrix D = [d_1, . . . , d_N] as

ATV:  R(x) = ||D^T x||_1 = Σ_{i=1}^N |d_i^T x|.

The matrix D is called a dictionary in this context, and the minimum-R solution seeks a solution in which the inner products with the dictionary's entries form a sparse vector, for example, to enforce sparsity in the coefficients of a wavelet basis or of a dictionary learned from training images. We use ATV to denote the general case but focus in the present work on the anisotropic TV, which is the special case where D^T contains finite-difference approximations of the horizontal and vertical derivatives in each pixel, i.e. N = 2n. However, we emphasize that all our theoretical results hold in the general case.
In our experiments in Section 6, we use simple forward differences, i.e. assuming column-wise ordering of pixels and an N_side × N_side domain, d_i^T x equals x_{i+1} − x_i and x_{i+N_side} − x_i for vertical and horizontal derivatives, respectively, for non-boundary pixels. We set Neumann boundary conditions by replacing differences across the boundary by zero. The combined matrix D^T is adapted to the disc-shaped domain by straightforward masking.
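As an illustration, such a finite-difference operator D^T could be assembled for a full square domain as follows (a Python/scipy sketch; the disc-shaped masking described above is omitted):

```python
import numpy as np
from scipy.sparse import eye, kron, lil_matrix, vstack

def finite_difference_operator(N):
    """Forward-difference D^T for an N-by-N image with column-wise pixel
    ordering and Neumann BCs (differences across the boundary set to zero)."""
    # 1-D forward difference; the last row stays zero (Neumann).
    d = lil_matrix((N, N))
    for i in range(N - 1):
        d[i, i] = -1.0
        d[i, i + 1] = 1.0
    d = d.tocsr()
    I = eye(N, format="csr")
    Dv = kron(I, d)   # vertical differences x_{i+1} - x_i within a column
    Dh = kron(d, I)   # horizontal differences x_{i+N} - x_i across columns
    return vstack([Dv, Dh]).tocsr()   # shape (2 N^2, N^2), i.e. N = 2n rows

DT = finite_difference_operator(4)
x = np.ones(16)                      # constant image
assert np.allclose(DT @ x, 0)        # gradient of a constant image is zero
```

The Kronecker-product construction keeps the operator sparse, matching the N = 2n row count used above.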
Third, isotropic TV can be written as a special case of the group-sparsity problem. [20] This problem differs in a small but crucial point from ATV: here, we do not simply take the ℓ1-norm of D^T x as the objective function, but a mixed ℓ1,2-norm. To fix notation, consider a linear mapping D: R^{r×p} → R^n, i.e. the transposed map is D^T: R^n → R^{r×p}, where r is the number of groups and p is the number of elements in each group. For Y ∈ R^{r×p} we consider the mixed ℓ1,2-norm

||Y||_{1,2} = Σ_{i=1}^r ||Y_i||_2,

where Y_i denotes the ith row of Y. We use ITV to denote the general case but focus in the present work on the isotropic TV, for which we take each group to be the horizontal and vertical finite-difference partial-derivative approximations in each pixel, i.e. p = 2 and r = n. Again, our theoretical results hold in the general case. Note that in the TV case, i.e. finite differences, the matrix D in ATV and the linear mapping D: R^{n×2} → R^n in ITV are closely related: the two columns of D^T x ∈ R^{n×2} in ITV, stacked on top of each other, form the length-2n vector D^T x used in ATV.
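The difference between the anisotropic and isotropic objectives can be illustrated on a small, hypothetical array of stacked per-pixel derivatives:

```python
import numpy as np

# Hypothetical stacked finite differences for a 3-pixel image: row i holds
# the horizontal and vertical derivative at pixel i, i.e. Y = D^T x with
# r = 3 groups and p = 2 entries per group.
Y = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

atv = np.abs(Y).sum()                   # anisotropic TV: l1-norm of all entries
itv = np.linalg.norm(Y, axis=1).sum()   # isotropic TV: mixed l1,2-norm of rows
assert atv == 8.0   # |3| + |4| + |1|
assert itv == 6.0   # 5 + 0 + 1 (Euclidean norm per pixel, then sum)
```

The isotropic value is never larger than the anisotropic one, since each row's 2-norm is bounded by its 1-norm.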

Necessary and sufficient conditions for uniqueness of minimizer
In this section, conditions for the uniqueness of solutions to the considered optimization problems are introduced. We use the following notation. The complement of an index set I ⊂ {1, . . . , n} is denoted I^c = {1, . . . , n}\I. If A ∈ R^{m×n}, m < n, then A_I denotes the submatrix of A with columns indexed by I. The transpose of such a submatrix is denoted A_I^T. The range of a matrix A is denoted rg(A) = {Ax : x ∈ R^n}. The signum function sign(·) is defined as −1, 0 and 1 for negative, zero and positive arguments, respectively, and is applied to vectors component-wise.

Uniqueness conditions for L1
The following theorem gives necessary and sufficient conditions for a vector x * to solve L1 uniquely.
Theorem 3.1 Let A ∈ R^{m×n}, m < n, x* ∈ R^n and I = supp(x*). Then x* is the unique solution of L1 with data b = Ax* if and only if

ker(A_I) = {0}

and there exists w ∈ R^m such that

A_I^T w = sign(x*)_I  and  ||A_{I^c}^T w||_∞ < 1.   (7)

Proof See [21].
The theorem has two important consequences: first, the recoverability of a vector x* only depends on its sign pattern, not on the magnitude of its entries. Second, to know whether some vector x* can be recovered from the measurement Ax* (for some given, fixed A), one only needs to check the existence of a vector w ∈ R^m such that the equality and the inequality from (7) are fulfilled. As we show in Section 4, this can be done by solving an m-dimensional linear program. Since the vector w is related to the dual optimization problem of (5), it will be called a dual certificate in the following.
A seemingly different necessary and sufficient condition for x* being the unique L1 solution, given in [22], is the existence of a vector w satisfying a related set of conditions on A^T w. This condition may appear weaker but, in fact, is shown in [22] to be equivalent to the conditions in Theorem 3.1, and we do not use it any further.

Uniqueness conditions for ATV
We extend the previous result to the ATV case through a straightforward generalization of Theorem 3.1. The following theorem basically adapts the result in [23] and follows the proof of the main result in [22]. Similar conditions are given in [24].

Theorem 3.2 Let A ∈ R^{m×n}, m < n, D = [d_1, . . . , d_N] ∈ R^{n×N} and x* ∈ R^n, and set I = supp(D^T x*). Then x* is the unique solution of ATV with data b = Ax* if and only if

ker(A) ∩ ker(D^T_{I^c}) = {0}   (9)

and there exist w ∈ R^m and v ∈ R^N such that

Dv = A^T w,  v_I = sign(D^T_I x*)  and  ||v_{I^c}||_∞ < 1.

Proof The proof is separated into two parts, one for each direction. First, it is shown that x* is the unique solution under the given conditions. Let y ∈ R^n with y ≠ x* and Ay = Ax*, so that h = y − x* is a non-trivial null-space element of A, for which (9) provides D^T_{I^c} h ≠ 0. Further, with s = sign(D^T x*), the remaining conditions imply

||D^T y||_1 ≥ v^T D^T y = s_I^T D^T_I x* + v^T D^T h = ||D^T x*||_1 + w^T A h = ||D^T x*||_1.

As a result of (9), the first inequality is in fact strict, since |v_i| < 1 for i ∈ I^c while D^T_{I^c} h ≠ 0. This proves the conditions to be sufficient for ATV uniqueness.

Conversely, assume the vector x* solves the considered optimization problem uniquely. Since the optimization problem (8) is piecewise linear, for all h ∈ ker(A)\{0} it holds that ||D^T(x* + h)||_1 > ||D^T x*||_1 and, since the latter inequality holds for all non-trivial null-space elements, in particular for −h, no h ∈ ker(A)\{0} can satisfy D^T_{I^c} h = 0, which implies (9). Finally, the rest of the conditions is proved. Choose η ∈ R^N with η_i = sign(d_i^T x*) for i ∈ I and η_j = 0 for j ∉ I, and assume Dη is not an element of the range of A^T, otherwise the proof would be finished. Let ker(A) be p-dimensional and {w^(l)}_{1≤l≤p} be a basis of ker(A), normalized such that η^T D^T w^(l) = 1. In the following, a vector ξ ∈ R^N is constructed such that ξ^T D^T w^(l) = −1 holds for all 1 ≤ l ≤ p; hence, the vector D(η + ξ) is an element of the range of A^T due to its orthogonality to the null space of A. Consider a solution ξ ∈ R^N of the associated minimum-norm problem and q* ∈ R^p a solution of its dual problem (10). The element v = ξ + η then satisfies v_I = sign(D^T_I x*) and ||v_{I^c}||_∞ < 1, and Dv is an element of the range of A^T. This proves the remaining conditions to be necessary.
We also call the pair (v, w) of vectors a dual certificate for ATV. Note that adding a kernel element of D^T does not affect the property of being uniquely recoverable by ATV:

Corollary 3.3 In the setting of Theorem 3.2, let h ∈ ker(D^T) and let x* be the unique solution of (8). Then x̃ = x* + h is the unique solution of (8) with x* replaced by x̃.
Proof Just note that supp(D^T x*) = supp(D^T x̃) and that, if (v, w) is a dual certificate for x*, it is also a dual certificate for x̃.

Uniqueness conditions for ITV
The following theorem gives sufficient conditions on A, D and x* such that x* is the unique ITV solution; a similar result is given in [25,26], but we include it with a full proof for the sake of completeness.
Theorem 3.4 Let A ∈ R^{m×n}, m < n, D: R^{r×p} → R^n and x* ∈ R^n, and denote by I the set of indices i for which the row (D^T x*)_i ∈ R^p is non-zero. Assume that

(1) there exist Y ∈ R^{r×p} and w ∈ R^m such that D(Y) = A^T w, Y_i = (D^T x*)_i / ||(D^T x*)_i||_2 for i ∈ I and ||Y_i||_2 < 1 for i ∈ I^c, and
(2) A is injective on S = {v ∈ R^n : (D^T v)_i = 0 for all i ∈ I^c}.

Then x* is the unique ITV solution with data b = Ax*.

Proof For the Y defined above, it holds that

||D^T x*||_{1,2} = Σ_{i∈I} ⟨Y_i, (D^T x*)_i⟩ = ⟨Y, D^T x*⟩ = ⟨D(Y), x*⟩ = ⟨A^T w, x*⟩.

Also note that, obviously, x* ∈ S. Now consider a vector v ∈ R^n that is feasible for (11), i.e. it holds that Av = Ax*, with v ≠ x*. If v were an element of S, then, by assumption (2), it would equal x*; hence (D^T v)_i ≠ 0 for some i ∈ I^c, and

||D^T v||_{1,2} ≥ Σ_i ⟨Y_i, (D^T v)_i⟩ = ⟨D(Y), v⟩ = ⟨A^T w, v⟩ = ⟨A^T w, x*⟩ = ||D^T x*||_{1,2},

where the first inequality is strict since ||Y_i||_2 < 1 for i ∈ I^c. In other words, every feasible v different from x* has a larger objective value.
Note that the conditions in Theorem 3.4 are basically similar to the ones in Theorem 3.2, when interpreted correctly: condition (2) in Theorem 3.4 can be written as ker(A) ∩ ker(D^T_{I^c}) = {0} (with the correct interpretation of the set I^c), and condition (1) resembles the condition that the vector v from Theorem 3.2 has to be equal to the signum of D^T x* on I and smaller than one in magnitude otherwise, if one interprets the normalized row (D^T x*)_i / ||(D^T x*)_i||_2 as the signum of the vector (D^T x*)_i. However, for p > 1, the conditions in Theorem 3.4 are in general not necessary.¹
Whereas geometrical interpretations of the conditions in Theorem 3.2 for ATV and Theorem 3.4 for ITV are difficult to establish, the L1 conditions in Theorem 3.1 can be explained easily. For, say, a full-rank matrix A ∈ R^{m×n} with m < n and x* ∈ R^n with I = supp(x*) and |I| = k ≤ m, the conditions (7) can be seen as a statement about the intersection of the affine space rg(A^T) with the n-dimensional hypercube [−1, +1]^n. Indeed, the affine space cuts the interior of an (n − k)-dimensional face of the hypercube; which face is sliced depends on the sign pattern of x*.

Numerical implementation of reconstruction and uniqueness tests

4.1. Reconstruction problems
The three regularized reconstruction problems, L1, ATV and ITV, are solved numerically by a primal-dual interior-point method using MOSEK. [27] Our motivation for this choice of method is to ensure that the optimization problem is solved accurately. MOSEK achieves this by producing a certificate of optimality for the returned numerical solution, i.e. a primal-dual solution pair with a duality gap numerically close to zero.
Across the large number of optimizations done for the present study, it is our experience that other methods such as accelerated first-order methods [28,29] and primal-dual methods [30,31] are less reliable in actually arriving at a solution of the equality-constrained problem accurate enough to reliably assess whether it is equal to the original image.
We solve L1 as a linear program (LP) by introducing q ∈ R^n to bound x:

min_{x,q} 1^T q  subject to  Ax = b  and  −q ≤ x ≤ q.

In a similar fashion, ATV can be solved as an LP. By defining z = D^T x and using q ∈ R^N for bounding z, we can solve the problem as

min_{x,z,q} 1^T q  subject to  Ax = b,  z = D^T x  and  −q ≤ z ≤ q.
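As a sketch, the L1 reformulation above can be handed to any LP solver; here scipy.optimize.linprog stands in for MOSEK on a toy underdetermined system:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. Ax = b as the LP
    min 1^T q  s.t.  Ax = b, -q <= x <= q, with variables z = [x; q]."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_eq = np.hstack([A, np.zeros((m, n))])
    # The two-sided bound |x| <= q is split into x - q <= 0 and -x - q <= 0.
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.x[:n]

# Toy underdetermined system whose minimum-l1 solution is 1-sparse
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x_true = np.array([0.0, 2.0, 0.0])
x_hat = basis_pursuit(A, A @ x_true)
assert np.allclose(x_hat, x_true, atol=1e-6)
```

The ATV variant would add the extra equality block z = D^T x in the same way.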
ITV can be recast as a conic optimization problem, which can also be solved by MOSEK. Again, we introduce the bounding vector q ∈ R^n and can solve the problem as

min_{x,q} 1^T q  subject to  Ax = b  and  ||(D^T x)_i||_2 ≤ q_i,  i = 1, . . . , n,

in which each of the n inequalities specifies a quadratic conic constraint.

Uniqueness tests
As stated by Theorem 3.1, we can show that x* is the unique L1 minimizer if and only if A_I is injective and there exists a w ∈ R^m such that A_I^T w = sign(x*)_I and ||A_{I^c}^T w||_∞ < 1.
Injectivity is tested by evaluating whether A_I has full column rank. The second condition can be tested by minimizing ||A_{I^c}^T w||_∞ with respect to w while respecting the equality constraint A_I^T w = sign(x*)_I. By splitting the infinity norm into a two-sided inequality constraint, this problem can be solved as an LP, which we accomplish by use of MOSEK's primal-dual interior-point method. For the optimal solution (t*, w*), by definition, we have the smallest possible t* = ||A_{I^c}^T w*||_∞. If t* is not smaller than one, then no w exists with a smaller t. We therefore declare x* the unique minimizer if t* < 1; if t* ≥ 1, x* cannot be the unique minimizer. Numerically, we test whether t* < 1 − ε for ε = 10^{-5} to ensure that the inequality is satisfied strictly. Technically, by doing this, we risk falsely rejecting solution uniqueness of some x*, namely those for which 1 − ε < t* < 1. However, we found the decision on uniqueness to be robust to other choices of ε, so we believe this is not a problem in practice.
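A minimal sketch of this uniqueness test, with scipy.optimize.linprog standing in for MOSEK on a toy matrix:

```python
import numpy as np
from scipy.optimize import linprog

def l1_uniqueness_test(A, x_star, eps=1e-5):
    """Dual-certificate test of Theorem 3.1: A_I injective, and there exists
    w with A_I^T w = sign(x*)_I and ||A_Ic^T w||_inf < 1.
    Returns True when uniqueness is verified."""
    m, n = A.shape
    I = np.flatnonzero(x_star)
    Ic = np.setdiff1d(np.arange(n), I)
    if np.linalg.matrix_rank(A[:, I]) < len(I):
        return False                       # A_I is not injective
    # Variables z = [w; t]: minimize t subject to A_I^T w = sign(x*)_I
    # and -t <= A_Ic^T w <= t (the split infinity norm).
    c = np.concatenate([np.zeros(m), [1.0]])
    A_eq = np.hstack([A[:, I].T, np.zeros((len(I), 1))])
    b_eq = np.sign(x_star[I])
    G = A[:, Ic].T
    ones = np.ones((len(Ic), 1))
    A_ub = np.vstack([np.hstack([G, -ones]),
                      np.hstack([-G, -ones])])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * len(Ic)),
                  A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * m + [(0, None)], method="highs")
    return res.status == 0 and res.fun < 1 - eps

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
assert l1_uniqueness_test(A, np.array([0.0, 2.0, 0.0]))      # 1-sparse: unique
assert not l1_uniqueness_test(A, np.array([1.0, 1.0, 1.0]))  # A_I rank-deficient
```

As in the text, uniqueness is declared only when the optimal value stays below 1 − ε.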
Regarding ATV, we can show solution uniqueness of a vector x* by use of Theorem 3.2. The condition of zero intersection of ker(A) and ker(D^T_{I^c}) can be checked numerically by evaluating whether the matrix (A; D^T_{I^c}) has full column rank, where the semicolon means vertical concatenation. Similar to the L1 case, the second condition can be tested by solving in MOSEK the LP

min_{t,w,v} t  subject to  Dv = A^T w,  v_I = sign(D^T_I x*)  and  −t ≤ v_{I^c} ≤ t,

and assessing whether the optimal t is smaller than 1 − ε. For isotropic TV, we can show solution uniqueness by use of Theorem 3.4. We let Y_I and Y_{I^c} denote Y restricted to the rows in I and I^c, and similarly for D_I and D_{I^c}. Given x*, we construct a Y and a w satisfying the requirements by solving the conic program

min_{t,w,Y} t  subject to  D(Y) = A^T w,  Y_i = (D^T x*)_i / ||(D^T x*)_i||_2 for i ∈ I  and  ||Y_i||_2 ≤ t for i ∈ I^c.

If the optimal value t is greater than 1 − ε, then part (1) of Theorem 3.4 is not fulfilled and we cannot show uniqueness. If instead t < 1 − ε, we proceed to part (2). The set S can be equivalently described as the kernel of D^T_{I^c}. Letting K denote a matrix whose columns form a basis of this kernel, any v ∈ S can be represented by a coefficient vector c such that v = Kc. We numerically test the injectivity of A on S by evaluating whether AK has full column rank. If it does, we have shown solution uniqueness.
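The rank-based kernel test for ATV can be sketched as follows (a dense toy example; the large sparse matrices in our experiments would require more careful numerical rank computations):

```python
import numpy as np

def atv_kernel_condition(A, DT, x_star, tol=1e-10):
    """Check ker(A) ∩ ker(D^T_Ic) = {0} (Theorem 3.2) by testing whether
    the vertically stacked matrix (A; D^T_Ic) has full column rank."""
    I = np.flatnonzero(np.abs(DT @ x_star) > tol)
    Ic = np.setdiff1d(np.arange(DT.shape[0]), I)
    M = np.vstack([A, DT[Ic, :]])
    return np.linalg.matrix_rank(M) == A.shape[1]

# Toy example: 1-D signal, forward differences, one measurement
DT = np.array([[-1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0]])
A = np.array([[1.0, 1.0, 1.0]])       # a single ray summing all entries
x_const = np.array([2.0, 2.0, 2.0])   # gradient-sparse: D^T x = 0
assert atv_kernel_condition(A, DT, x_const)
```

The injectivity test on S for ITV proceeds analogously, with AK in place of the stacked matrix.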

Generation of sparse test images

5.1. Images for L1
The spikes class will be used to test for exact recovery of sparse images from tomographic measurements by L1 (2). Since we are interested in recovery in dependence on the sparsity, we follow the usual approach [32] and build test images consisting of a given number of spikes at random positions, with entries sampled from a uniform distribution on [0, 1]. The spikes images are hence non-negative. We also consider a signed version, signed-spikes, with the only difference that entries are sampled from the uniform distribution on [−1, 1]. Figure 2 shows example images of each class at a range of relative sparsity values.
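A sketch of how such test images could be generated (the function name and seed are illustrative):

```python
import numpy as np

def spikes(n, k, signed=False, rng=None):
    """Generate an n-pixel image with k non-zeros at random positions,
    values uniform on [0, 1] (spikes) or [-1, 1] (signed-spikes)."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    low = -1.0 if signed else 0.0
    x[idx] = rng.uniform(low, 1.0, size=k)
    return x

x = spikes(100, 10, rng=0)
assert np.count_nonzero(x) == 10 and x.min() >= 0.0   # non-negative spikes
x_signed = spikes(100, 10, signed=True, rng=0)
assert np.count_nonzero(x_signed) == 10               # signed-spikes variant
```

Sampling positions without replacement guarantees exactly k non-zero pixels.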

Images for ATV
For ATV, we wish to generate test images having a prescribed value of ||D^T x||_0, where ||·||_0 is the cardinality. Due to the operator, this is a less trivial task than in the directly sparse case. For certain operators, this can be accomplished using a technique from [33] but, as pointed out in that work, the technique does not apply to the finite-difference operator D^T. Here, we present two methods for this purpose. The first method, truncated-uniform, produces test images that achieve the target sparsity in expectation, while the second, alternating-projection, produces test images of precisely the target sparsity.

The truncated-uniform class
The truncated-uniform class produces images x according to the following heuristic. Given a target number of non-zeros k of the length-N vector D^T x, where D^T has |B^c| rows that do not correspond to differences across the image boundary, and a number F of gray values in the image, satisfying condition (26), the image is built in three steps: an interval width ω is computed in step (1), the F gray values f_1, . . . , f_F are fixed in step (2), and each pixel value is drawn independently in step (3), taking the value f_ℓ with probability ω_ℓ. The following lemma shows that the outcome of this method is an image that, in expectation, has k non-zeros after application of D^T, as desired:

Lemma 5.1 Let x be generated by the three steps above and denote z = D^T x. Then the expected value of the number of non-zero entries in z is k.
Proof With the described procedure, the jth entry of x, j = 1, . . . , n, is a scalar stochastic variable X_j, and the corresponding vector stochastic variable X has independent, identically distributed (i.i.d.) elements. We consider also the vector stochastic variable Z = D^T X and the vector stochastic indicator variable δ with elements δ_i = 1 if Z_i ≠ 0 and δ_i = 0 otherwise. The total number of non-zeros in Z is described by the scalar stochastic variable δ_total = Σ_{i=1}^N δ_i, of which we derive the expected value E(δ_total). Using linearity of expectation and that E(δ_i) = P(Z_i ≠ 0) = 1 − P(Z_i = 0), we get E(δ_total) = Σ_{i=1}^N (1 − P(Z_i = 0)).   (27) To compute P(Z_i = 0), we distinguish two cases: Z_i is a finite difference either (1) across the boundary or (2) in the interior of the image. Formally, we partition the indices {1, . . . , N} into a boundary set B and a complementary interior set B^c. In case (1), i ∈ B, the choice of boundary conditions (BCs) affects the probability in question. Assuming Neumann BCs, each finite difference across the boundary is zero, i.e. P(Z_i = 0) = 1 for i ∈ B, corresponding to a zero row of D^T at the indices in B. Equivalently, these rows can be removed from D^T, leaving B empty. Assuming zero BCs instead, the probability equals 0, since Z_i can then only be zero if the pixel value in question is 0, which happens with probability 0. In this paper, we do not consider zero BCs further.
In case (2), i ∈ B^c, we apply the law of total probability over the set of gray values {f_1, . . . , f_F}, where ĩ denotes the index of the pixel at which Z_i is evaluated. From step (3), we know that the probability that a given pixel value equals f_ℓ is ω_ℓ. For the conditional probability, we note that Z_i is computed as the difference between X_ĩ and a neighbouring pixel's value.
Since the pixel values are independent, and since X_ĩ is given to be in the ℓth interval, the probability of having a non-zero Z_i equals the probability of the neighbouring pixel's value being outside interval ℓ, which is 1 − ω_ℓ. Summing over ℓ and inserting the two cases into (27) yields an expression for E(δ_total) that is quadratic in ω. Since we want E(δ_total) = k, we solve this quadratic equation for ω. The smaller of the two solutions guarantees that the sum of all widths is not greater than 1 and is precisely the ω from step (1). Further, for ω to be real-valued, we require (26).
For the case N_side = 64, we get n = 3228 pixels within the disc-shaped mask. For ATV with Neumann BCs and keeping the zero rows of D^T, we have N = 2n. Due to convexity of the disc-shaped mask, we can explicitly compute |B^c| = 2n − 2N_side. In our numerical studies, we wish to study images sampled from the entire sparsity range between κ = 0 and κ = 2, with a maximal relative sparsity of 1.9. By taking F = 40, we can achieve images x for which D^T x has a sparsity of up to k = 6169. This corresponds to a relative sparsity of κ = k/n = 1.911. Examples of truncated-uniform images are shown in Figure 3 for a range of relative sparsity values.

The alternating-projection class
Since the truncated-uniform class consists of images with a special structure (namely, a prescribed number of distinct grey levels), it may be that they also introduce a special behaviour in the recoverability. Hence, we also consider a different class of test images. Our goal is again to produce images x such that ||D^T x||_0 = k. We reformulate the problem as follows: find a vector v such that v is in the range of D^T and ||v||_0 = k. If we have found such a v, then we get a suitable test image x by solving D^T x = v. For a method to construct such a v, we are inspired by the feasibility problem of finding v ∈ rg(D^T) with ||v||_0 ≤ k. Although the set we are looking at is the intersection of a convex set with a non-convex one, recent results indicate that an alternating projection approach may work. [34] Hence, we perform the following iteration:

(1) Choose a random starting point v^0 ∈ R^N; set j = 0.
(2) Set v^{j+1/2} as the orthogonal projection of v^j onto rg(D^T). With the help of the pseudo-inverse (D^T)† of D^T, this is written as v^{j+1/2} = D^T (D^T)† v^j.
(3) Set v^{j+1} as the orthogonal projection of v^{j+1/2} onto the set {||·||_0 ≤ k}: keep the k largest (in magnitude) entries of v^{j+1/2} and set the rest to zero. If the projection yields fewer than k non-zeros, project v^{j+1/2} onto a set with higher sparsity.
(4) If converged, set x = (D^T)† v^{j+1}; otherwise increment j and go to step (2).
If the method converges, it is guaranteed to produce an image x with the desired properties. However, in practice, it does not always converge. Hence, we perform a maximum number of iterations (a few thousand) and, if we do not observe convergence to a feasible point v, we restart the method with a different initial point. Typically, we found that only a few restarts sufficed to produce a desired image.
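The iteration can be sketched as follows (a simplified dense Python implementation; in the chosen 1-D toy example D^T has full row rank, so the range projection is trivial and the iteration converges immediately):

```python
import numpy as np

def alternating_projection(DT, k, max_iter=2000, tol=1e-10, rng=None):
    """Seek v in rg(D^T) with exactly k non-zeros by alternating projections,
    then recover a test image x with D^T x = v via the pseudo-inverse.
    Returns (None, None) if no convergence; restarts are left to the caller."""
    rng = np.random.default_rng(rng)
    DT_pinv = np.linalg.pinv(DT)
    P = DT @ DT_pinv                       # orthogonal projector onto rg(D^T)
    v = rng.standard_normal(DT.shape[0])
    for _ in range(max_iter):
        v_half = P @ v                     # project onto rg(D^T)
        keep = np.argsort(-np.abs(v_half))[:k]
        v = np.zeros_like(v_half)          # project onto {||.||_0 <= k}:
        v[keep] = v_half[keep]             # keep the k largest entries
        if np.linalg.norm(v - v_half) < tol and np.count_nonzero(v) == k:
            return DT_pinv @ v, v
    return None, None

# 1-D forward differences on 5 pixels (4 interior differences)
DT = np.array([[-1.0, 1.0, 0.0, 0.0, 0.0],
               [0.0, -1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, -1.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, -1.0, 1.0]])
x, v = alternating_projection(DT, k=2, rng=1)
```

For the rank-deficient 2-D operators used in the paper, the same loop applies but convergence is no longer guaranteed, matching the restart strategy described above.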
We also consider a non-negative version, alternating-projection-nonneg, for which an image is generated from an alternating-projection image by shifting all pixel values by the smallest possible positive scalar such that all pixel values become non-negative.

Images for ITV
For isotropic TV, we basically proceed similarly to alternating-projection. However, here we consider the feasibility problem of finding Y ∈ rg(D^T) with ||Y||_{0,2} ≤ k, where ||·||_{0,2} counts the number of rows with non-zero ℓ2-norm. Note that the latter set corresponds to the images x for which the Euclidean norm of the gradient has only k non-zero entries. Hence, we modify the iteration to:

(1) Choose a random starting point Y^0 ∈ R^{r×p}; set j = 0.
(2) Set Y^{j+1/2} as the orthogonal projection of Y^j onto rg(D^T). With the help of the pseudo-inverse (D^T)† of D^T, this is written as Y^{j+1/2} = D^T (D^T)† Y^j.
(3) Set Y^{j+1} as the orthogonal projection of Y^{j+1/2} onto the set {||·||_{0,2} ≤ k}: keep the k rows of Y^{j+1/2} with largest 2-norm and set the rest to zero. If the projection yields fewer than k non-zero rows, project Y^{j+1/2} onto a set with higher sparsity.
(4) If converged, set x = (D^T)† Y^{j+1}; otherwise increment j and go to step (2).
Examples of both anisotropic and isotropic alternating-projection images are shown in Figure 3 for a range of relative sparsity values. The relative sparsity values for anisotropic are chosen as twice the isotropic ones to enable a rough comparison between images from each class of 'comparable' sparsity.

Numerical experiments
As in [13], we wish to show empirically that CT image reconstruction by sparsity-exploiting methods admits sharp phase transitions, as known from compressed sensing. The results in [13] only covered L1. Here, we extend to ATV and ITV and construct phase diagrams by solving the reconstruction problems as well as the uniqueness tests.

Phase diagrams for L1
In [13], it was found that reconstruction by ℓ1-minimization for spikes images yields a sharp phase transition. Here, through uniqueness testing, we verify this result and further extend it to the class signed-spikes with signed entries.
We consider images of size N_side = 64 and at each relative sparsity value κ = k/n = 0.025, 0.05, 0.1, 0.2, 0.3, . . . , 0.9 we generate 100 instances. For each instance x*, we generate synthetic CT data b = Ax* corresponding to N_v = 1, . . . , 32 equiangular projection views, in accordance with the scanning geometry described in Section 2.1. The system matrix A is generated by the function fanbeamtomo from the MATLAB package AIR Tools. [17] All considered matrices have between 1.5 and 2.0% non-zeros, in agreement with the estimate from Section 2.1. At N_v ≤ 25 the linear system is underdetermined, while at N_v ≥ 26 the system matrix A has full rank and x* is the unique solution no matter its sparsity. We therefore use N_v^suf = 26 as a reference point of full sampling at N_side = 64 and define the relative sampling μ = N_v / N_v^suf. For each data-set, the reconstruction and the uniqueness test are run. If the relative error of the computed solution x_L1 with respect to x* is sufficiently small, i.e. ||x_L1 − x*||_2 / ||x*||_2 < ε with ε = 10^{-4}, we declare the original perfectly recovered. Figure 4 shows, for both the spikes and the signed-spikes classes, reconstruction and uniqueness-testing phase diagrams: each rectangle corresponds to the relative sparsity value κ at its left and the relative sampling μ at its bottom, and the colour indicates the fraction of instances perfectly recovered by reconstruction or deemed the unique solution by the uniqueness test. The phase diagrams are divided into a 'full-recovery' regime, in which all instances are uniquely recovered, and a 'no-recovery' regime, in which all instances fail to be recovered/be unique. Further, the transition from no-recovery to full-recovery is sharp, in the sense that for all relative sparsity values adding 1-2 projection views changes the recovery rate from 0% to 100%. The right-most plot of Figure 4 shows, at each κ-value, the average sufficient relative sampling along with a 99% confidence interval.
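The per-cell recovery criterion can be sketched as follows (the instance values below are synthetic stand-ins, not results from the paper):

```python
import numpy as np

def recovered(x_rec, x_true, eps=1e-4):
    """Declare perfect recovery when the relative 2-norm error is below eps."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true) < eps

# Fraction of recovered instances for one (kappa, mu) cell of a phase
# diagram: exact recovery, tiny perturbation, and a clear failure.
x_true = np.array([0.0, 2.0, 0.0])
recs = [recovered(x_true + d, x_true) for d in
        (np.zeros(3), np.full(3, 1e-6), np.full(3, 0.1))]
assert np.mean(recs) == 2 / 3
```

Averaging this indicator over the 100 instances per cell gives the colour value plotted in the phase diagrams.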
For both classes, the reconstruction and uniqueness-testing phase diagrams are identical, thereby mutually verifying the correctness of each method and the attained phase diagrams. For signed-spikes, the average sufficient relative sampling curve lies above that of spikes, meaning that more projection views are needed for recovery.
The phase transition occurs at different sampling levels for the spikes and signed-spikes classes. This is perhaps most easily seen in the right-most plot of Figure 4, in which the average relative sampling sufficient for recovery, i.e. the smallest value of μ at which all instances of a given sparsity are recovered, is plotted for both classes along with error bars marking the 99% confidence intervals. On average, signed-spikes require more projection views for unique recovery. This is perhaps not surprising, as negative pixel values can lead to negative entries in the data vector b, something that cannot happen with non-negative pixel values since the elements of A are non-negative. Nevertheless, the phase diagram reveals quantitatively how signed entries affect recoverability.
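The average-sufficient-sampling curve with confidence intervals can be computed as in the following sketch, assuming the per-instance recovery results are stored in a boolean array. The function name and the normal-approximation interval (z = 2.576 for 99%) are our assumptions, not the paper's stated procedure.

```python
import numpy as np

def mean_sufficient_mu(recovered, mu_values, z=2.576):
    """Mean sufficient relative sampling with a 99% confidence interval.

    recovered[i, j] is True when instance j was recovered at sampling
    mu_values[i]; rows are assumed sorted by increasing mu. For each
    instance we take the smallest mu at which it is recovered, then
    return the mean and a normal-approximation interval across instances.
    """
    recovered = np.asarray(recovered, dtype=bool)
    first = np.argmax(recovered, axis=0)     # index of first True per instance
    cols = np.arange(recovered.shape[1])
    assert recovered[first, cols].all(), "every instance must be recovered at some mu"
    mu_star = np.asarray(mu_values)[first]
    mean = mu_star.mean()
    half = z * mu_star.std(ddof=1) / np.sqrt(mu_star.size)
    return mean, (mean - half, mean + half)
```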

Phase diagrams for ATV
In the same way as for L1, we create reconstruction and uniqueness-test phase diagrams for ATV. We first consider the alternating-projection class as well as its non-negative version alternating-projection-nonneg. As the sparsity is measured after application of D^T, the relative sparsity can now lie between 0 and 2, and in addition to the κ values of the previous section we now include κ = 1.0, 1.1, . . . , 1.9. The resulting reconstruction and uniqueness-test phase diagrams are shown in Figure 5. As for L1, we see a partition into 'full-recovery' and 'no-recovery' regimes separated by a sharp phase transition across 1-2 projection views. The uniqueness-test phase diagrams are identical to the reconstruction phase diagrams, except for a few cases in the transition region, for example at the smallest κ values for alternating-projection-nonneg. We attribute these minor differences to the numerical threshold ε for assessing recovery, which is fixed a priori to a constant value.
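The gradient-domain sparsity measure used here, counting nonzeros after application of D^T so that κ can range up to 2, can be sketched as follows. Forward differences with the chosen boundary handling are our modelling assumption; the paper's exact D^T is defined in its Section 2.

```python
import numpy as np

def atv_relative_sparsity(img):
    """Relative gradient sparsity for anisotropic TV on an image.

    Counts nonzero horizontal and vertical forward differences (the
    action of D^T in the text) and divides by the number of pixels n,
    so the result kappa lies in [0, 2).
    """
    dh = np.diff(img, axis=1)   # horizontal differences
    dv = np.diff(img, axis=0)   # vertical differences
    k = np.count_nonzero(dh) + np.count_nonzero(dv)
    return k / img.size
```

A constant image has κ = 0, while a checkerboard-like image approaches the maximal gradient sparsity.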
In contrast to the L1 case, we know from Corollary 3.3 that non-negativity should not change ATV recoverability: any signed vector can be made non-negative by adding a constant vector, and the constant vectors make up the kernel of D^T in the anisotropic TV case. Figure 5 confirms this experimentally, since the phase diagrams for the signed and non-negative image classes are identical.
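The kernel argument above is easy to check numerically: since constant images lie in the kernel of the difference operator, a constant shift leaves the anisotropic TV value unchanged. The following is an illustrative sketch with a simple forward-difference definition of anisotropic TV, not the paper's code.

```python
import numpy as np

def anisotropic_tv(img):
    """Anisotropic TV: sum of absolute horizontal and vertical differences."""
    return np.abs(np.diff(img, axis=1)).sum() + np.abs(np.diff(img, axis=0)).sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))   # a signed image
c = -x.min()                         # constant shift making all pixels non-negative
x_nonneg = x + c
assert (x_nonneg >= 0).all()
# The shift is a constant image, which lies in the kernel of D^T,
# so the anisotropic TV value is unchanged:
assert np.isclose(anisotropic_tv(x), anisotropic_tv(x_nonneg))
```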
To study whether the phase diagrams depend on the image class, we repeat the experiment for the truncated-uniform image class. The resulting reconstruction and uniqueness-testing phase diagrams are shown in Figure 6. Again, the two phase diagrams are identical and show a sharp phase transition from the no-recovery to the full-recovery regime. The right-most plot compares the average relative sampling for recovery of the truncated-uniform image class with that of the alternating-projection image class. The two curves are nearly identical at low and high relative sparsity values. However, in the mid-range there is a small difference, corresponding to around one more projection view needed on average for the truncated-uniform images to be recovered.
Both the alternating-projection and truncated-uniform image classes are constructed to yield images of a desired target sparsity, but in fundamentally different ways. The fact that the arising phase diagrams are so similar leaves us with the interpretation that the phase diagram, and in particular the phase-transition curve, is governed mainly by the sparsity, while the particular image class has less influence.

Phase diagrams for ITV
We repeat the reconstruction and uniqueness-test study for ITV with the alternating-projection image class designed for ITV. Recall that, in contrast to the L1 and ATV cases, Theorem 3.4 only provides a sufficient condition for solution uniqueness. This means that, in principle, instances that are not shown to be unique solutions might still be.
For ITV, the relative sparsity with respect to image size lies between 0 and 1. As for L1, we construct the phase diagram for the values κ = 0.025, 0.05, 0.1, 0.2, . . . , 0.9 and N_v = 1, . . . , 32, see Figure 7. Due to the numerically more challenging conic programs of isotropic TV, the attainable solution accuracy was lower than for L1 and ATV, and as a result we relax the numerical threshold to ε = 10^−3.
Once again we observe a partition into 'full-recovery' and 'no-recovery' regimes clearly separated by a sharp transition. Also, the reconstruction and uniqueness-test phase diagrams agree almost exactly, and we again ascribe the minor differences to the uniform a priori choice of the numerical threshold for assessing recovery. The almost exact agreement between the phase diagrams is interesting considering that the uniqueness test is only a sufficient condition. One may conjecture that the conditions of Theorem 3.4 are also necessary in the case where D^T is the discrete gradient and that a proof remains to be found.
Even though both ATV and ITV rely on gradient sparsity, comparing their phase diagrams does not reveal a straightforward conclusion as to which method provides the greater undersampling potential, because of the different sparsity measures. Further comparisons of the two methods, for example a 'cross-over study' of ITV reconstruction applied to the ATV image class and vice versa, are beyond the scope of the present work, whose goal was simply to document phase-transition behaviour for CT measurements.

Computational time for reconstruction vs. uniqueness testing
The conclusion that the image is the unique solution is in some sense stronger than simply observing recovery by solving the reconstruction problem. One may therefore wonder whether the uniqueness test is substantially more expensive. We wish to demonstrate that this is not the case, and for this purpose we choose a representative subset of experiments and compare the computational times of reconstruction and uniqueness testing. All timing experiments were run in MATLAB 7.13 (R2011b) under Linux using MOSEK 6.0 on a Lenovo ThinkPad T430s with an Intel Core i5-3320M processor (3 MB cache, up to 3.30 GHz) and 8 GB RAM, restricted to a single core.
We measure computational times for cases with low, medium and high relative sparsity combined with low, medium and high relative sampling. For the signed-spikes class, we consider 10 instances at each of the relative sparsity levels κ = 0.1, 0.3, 0.5, 0.7, 0.9. For image size N_side = 32 we use N_v = 3, 7 and 11 views, for N_side = 64 we use N_v = 5, 13 and 21, and for N_side = 128 we use N_v = 9, 25 and 41. For ATV reconstruction, we consider the alternating-projection image class with the same N_side = 32, 64, 128, relative sparsities κ = 0.1, 0.7, 1.3, 1.9 and the same numbers of views.
Results are shown in Figure 8. For reconstruction, computational time shows little dependence on κ; if anything, increasing κ gives slightly longer computational times. Uniqueness-testing computational time tends to decrease with increasing κ. In several cases, the uniqueness test is significantly faster than the reconstruction. In some of these cases, the relative sampling is low and the relative sparsity is high, which causes A_I (in the L1 case) to be non-injective, so the infinity-norm minimization problem need not be solved. In other cases, for example the L1 N_side = 128 case with N_v = 25, uniqueness testing is much faster across the whole relative sparsity range. For uniqueness testing, computational time increases with the sampling level. For reconstruction, the low-sampling cases are also the fastest; however, the medium-sampling case is not faster than the high-sampling case in all cases: in the L1 N_side = 128 case the times are comparable, and in the ATV N_side = 32 case the high-sampling case is in fact faster.
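The injectivity shortcut mentioned above, where a non-injective A_I lets the uniqueness test terminate without solving the infinity-norm minimization, amounts to a rank check on the support columns of A. A minimal sketch, assuming a dense NumPy matrix and the illustrative function name `support_injective`:

```python
import numpy as np

def support_injective(A, x, tol=1e-10):
    """Cheap necessary check preceding the expensive uniqueness test.

    Restricts A to the columns on the support I of x and tests
    injectivity via the numerical rank. If A_I is not injective,
    x cannot be the unique L1 solution and the infinity-norm
    minimization problem can be skipped entirely.
    """
    I = np.flatnonzero(np.abs(x) > tol)   # support of x
    A_I = A[:, I]
    return np.linalg.matrix_rank(A_I) == I.size
```

The rank test is far cheaper than a conic program, which is consistent with the observation that the uniqueness test is fastest precisely in the high-sparsity, low-sampling cases where A_I fails to be injective.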
We conclude that, in general, uniqueness testing is not slower than reconstruction: in most cases the computational times are comparable, and in some cases uniqueness testing is in fact faster. We note that uniqueness testing is particularly convenient at larger κ, where reconstruction tends to be the slower option.
It is clear that the reported computational times depend on our use of MOSEK for solving the optimization problems of reconstruction and uniqueness testing. The use of an interior-point method is what causes the computational time to increase so dramatically, from the order of 10^0 seconds at N_side = 32 to 10^1 seconds at N_side = 64 and 10^3 seconds at N_side = 128. With another optimization algorithm, shorter running times may be observed, possibly with a different outcome of the comparison. However, our intention with the present study is not an exhaustive algorithm comparison, but merely to demonstrate that uniqueness testing can be accomplished in the same time as, or faster than, reconstruction.

Conclusion
The present work was motivated by the wish to understand quantitatively how much undersampling is admitted by sparsity-exploiting reconstruction methods for CT, given the lack of theoretical guarantees from compressed sensing. Our results demonstrate empirically that sharp average-case phase transitions from no recovery to full recovery, as seen in compressed sensing, also occur for CT measurements across a range of image classes and sparse reconstruction methods. The location of the phase transition, i.e. the level of sampling sufficient for recovery, depends on the reconstruction method and is to a large degree governed by the image sparsity, quite independently of the particular image class.
Due to the inherently empirical nature of our study design, it is clear that our results do not imply any theoretical guarantee. Further, being average-case results, they leave the chance for a single instance to require more or fewer samples for recovery than predicted by the average case. Nevertheless, we think the results may be used or extended to serve as guidelines for how many CT samples to acquire based on prior knowledge about the image class and sparsity. Natural future work would be extensions toward more realistic scenarios including noisy data, model inconsistencies, specialized image classes, etc.
Constructing phase diagrams by reconstruction cannot establish solution uniqueness, which makes the uniqueness test more desirable from a theoretical perspective. However, we observed almost exact agreement between reconstruction and uniqueness test phase diagrams, so in practice the advantage may be negligible. Also, the reconstruction approach has the advantage that it can be run directly on any reconstruction problem with no need to derive specific uniqueness conditions and as such is more easily generalizable.
In our view, the presented empirical evidence suggests that an underlying theoretical explanation of phase transition behaviour in CT may exist. Establishing such theory would have large implications for the understanding of sparse reconstruction in CT and we hope that the present results can serve as a step towards this goal.