Self-consistent gradient flow for shape optimization

We present a model for image segmentation and describe a gradient-descent method for level-set based shape optimization. It is commonly known that gradient-descent methods converge slowly due to zig–zag movement. This can also be observed for our problem, especially when sharp edges are present in the image. We interpret this in our specific context to gain a better understanding of the involved difficulties. One way to overcome slow convergence is the use of second-order methods. For our situation, they require derivatives of the potentially noisy image data and are thus undesirable. Hence, we propose a new method that can be interpreted as a self-consistent gradient flow and does not need any derivatives of the image data. It works very well in practice and leads to a far more efficient optimization algorithm. A related idea can also be used to describe the mean-curvature flow of a mean-convex surface. For this, we formulate a mean-curvature Eikonal equation, which allows a numerical propagation of the mean-curvature flow of a surface without explicit time stepping.


Introduction
Since its introduction in [22], the level-set method of Osher and Sethian has become very popular for describing evolving geometries for shape optimization and free-boundary problems. The basic idea is the following: Definition 1 Let a continuous level-set function φ : R^n × [0, ∞) → R be given. For any time t ≥ 0, this function describes the evolving sets Ω_t = {x ∈ R^n : φ(x, t) < 0} and Γ_t = {x ∈ R^n : φ(x, t) = 0}. A canonical choice for the level-set function is the signed distance function of the described domain. Here, we use the notation dist(x, ∂Ω) = inf_{y∈∂Ω} |x − y| = min_{y∈∂Ω} |x − y| and assume that ∂Ω ≠ ∅. It is well-known that the signed distance function has Lipschitz constant one (see, for instance, Theorem 2.1 on page 268 of [8]). Hence, even requiring Lipschitz continuity of the level-set function φ does not imply any regularity of Γ_t. This makes the level-set method attractive for situations where it is advantageous to allow non-smooth domains.
In the present work, we want to describe and analyse methods for shape optimization based on level sets. Let us refer to [1] for a general functional-analytic framework for shape optimization with level sets and to [24] for an early work that already captures the basic ideas in a non-rigorous fashion. In order to perform optimization, we need, of course, a way to describe changes to shapes. We do this with a scalar speed method: Over time, the boundary of the evolving domain Ω_t is transformed in normal direction according to a given speed field F : R^n → R. Positive speed corresponds to outward movement and a growing domain, while a negative value of F leads to local shrinking of the domain. This is illustrated in Figure 1. Note that this approach is in contrast to the classical methods described, for instance, in [8,26]. There, one usually considers a vector-valued velocity field. We, on the other hand, fix the direction of movement as the normal to the boundary because a tangential movement has no geometrical meaning. This speed method complements the level-set method very well, as it allows quite irregular evolutions. It is, in particular, possible to describe changes in topology (as we will see later in Figure 3). The effect of this transformation on the level-set function can be described by propagating φ in time with the level-set equation

∂_t φ + F(x) |∇φ| = 0,  φ(·, 0) = φ_0.  (1)
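For illustration, the level-set equation can be propagated numerically with the standard first-order Godunov upwind scheme. The following sketch is a textbook discretization under illustrative grid and speed choices, not code from this paper; for a signed-distance initial function and constant speed F = 1, the exact viscosity solution away from the kink at the origin is φ(x, t) = φ_0(x) − t, which the scheme reproduces up to first-order error.

```python
import numpy as np

def level_set_step(phi, F, h, dt):
    """One Godunov upwind step for phi_t + F |grad phi| = 0 on a uniform grid."""
    # one-sided differences, with edge replication at the grid boundary
    dxm = np.diff(phi, axis=0, prepend=phi[:1, :]) / h   # backward in x
    dxp = np.diff(phi, axis=0, append=phi[-1:, :]) / h   # forward in x
    dym = np.diff(phi, axis=1, prepend=phi[:, :1]) / h   # backward in y
    dyp = np.diff(phi, axis=1, append=phi[:, -1:]) / h   # forward in y
    # Godunov's upwind gradient norms for outward (F > 0) and inward (F < 0) motion
    gp = np.sqrt(np.maximum(dxm, 0)**2 + np.minimum(dxp, 0)**2
                 + np.maximum(dym, 0)**2 + np.minimum(dyp, 0)**2)
    gm = np.sqrt(np.minimum(dxm, 0)**2 + np.maximum(dxp, 0)**2
                 + np.minimum(dym, 0)**2 + np.maximum(dyp, 0)**2)
    return phi - dt * (np.maximum(F, 0) * gp + np.minimum(F, 0) * gm)

# grow a disc of radius 0.3 with unit speed; phi stays (signed distance - t)
n = 101
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sqrt(X**2 + Y**2) - 0.3
F = np.ones_like(phi)
h = x[1] - x[0]
dt = 0.4 * h                        # CFL-stable step for |F| = 1
for _ in range(25):                 # evolve up to time t = 0.2
    phi = level_set_step(phi, F, h, dt)
```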
Here, the initial level-set function φ_0 describes the initial domain Ω_0. Let us assume throughout this work that the closure satisfies Ω̄_0 = Ω_0 ∪ Γ_0, that is, that Γ_0 has empty interior. We assume, furthermore, that φ_0 and F are Lipschitz continuous. This is not restrictive in practice. Under these assumptions, one can establish the existence of a unique viscosity solution for the level-set equation (1). This is well-known; let us just refer to [5,6,11]. Furthermore, a thorough discussion of the underlying concepts for the specific situation of (1) can also be found in Chapter 2 of [19]. For our discussion below, we follow the approach described in [16] and Chapter 3 of [19]. In particular, it was shown there that the evolving sets Ω_t and Γ_t can be expressed in terms of the F-induced distance to the initial geometry: Definition 2 For x, y ∈ R^n and a continuous path ξ ∈ W^{1,∞}([0, 1], R^n) with ξ(0) = x and ξ(1) = y, we define its length as

len(ξ) = ∫_0^1 |ξ'(t)| / |F(ξ(t))| dt.
The set of all such paths is denoted by X_ad(x, y). Furthermore, we define the F-induced distance by

d(x, y) = inf_{ξ ∈ X_ad(x, y)} len(ξ).

The distance to the initial domain Ω_0 is then given by D(x) = ± inf_{y ∈ Γ_0} d(x, y), with the negative sign for x ∈ Ω_0. We leave D(x) undefined if F(x) = 0.
With these notions, one can interpret the level-set equation (1) as the Hamilton–Jacobi–Bellman equation of a control problem. This allows us to find a Hopf–Lax representation formula for φ and the evolving domains. Intuitively speaking (and ignoring the sign for a moment), D(x) is the time t it takes the evolving boundary Γ_t to arrive at some point x ∈ R^n. Thus, it does not come as a surprise that one can show the following formulas: Theorem 1 Let F be Lipschitz continuous and have compact support. Then the evolving sets Ω_t and Γ_t can be expressed as sublevel and level sets of the F-induced distance D. Proof See Corollary 5 in [19].
An important consequence is that the evolving domain Ω_t grows where F is positive, shrinks where F is negative and is stationary where F = 0. This was already mentioned above as intuitive expectation. Theorem 1 shows now that this follows, indeed, rigorously for the time evolution of the level-set equation (1). Furthermore, one can also apply the representation formula of Theorem 1 for shape-sensitivity analysis of domain functionals. In particular, let f ∈ L^1_loc(R^n) and consider

j(Ω) = ∫_Ω f(x) dx.  (2)

The co-area formula together with Theorem 1 implies then the following statement: Theorem 2 Assume that Theorem 1 holds for Ω_t and consider j as per (2). Then

j(Ω_t) = j(Ω_0) + ∫_0^t ∫_{Γ_τ} f(x) F(x) dS(x) dτ

holds for all t ≥ 0.
It follows immediately from Theorem 2 that t ↦ j(Ω_t) is absolutely continuous. By the Lebesgue differentiation theorem, this function is differentiable for almost all t ≥ 0 with the Eulerian shape derivative

d/dt j(Ω_t) = ∫_{Γ_t} f(x) F(x) dS(x).

Note that this shape derivative matches the classical results (see, for instance, Section 2.11 of [26]). It can, however, be derived without requiring a smooth boundary of the domain Ω_t. One can even extend this result to shape-dependent integrands: Assume that our functional has now the form

J(Ω) = ∫_Ω f(x, Ω) dx,  (3)

where f itself depends on Ω in some way. Let us assume that t ↦ f(x, Ω_t) has a shape derivative f'(x, Ω_t) for any fixed x. In this situation, we can follow Corollary 7 in [19] to conclude that the total shape differential of J, corresponding to the shape derivative in direction of the speed field F, is

dJ(Ω; F) = ∫_Γ f(x, Ω) F(x) dS(x) + ∫_Ω f'(x, Ω) dx.  (4)

If f depends on Ω via one or several shape-dependent quantities, one can use this formula to justify a chain rule.
In this paper, we develop numerical methods for shape optimization based on these results. Section 2 introduces a model for image segmentation that we use for analysing and demonstrating our methods. We discuss a simple gradient-descent method in Section 3. Since this method is inefficient due to the well-known 'zig-zag behaviour' of gradient methods, we introduce a new idea in Sections 4 and 5. This leads to a method that is inherently based on the level-set framework described above. It can be interpreted as a self-consistent gradient flow and is much more efficient than gradient descent for our model problem. One can also include information from topological derivatives into this method, which we do in Section 6. In the final Section 7, we discuss how the same approach could be extended to functionals with boundary terms and describe, in particular, the relation of our ideas to the mean-curvature flow of some initial domain Ω_0.

A model for image segmentation
Throughout this paper, we will discuss the developed optimization methods based on a model for image segmentation. Our goal with this optimization problem is shown in Figure 2: Given a grey-scale image, we want to identify the shape of a segment, that is, a region with approximately homogeneous intensity, of the image. We do this by minimizing the following shape functional:

J(Ω) = ∫_Ω (u(x) − ū)^2 dx − γ σ vol(Ω).  (5)

Here, u : D → I ⊂ R is the grey-scale image. Typically, we consider D = [0, 1]^2 and I = [0, 1]. The set Ω ⊂ D is the shape we are looking for, which should be a part of the domain D on which the image u attains an approximately constant intensity. The quantities ū and σ in (5) are the mean intensity and the standard deviation of the image over Ω, that is,

ū = (1/vol(Ω)) ∫_Ω u(x) dx  and  σ = ( (1/vol(Ω)) ∫_Ω (u(x) − ū)^2 dx )^{1/2}.

The variable γ > 0 is a constant parameter. Thus, our goal is to minimize (5) over all possible open sets Ω ⊂ D. We call D the hold-all domain and assume it to be bounded. Note that this approach is slightly different from the usual meaning of image segmentation. We are only trying to identify a single segment's shape, which is a problem suited very well for demonstrating our general optimization framework. In the literature, image segmentation usually means to find all segments of the image. In other words, one looks not just for Ω ⊂ D, but instead for a disjoint decomposition of D into Ω_1 ∪ Ω_2 ∪ . . . ∪ Ω_N. One can, however, extend the model in (5) to include corresponding terms for D \ Ω as well. This leads to a so-called two-phase image segmentation. It is straight-forward to adapt the techniques developed below to this situation. Furthermore, using two sets Ω_1 and Ω_2 as optimization variables, one can even characterize four different phases. With the Four-Colour Theorem, this is enough to describe all possible (regular) image segmentations. Again, our theory and methods can be extended to this case as well. Such multi-phase image-segmentation approaches were introduced in [27].
Figure 2. The goal of the example optimization problem considered throughout this paper is to find a shape matching one segment of a given grey-scale image. Here, the solution is shown with the red contour. It consists of the dark ring and disc.

We would also like to point out that our model (5) is not meant to be a state-of-the-art method for image segmentation. Instead, it is just a convenient example problem for the discussion of the general optimization framework we want to present. Let us now briefly interpret the chosen cost functional J: The first term in (5) is a data-fitting term that penalizes segments which do not have a nearly uniform intensity. This term is already present in the classical Chan–Vese model for image segmentation described in [3]. Instead of approximating the segment with a constant intensity ū, one can also use higher-order polynomials. See, for instance, [4]. This is, again, a straight-forward extension of the method discussed in the following. Probably more interesting (and non-standard), however, is the second term in (5): Since it prefers to become larger due to the negative sign, it creates a balloon force that prevents the segment from collapsing to the empty set. The weight of this force is given by the parameter γ and, more importantly, by the standard deviation σ of the image over the segment. In other words, if the constant intensity ū is a poor fit for the actual image over Ω, we increase the force. This may happen, for instance, if the image is very noisy. In these cases, the stronger balloon force is needed to overcome the penalization imposed by the data-fitting term. When we have derived the shape derivative of J below, we will see much better how these two competing forces interact.
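For concreteness, the ingredients of the functional can be evaluated on a pixel grid. This is a hedged sketch assuming the form J(Ω) = ∫_Ω (u − ū)^2 dx − γ σ vol(Ω) as in (5); the image, mask, and grid spacing are illustrative choices, not data from the paper.

```python
import numpy as np

def segment_cost(u, mask, gamma, h=1.0):
    """Evaluate the segmentation functional for a boolean pixel mask Omega."""
    vals = u[mask]
    vol = vals.size * h * h                       # vol(Omega)
    ubar = vals.mean()                            # mean intensity over Omega
    sigma = np.sqrt(((vals - ubar) ** 2).mean())  # standard deviation over Omega
    fit = ((vals - ubar) ** 2).sum() * h * h      # data-fitting term
    return fit - gamma * sigma * vol              # balloon term enters negatively

u = np.full((8, 8), 0.2)
u[2:6, 2:6] = 0.9                # a perfectly homogeneous bright square
mask = u > 0.5
cost = segment_cost(u, mask, gamma=0.1)   # homogeneous segment: cost is 0.0
```

A perfectly homogeneous segment yields zero data misfit and zero standard deviation, so the cost vanishes; noise in the segment raises both the fitting term and the balloon force, as discussed above.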
Our model (5) is covered completely by the shape calculus described above. Thus, we can calculate the shape derivatives according to (4). Let F be some fixed speed field. For simplicity, we denote the shape derivative in direction F again just by a prime. Recall our shape-dependent quantities vol(Ω), ū and σ. Their shape derivatives follow from (4); for instance,

vol(Ω)' = ∫_Γ F(x) dS(x)  and  ū' = (1/vol(Ω)) ∫_Γ (u(x) − ū) F(x) dS(x),

and σ' is obtained analogously. Using these results, we can now also compute the shape derivative of J in direction F, denoted dJ(Ω; F) in (6). Note that dJ(Ω; ·), interpreted as a functional operating on the speed field F, is supported on the boundary Γ of our domain Ω. This corresponds to the well-known Hadamard–Zolésio structure theorem, see Theorem 3.6 on page 479 of [8]. This interpretation of the shape derivative as a functional on speed fields will be important for the gradient-descent method described in the next Section 3.
Before we come to this point, let us now also give a rough interpretation of (6): Assume that x ∉ Ω and that F(x) > 0. This means that propagation of Ω in direction F adds x to the domain. For this situation, the integrand's value at x tells us how J changes when a neighbourhood of x is included into the image segment. (This is related to topological derivatives, which appear naturally for our example in the shape differential (6). See the discussion in Section 6.) In particular, the sign of the integrand at x tells us whether or not x 'should' be included in the segment. This leads to the inclusion criterion (7), which compares the squared error (u(x) − ū)^2 to a threshold depending on σ and γ. In the case γ < σ, this condition is fulfilled if and only if u(x) is 'close enough' to the current segment intensity ū. The right-hand side (proportional to σ) defines the threshold that tells us how close that actually is. The larger the variance of pixels in the segment already is, the higher is also the tolerance for adding new pixels. This corresponds to our earlier interpretation of the balloon force generated by the second term in (5). If γ ≥ σ, the inclusion criterion (7) is always satisfied. In this case, the balloon force is so strong that the tracking term in (5) can never counterbalance it, no matter how badly the pixel u(x) fits the current segment.
Note that the criterion (7) really compares the standard deviation to the quadratic error term (u(x) −ū) 2 . Intuitively, it would make much more sense to compare the quadratic error to, say, the variance. One can also construct cost functionals whose derivatives produce such a criterion. They are, however, less interesting to consider as example models. Furthermore, the draft paper [13] also proposes a pixelwise condition similar to (7) in an ad-hoc way to define a postprocessing step. It was found there empirically that the standard deviation gives, indeed, much better results than using the variance in the threshold condition.

The gradient-descent method
As we have just seen, we can compute the directional derivative dJ(Ω; F) of the cost functional (5) in direction of some speed field F. We would like to use this derivative to define a steepest-descent direction. When this is done, one can implement a standard gradient-descent method. See, for instance, Chapter 3 of [21] for a general discussion in the finite-dimensional situation. A remaining difficulty, however, is that the shape derivative (6) is supported on the boundary Γ. We need, on the other hand, a speed field defined on the hold-all domain D in order to define a descent direction. Thus, one needs a suitable method to extend the integrand in (6) from Γ onto D. This can be done in various ways; let us refer to the discussion given in Section 6.2 of [19] for more details. For the purpose of this paper, we focus on the following method: One can interpret dJ(Ω; ·) as a continuous linear functional operating on speed fields F from some Hilbert space H. Using the Riesz representation theorem, we can then construct a speed field G ∈ H that corresponds to the shape derivative. We call this G the shape gradient (in contrast to the shape derivative dJ(Ω; ·)) and use −G as descent speed field. (This distinction between derivative and gradient is not made by all authors, but we think it helps to clarify the situation. See also part (iv) of Remark 2.34 in [14].) The choice of the concrete function space H influences the resulting descent method, since it can encode information about desired smoothing and even things like geometric constraints. We refer to [15] for a comparison of various spaces. In the following, we always choose H = H^1(D), as this space performs well in practice. In particular, the shape gradient G can then be computed by solving the variational problem

β ∫_D ∇G · ∇F dx + ∫_D G F dx = dJ(Ω; F),  (8)

which must be satisfied for all F ∈ H = H^1(D). The parameter β > 0 is a weighting factor that can be used to tune the smoothing properties of the inner product we use.
It is often beneficial for the descent method to choose β ≪ 1, as we have demonstrated in [17]. We use β = 10^−2 for the numerical results in this paper. The variational problem (8) is solved numerically with standard finite-element techniques.
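In a simplified setting where the shape derivative has an L^2 density g, the variational problem (8) is the weak form of −βG'' + G = g with homogeneous Neumann conditions. A minimal 1D finite-difference sketch follows; it is illustrative only and not the paper's finite-element implementation.

```python
import numpy as np

def h1_riesz(g, h, beta):
    """Solve -beta G'' + G = g with homogeneous Neumann conditions on a
    uniform 1D grid: the H^1 Riesz representative of the density g."""
    n = g.size
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 1.0 + 2.0 * beta / h**2
        if i > 0:
            A[i, i - 1] = -beta / h**2
        if i < n - 1:
            A[i, i + 1] = -beta / h**2
    # Neumann boundary: fold the mirrored ghost node into the diagonal
    A[0, 0] -= beta / h**2
    A[-1, -1] -= beta / h**2
    return np.linalg.solve(A, g)

h = 1.0 / 99
g = np.full(100, 2.5)            # a constant derivative density
G = h1_riesz(g, h, beta=1e-2)    # constant g gives the constant G = g
```

A useful sanity check of the Neumann discretization is that every row of the system matrix sums to one, so constant densities are reproduced exactly; for rough densities, larger β gives a smoother gradient G.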
Based on descent directions constructed in this way, it is now straight-forward to implement a gradient-descent method. We employ a standard line search based on the Armijo rule, see Section 3.1 of [21]. More details can be found in Section 6.3 of [19]. Numerical examples of this gradient-descent method applied to our image-segmentation model are shown in Figure 3. The input images are artificially created and have noise added. They are plotted together with the initial (blue curve) and final (red) shapes in the left column of Figure 3. Note that a change in topology happens when the segment forms the ring in Figure 3(a). This does not disturb the method at all. The plots on the right show how the cost and gradient norm (blue and red) decrease with the descent iterations. The cost is relative to an 'exact' value that was computed for the same noisy images by using the gradient descent starting from an 'informed guess' for the initial shape. In particular, the initial segment was chosen as the shape used when generating the image itself. The gradient norm is the H 1 -norm of the shape gradient G that solves (8). The green dots show the accepted step length t satisfying the Armijo rule at each iteration. We enforce a minimum step length of t min = 10 −3 . Around iteration 50, this limit becomes active in Figure 3(b) since the final shape is already attained and the cost and gradient norm no longer decrease significantly.
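The Armijo rule used for the line search can be sketched generically as backtracking until a sufficient-decrease condition holds; the quadratic objective, constants, and minimum step below are illustrative, not the paper's exact parameters.

```python
def armijo(J, J0, slope, t=1.0, c=1e-4, shrink=0.5, t_min=1e-3):
    """Backtracking line search: return the first tried step t with
    J(t) <= J0 + c * t * slope, where slope = dJ/dt at t = 0 (negative
    for a descent direction); enforce a minimum step length t_min."""
    while J(t) > J0 + c * t * slope:
        t *= shrink
        if t < t_min:
            return t_min            # minimum step length becomes active
    return t

# example: descend J(x) = x^2 from x = 1 along the negative gradient -2,
# so the 1D restriction is J(t) = (1 - 2t)^2 with slope -4 at t = 0
J = lambda t: (1.0 - 2.0 * t) ** 2
step = armijo(J, J0=1.0, slope=-4.0)
```

In this toy example the full step t = 1 overshoots the minimum (J(1) = 1, no decrease), so one halving is performed and t = 0.5 is accepted, which here lands exactly on the minimizer.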
While the method works well for the image in the upper row, note that this is no longer the case for the image with sharp edges shown in Figure 3(c). Even though the descent in Figure 3(d) runs over many more iterations than the one of Figure 3(b), the descent gets stuck very quickly and hits the step-size limit at t = t min without actually reaching the final solution. The lower line of green dots in Figure 3(d) is t min , as before. The second line above it is at 2t min , which is the first step size tried after a step with t min has been taken. In many iterations, the enforced step length makes the situation worse, so that the following step is immediately accepted and 'corrects' this again. This leads to these distinct two lines of green dots. In order to understand why the gradient descent performs so poorly with sharp edges, it is useful to consider the shape gradients that occur when the descent is stuck. The speed fields computed for two consecutive iterations are shown in Figure 4. If one looks at even more consecutive steps, one finds that the whole iteration sequence is a back-and-forth between situations similar to these two. In particular, where the current shape is already at the image's edge, the boundary is repeatedly moved over the edge and back. This can be seen clearly in the figure: The speed is negative (blue) in all these regions in Figure 4(a) and positive (red) in Figure 4(b). The front is only moved consistently forward where it is still in the process of closing around the hole. A consequence of this behaviour is that each step is forced to be very short by the line search, so that the descent is not able to progress any further.
In fact, this kind of zig-zag movement is a general (and well-known) 'feature' of gradient-descent methods. A possible solution for this problem is the use of second-order methods. In our particular situation, we would like to weight the speed such that it is decreased accordingly at image edges. This is precisely what a Newton-type method would do, since the Hessian would, roughly speaking, consist of the normal derivatives of the image data. This means that the gradient would be divided by a large value where the shape is close to an edge and by a small value elsewhere. The drawback of such a method, however, is that derivatives of the image data are required. For a noisy image, they are not easy to evaluate in a robust way. Because of this, we propose an alternative idea to improve the gradient descent in the following.

Self-consistent speed fields
Let us now describe an alternative strategy for the shape optimization of (5) that avoids the inefficient zig-zag movement of the gradient descent discussed above in Section 3. This is a completely new idea, which can be interpreted as a gradient flow. At the heart of this approach, there are two crucial observations about the shape evolution of some initial geometry Ω_0 along a given speed field F:

• Recall that the value F(x) of the speed field at some position x ∈ R^n defines the normal speed of movement of the boundary at this point x. This, however, means that the value of F(x) is only significant at the instant t in time when x ∈ Γ_t.

• If F > 0, then the evolution of Ω is monotone. In particular, our domain always grows. This implies that each point x ∉ Ω_0 is reached by the advancing front at a precisely defined, unique arrival time. In other words, for each such point there is a unique t ≥ 0 such that x ∈ Γ_t. For all τ < t, it follows that x ∉ Ω_τ ∪ Γ_τ. If, on the other hand, τ > t, then x ∈ Ω_τ. This t is, in fact, given by the distance D(x) of Definition 2.
Particularly interesting is the following conclusion, which can be drawn by combining both observations: If we are given some monotone shape evolution, then a single speed field is enough to encode the propagation for all times. This is even true if the shape evolution is defined in terms of multiple speed fields (e.g. descent steps) or with a time-dependent speed. (The latter situation has not been discussed above, but appears sometimes in a different context.) Assume that we have an optimization problem whose shape derivative has the form

dJ(Ω; F) = ∫_Γ f(x, Ω) F(x) dS(x)  (9)

for some shape-dependent function f. We have already seen in (6) that our image-segmentation problem is of this type. Note, however, that we will have to make some assumptions on f later on for Theorem 3. These assumptions are usually not fulfilled for the image-segmentation problem. Nevertheless, the method still works very well in practice, as we demonstrate in Section 5 below. For the following discussion, let us assume at least f < 0 already now. Because we choose the speed field as −f in (10) below, this ensures that the resulting shape evolution is always monotonically growing. The idea can be adapted in a straight-forward way for f > 0 and a monotonically shrinking domain as well. If the sign of f is not fixed (which is usually the case in optimization since we are looking for a zero of the gradient), the approach still works in practice (see Section 5). Now, in order to solve the optimization problem, we are interested in a speed field that has the following property:

F(x) = −f(x, Ω_t)  (10)

for all t ≥ 0 and x ∈ Γ_t. Here, Ω_t is the time evolution of Ω_0 with respect to the speed field F itself.
Clearly, F defined in this way is a descent direction not only at Ω = Ω_0 but all along the time evolution, that is, for all Ω = Ω_t with t ≥ 0, as the shape derivative (9) is negative by definition. In fact, we have (in some sense) chosen F to be the negative shape derivative. This is the same idea that was already used in [24]. It motivates the claim that this speed field corresponds, somehow, to a gradient flow. It has to be noted, however, that it does, in general, not correspond to the shape gradient introduced in Section 3 as the Riesz representative of the shape derivative. Another important issue is the following: In order to make sense of the condition (10), we already have to know some shape evolution on which Ω_t and Γ_t can be based. In other words, one already needs a speed field in order to apply (10)! In fact, for (10) to be fulfilled, this condition needs to hold assuming the shape evolution induced by the speed field F itself. This is the reason for calling it 'self-consistent'. Thus, (10) cannot be used directly to compute such a self-consistent gradient flow. What we can do, however, is to define an iteration: Given an initial speed field F_0, we can, indeed, use (10) to define another speed field F = ψ(F_0). The self-consistent gradient flow that we are looking for is then a fixed point of ψ. With certain assumptions, we will see in Theorem 3 that such a fixed point exists and the iteration converges to it. The computation of F = ψ(F_0) is depicted in Figure 5: For some fixed x ∉ Ω_0, we compute t = D(x). Next, the evolved shape Ω(x) = Ω_t at this time is found. It corresponds to the snapshot in the shape evolution when x lies precisely on the advancing front Γ_t. As discussed above, the speed field F_0 is used to compute D and the shape evolution. When this is done, we set F(x) = −f(x, Ω(x)) according to (10). This means that we evaluate the shape dependence of f for the domain Ω(x).
(Note that the value of F does not matter for x ∈ Ω_0. Since we assumed a monotonically growing shape, changes to the speed field in Ω_0 will not influence the shape evolution at all.) Of course, it is not possible in practice to compute all of the evolved shapes Ω(x) for x ∈ R^n. Doing so would, roughly speaking, correspond to performing a gradient descent with infinitesimally small time steps. Even the computation for all points of a discrete grid has a prohibitive computational cost and cannot be done practically. One can, however, compute just a handful of snapshots together with the corresponding shape-dependent quantities. The shape dependence at some point x can then be found by interpolating these values for the desired time D(x). The qualitative behaviour of the important shape-dependent quantities in our case is plotted in Figure 6(b) for a sample time evolution. This shows that they behave 'nicely', which means that it is, indeed, justified to apply some suitable interpolation method in practice. Note that this is the case even though neither the speed field nor the image are actually continuous for the example (due to the added noise). In fact, one can show that certain assumptions ensure that the shape-dependent quantities are actually Lipschitz continuous with respect to the evolution time t. See Subsection 7.2.3 of [19] for more details. Thus, there exists also a theoretical justification for the suggested interpolation approach. Hence, we are perfectly able not only to define ψ(F_0), but also to compute it without too much difficulty.
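The interpolation step can be sketched as follows; the reference times and snapshot values of ū and σ below are purely illustrative placeholders for quantities computed from actual evolved shapes.

```python
import numpy as np

# a handful of reference snapshots along the evolution (illustrative values)
t_ref = np.array([0.0, 0.1, 0.2, 0.4])          # reference evolution times
ubar_ref = np.array([0.50, 0.46, 0.44, 0.43])   # mean intensity at each snapshot
sigma_ref = np.array([0.05, 0.08, 0.09, 0.10])  # standard deviation at each snapshot

def quantities_at(D_x):
    """Approximate the shape-dependent quantities at arrival time t = D(x)
    by piecewise-linear interpolation between the reference snapshots."""
    return (np.interp(D_x, t_ref, ubar_ref),
            np.interp(D_x, t_ref, sigma_ref))

ub, sg = quantities_at(0.15)    # halfway between the 0.1 and 0.2 snapshots
```

With these interpolated values, ψ(F)(x) can be evaluated pointwise by plugging ū(D(x)) and σ(D(x)) into the shape-derivative integrand, without ever constructing the evolved shape at every grid point.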
The proof for existence of a fixed point of ψ is quite technical, so that we refer to Section 7.2 of [19]. The main idea is the following: One can derive a Lipschitz estimate on ‖ψ(F) − ψ(G)‖_∞ in terms of the difference ‖F − G‖_∞. Furthermore, the Lipschitz constant can be made arbitrarily small by restricting the domain to a band E_t around the initial geometry Γ_0, containing the points reached up to time t. Here, F̲ > 0 is a minimum speed value we enforce on f (see below). Thus, by choosing t > 0 sufficiently small, one can always guarantee that ψ is a contraction mapping on a suitable subset of C(E_t). This yields existence of a fixed point with Banach's fixed-point theorem. Before we can state this as a theorem, we have to define a certain notion of geometric regularity which is inspired by our work in [20]: Definition 4 We say that the initial set Ω_0 has uniform lower density if there exist c ∈ (0, 1) and t_0 > 0 such that

vol(Ω_0 ∩ B_t(x)) ≥ c · vol(B_t(x))

holds for all t ∈ (0, t_0) and x ∈ ∂Ω_0.

Figure 6. Behaviour of the shape-dependent quantities that appear in dJ(Ω; F) of the image-segmentation model (see (6)) during a sample time evolution. The initial domain and the applied speed field are shown on the left. An affine transformation has been applied to make the values on the right comparable to each other.
This property forbids, roughly speaking, outward-pointing kinks of the domain Ω_0. Given uniform lower density of Ω_0, we can use Theorem 4 in [20] to estimate the volumes of the evolved sets Ω_t \ Ω_0 based on the perimeter of the initial set Ω_0. These estimates, in turn, yield certain Lipschitz properties of shape-dependent quantities. The main theorem about the existence of a self-consistent gradient flow is Theorem 25 in [19], which we state here for convenience:

Theorem 3 Let the integrand f in (9) depend on Ω only via shape-dependent quantities that can be expressed as domain functionals of the form (3). Furthermore, assume that

• f is Lipschitz continuous (with respect to x and the values of all shape-dependent quantities) and 0 < F̲ ≤ −f ≤ F̄,
• Ω_0 has uniform lower density, and
• the perimeters of the evolved sets Ω_t can be bounded uniformly if t is small enough.

Then there exists a time t > 0 such that ψ has a unique fixed point F* on E_t. Iteration with ψ converges towards F* starting from, in particular, any constant speed field with value in [F̲, F̄]. All iterates and F* itself are Lipschitz continuous. The speed field F* is a self-consistent gradient flow according to (10) for times up to t.

Numerical realization as a multi-step procedure
Following up on the description in Section 4, let us now discuss numerical computations based on the idea of self-consistent gradient flows. In our implementation, the hold-all domain D is discretized with a uniform grid. This also applies to functions defined on D, such as speed fields and the level-set functions used to encode shapes. This approach is particularly well suited to the image segmentation problem, since also the input image u is usually given by pixel values on such a rectangular grid. For evaluating ψ on a speed field F, we follow the description given above in Section 4. In particular, we start by computing the distance function D on all grid points. This can be done efficiently using Sethian's Fast Marching Method [25]. Let us remark that this is the step in the computation where the underlying speed field F enters. Based on D, we can then find evolved shapes for arbitrary times t very cheaply by using the representation formula of Theorem 1. As the next step, we have to select a couple of reference evolution times. For them, we evaluate the corresponding evolved shapes and shape-dependent quantities. This allows us to approximate these shape-dependent quantities at arbitrary times t by interpolating between the reference times. To compute ψ(F) at some grid point x, it remains to do this interpolation for t = D(x).
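As a rough stand-in for the fast marching method, arrival times on the grid can be approximated with Dijkstra's algorithm on the 4-neighbour grid graph. This sketch is illustrative only: true fast marching solves the Eikonal equation with upwind finite differences and is more accurate, but the Dijkstra version shows where the speed field F enters the computation of D.

```python
import heapq
import numpy as np

def arrival_times(F, seeds, h):
    """Dijkstra on the 4-neighbour grid: travel time at positive speed F
    from a set of seed pixels (a crude approximation of D(x))."""
    n, m = F.shape
    D = np.full((n, m), np.inf)
    heap = [(0.0, i, j) for (i, j) in seeds]
    for _, i, j in heap:
        D[i, j] = 0.0
    heapq.heapify(heap)
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > D[i, j]:
            continue                      # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < m:
                # edge cost: grid spacing divided by the average speed on the edge
                nd = d + h * 2.0 / (F[i, j] + F[a, b])
                if nd < D[a, b]:
                    D[a, b] = nd
                    heapq.heappush(heap, (nd, a, b))
    return D

F = np.ones((5, 5))                       # unit speed everywhere
D = arrival_times(F, seeds=[(0, 0)], h=0.1)
```

For unit speed this reduces to the Manhattan distance scaled by the grid spacing; the graph-metric bias is the main accuracy loss compared with fast marching.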
In order to extend this algorithm for evaluating ψ to a complete optimization procedure, let us now recall two important difficulties: First, the result of Theorem 3 is only local in nature. To work around this issue, one can, however, apply the theorem repeatedly to layer multiple bands around the initial geometry and each other. This allows one to extend the domain where a fixed point exists, as described in Section 7.3 of [19]. Instead of trying to really find a single self-consistent speed field for the whole descent, we perform multiple steps:

(1) Choose a step length t_0 that is small enough. For our purposes, it was always enough to consider t_0 simply as a fixed parameter in the algorithm. It could, however, also be chosen based on some convergence criteria if necessary.
(2) Choose some initial speed field F_0. A possible choice is the steepest-descent speed under the assumption that all shape-dependent quantities are constant, that is, F_0(x) = −f(x, Ω_0).
(3) Evaluate ψ(F_0) numerically on a suitable neighbourhood of Ω_0. Iterate ψ a few times until a fixed point F* is found approximately.
(4) Propagate Ω_0 along F* up to time t_0. Repeat the procedure with the new set as Ω_0.
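To make the procedure concrete, here is a hypothetical 1D instance for the interval example of Subsection 5.1 below, where the only shape-dependent quantity is the volume b − a and the speed is assumed to have the form γ/vol(Ω)^2 − (x − u_0)^2 with u(x) = x. All names, parameters, and the simple Euler propagation are illustrative choices, not the paper's implementation.

```python
import numpy as np

GAMMA, U0 = 5e-2, 0.5           # parameters of the 1D model (illustrative)

def speed(x, vol):
    # negative shape-derivative integrand: balloon force minus data misfit
    return GAMMA / vol ** 2 - (x - U0) ** 2

def evolve(a, b, V, dt):
    """Evolve the interval over one short step with forward Euler, reading
    the *assumed* volume history V[k] at substep k; return the end points
    and the volume history actually produced."""
    vols = np.empty_like(V)
    vols[0] = b - a
    for k in range(len(V) - 1):
        a -= dt * speed(a, V[k])    # left boundary moves outward when speed > 0
        b += dt * speed(b, V[k])
        vols[k + 1] = b - a
    return a, b, vols

def multi_step(a, b, t0=0.05, dt=1e-3, outer=60, inner=8):
    n = int(round(t0 / dt))
    for _ in range(outer):
        V = np.full(n + 1, b - a)       # step (2): freeze the shape quantity
        for _ in range(inner):          # step (3): iterate psi towards a fixed point
            _, _, V = evolve(a, b, V, dt)
        a, b, _ = evolve(a, b, V, dt)   # step (4): advance the shape by t0
    return a, b

a, b = multi_step(0.2, 0.4)             # asymmetric initial interval
```

Within each short step the self-consistency map is a contraction, so a few inner iterations suffice; across steps the sign of the speed may change freely, which is how the non-monotone behaviour of the boundary mentioned below is accommodated.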
The second difficulty to keep in mind is that we have assumed 0 < F̲ ≤ F ≤ F̄ when showing Theorem 3. In the context of an optimization algorithm, we actually expect the speed field to converge to zero when the shape approaches a critical point. Furthermore, it is very restrictive to assume that the shape only grows throughout the optimization descent. This assumption can be violated even if the initial domain is a subset of the final optimal shape. We will demonstrate this situation on a simple example in Subsection 5.1. In this example, the real gradient flow is such that certain points come to lie twice on the boundary of the evolving domain during the whole propagation: once when they are temporarily added to the current shape, and once when they are removed from it again later on. It will turn out, however, that these situations pose no problem to the multi-step optimization strategy described above. Each step is, of course, limited to monotone behaviour in the sense that each point can, at most, either be added to or removed from the domain at a single instant in time. During the course of multiple steps, however, the 'direction of monotonicity' may change. This is another reason in favour of a multi-step approach with a finite step length t_0 for each iteration. We will see below that it does, indeed, work very well for practical shape optimization.

Demonstration in 1D
Before we turn our attention back to the image-segmentation problem of Section 3, let us first consider a modified version in only one dimension. This simplified problem allows us to demonstrate some basic properties of our multi-step gradient-flow algorithm. In particular, we want to minimize the cost function

J(Ω) = ∫_Ω (u(x) − u_0)² dx + γ / vol(Ω). (12)

This is similar to (5) of our image-segmentation model, but note that u_0 is assumed to be independent of the shape here. Also the form of the balloon force, now involving vol(Ω), is changed. Similarly to Section 2, we can easily compute the shape derivative of (12) to be

dJ(Ω; F) = ∫_∂Ω F(x) ((u(x) − u_0)² − γ / vol(Ω)²) dσ. (13)

In other words, moving the boundary outward across some point x is beneficial if and only if

(u(x) − u_0)² < γ / vol(Ω)².

From this inequality, we can clearly see that the tolerance for adding points to the domain decreases as the volume of Ω grows. This is a main feature of the new balloon force in comparison to the one used in the original image-segmentation model (5). Shape optimization itself simplifies a lot if only one space dimension is used. If, in particular, the initial domain Ω_0 = (a_0, b_0) is an interval, it is enough to track the movement of the interval's boundary points in time to describe the shape evolution. In other words, we only have to solve a two-dimensional ODE instead of the level-set equation (1) to compute the domain's time propagation:

ȧ(t) = −F(a(t)), ḃ(t) = F(b(t)),

with initial values a(0) = a_0 and b(0) = b_0. This is, of course, much easier. It even allows us to compute the exact gradient flow just by using an ODE solver. The speed field corresponding to the steepest descent depends only on the current shape's volume vol(Ω_t) = b(t) − a(t) and follows from (13). In particular, we have

F(x) = γ / vol(Ω_t)² − (u(x) − u_0)². (14)

Let us simplify the problem even further: We assume that the 'image' u is fixed as u(x) = x. For this situation, we expect the optimal shape to be an interval (a*, b*) = (u_0 − δ, u_0 + δ) symmetric around u_0.
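Since the flow reduces to this two-boundary ODE, it can be integrated directly. A minimal forward-Euler sketch, assuming the hypothetical choice u_0 = 0.5 (γ matches the experiment discussed below):

```python
# Forward-Euler integration of the boundary ODE for u(x) = x.  The value
# u0 = 0.5 and the time horizon are our illustrative choices.
gamma, u0 = 5e-2, 0.5

def F(x, vol):
    # steepest-descent speed (14) for the current volume vol = b - a
    return gamma / vol**2 - (x - u0)**2

a, b = 0.4, 0.6                  # symmetric initial interval
dt = 1e-3
for _ in range(40_000):          # integrate up to t = 40
    vol = b - a
    a, b = a - dt * F(a, vol), b + dt * F(b, vol)

delta = (gamma / 4) ** 0.25      # equilibrium half-width, see below
```

The interval grows monotonically and settles at (u_0 − δ, u_0 + δ), where both the error term and the balloon force balance.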
The width 2δ depends on the parameter γ and arises when the error term (u − u_0)² = δ² and the balloon force γ/(b − a)² = γ/(2δ)² are in equilibrium. This is the case for

δ = (γ/4)^{1/4}.

The numerical results for the choice γ = 5·10⁻² and the symmetric initial shape (a_0, b_0) = (0.4, 0.6) are shown in Figure 7(a). In this situation, the shape grows monotonically until it reaches the optimal interval indicated by the vertical lines. The blue line shows the shape evolution according to the self-consistent gradient flow. It matches the exact gradient flow (green line) almost perfectly. For this situation with monotonic growth, only a single step of the procedure outlined above is necessary. The evolving shape converges to the optimal interval for t → ∞. Let us also show that self-consistency really makes a difference: The red line in Figure 7(a) corresponds to the speed field chosen as the steepest descent (14), but with the shape dependence (i.e. the volume b − a) fixed at the optimal shape. In this case, we still see convergence towards the optimal shape, but along a path in clear contrast to the desired gradient flow. Also note that the convergence to the optimal shape itself stems only from the fact that we have used precisely the optimal shape's volume. This is, of course, something that cannot be done in a real computation, where the solution is not known a priori. It is also interesting to consider the same example with an initial interval that is not symmetric around u_0. The result for the choice (a_0, b_0) = (0.2, 0.4) is shown in Figure 7(b). As before, the exact gradient flow is depicted in green. One can clearly see that the lower boundary a(t) behaves non-monotonically in this situation: While Ω_t is small at the beginning, the balloon force is strong enough to grow the interval at both boundaries. This is even true after the lower boundary crosses over the equilibrium value a*.
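The 'frozen' variant is easy to reproduce alongside the exact flow. In the sketch below (u_0 = 0.5 is again our hypothetical choice), both flows are integrated with forward Euler: they converge to the same interval, but along clearly different paths.

```python
# Exact gradient flow of (12) versus the 'frozen' variant in which the volume
# in (14) is fixed at the optimal value 2*delta.  u0 = 0.5 is our
# illustrative choice.
gamma, u0 = 5e-2, 0.5
delta = (gamma / 4) ** 0.25
dt = 1e-3

a_ex, b_ex = 0.4, 0.6            # exact, self-consistent volume
a_fr, b_fr = 0.4, 0.6            # volume frozen at 2*delta
gap = 0.0                        # largest pathwise difference seen
for _ in range(40_000):
    vol = b_ex - a_ex
    a_ex -= dt * (gamma / vol**2 - (a_ex - u0)**2)
    b_ex += dt * (gamma / vol**2 - (b_ex - u0)**2)
    a_fr -= dt * (gamma / (2 * delta)**2 - (a_fr - u0)**2)
    b_fr += dt * (gamma / (2 * delta)**2 - (b_fr - u0)**2)
    gap = max(gap, abs(a_ex - a_fr))
```

The variable `gap` records how far the two lower boundaries drift apart along the way; the frozen variant starts much more slowly, since the small initial volume makes the true balloon force far stronger than the frozen one.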
However, at some point, the expansion of the upper boundary has decreased the balloon force enough that the lower boundary starts to increase again. For t → ∞, both boundaries converge towards their equilibrium values from below. Note that this phenomenon occurs even though the initial interval is a strict subset of the optimal shape. From a naive point of view, one might have expected monotonic growth towards the optimal shape. To handle this situation with our method, we need at least two steps. The blue curve shows the result with a length of t_0 = 3 for the first step. One can nicely see that the evolution according to the self-consistent speed field initially matches the exact gradient flow, as before. However, as soon as the lower boundary would have to turn around, it gets stuck instead. With the second step, the direction of the lower boundary is reversed and it converges, as expected, to a* from below. This shows that the multi-step procedure does, indeed, work well even in difficult situations with non-monotonic behaviour.
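The non-monotone behaviour of the lower boundary can be reproduced with the same boundary ODE (u_0 = 0.5 is again our hypothetical choice): a(t) first overshoots below its equilibrium a* and only later turns around.

```python
# Asymmetric start (a0, b0) = (0.2, 0.4): the lower boundary overshoots
# below its equilibrium a* before converging to it from below.
gamma, u0 = 5e-2, 0.5
delta = (gamma / 4) ** 0.25
a_star = u0 - delta

a, b = 0.2, 0.4
dt = 1e-3
a_min = a                        # track the deepest point of the overshoot
for _ in range(60_000):          # integrate up to t = 60
    vol = b - a
    a -= dt * (gamma / vol**2 - (a - u0)**2)
    b += dt * (gamma / vol**2 - (b - u0)**2)
    a_min = min(a_min, a)
```

The recorded minimum lies strictly below a*, while the final value has moved back up towards a*; this is exactly the turnaround that a single monotone step of the self-consistent flow cannot capture.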

The image-segmentation problem
Next, we apply the multi-step self-consistent gradient flow to the full image-segmentation problem of Section 3. For the examples in this subsection, we always use a fixed step length t_0. The first five steps of the evolution for an image without sharp edges are illustrated in Figure 8(a). The result is already very close to the final shape. Compare also the cost decrease in Figure 8(b) to Figure 3(b): For the gradient descent, roughly 50 steps are required to bring the cost down to 10⁻⁵. The same decrease is achieved by the gradient-flow method in only a tenth of this number of steps! Of course, each step of the gradient flow is more expensive, as it involves multiple fixed-point iterations with ψ. For this example, however, four iterations per step are enough to reach a point where an increase in the number of fixed-point iterations does not cause any noticeable difference in the result anymore. Thus, even if we take the fixed-point iterations themselves into account as well, it still requires only 20 'operations' to converge to the final shape. The gradient flow is also clearly more efficient in terms of computation time.
If one compares the resulting shapes between Figures 3(a) and 8(a), a striking difference lies in the regularity of the boundary. The gradient descent produces a much smoother boundary than the gradient flow. To understand this effect, note that the problem itself (i.e. the cost function in (5)) does not include any regularization for the boundary at all. Since our input images contain noise, the natural result of the optimization procedure is thus an irregular boundary as seen in Figure 8(a). The smoother boundary of Figure 3(a), on the other hand, is a result of the computation of the shape gradient by solving the regularizing equation (8).
Let us now perform the same comparison for the image of Figure 3(c), which has sharper edges. The gradient-descent method does not work for this image, and the desire to find an alternative method was our main motivation for the work on the gradient-flow idea in the first place. This, of course, leads to the question of whether the gradient-flow method performs better for this image. The answer can be seen in Figure 8(c): It does. It requires more descent steps (the first six are shown in green), but it still converges to the desired shape. The situation is also slightly more delicate with respect to the fixed-point steps, since the sharp edges lead to larger Lipschitz constants for the shape-dependent quantities. For the result shown, we used 10 fixed-point iterations for each descent step. Nevertheless, it is clear that the method works quite well, while we have seen in Figure 3(d) that gradient descent simply fails for this image due to excessive 'zig-zagging'.

Incorporating topological derivatives
One of the major advantages that level-set methods have over other strategies is their flexibility with respect to topological changes. This was already discussed above and can be seen, for instance, in Figure 3(a). However, even though our speed method is, in theory, able to change the topology, there are often situations where these changes are not actually performed in practice. Often, both the gradient descent and the gradient flow discussed above avoid topology changes and converge instead to a local minimum with the same topology as the initial geometry. A main issue here is that the speed method is based on transforming the boundary of the current domain. Thus, it is neither able to create holes in the interior of Ω nor to create new components of the domain in ℝ^n \ Ω. A common remedy is to use an initial geometry with many components and holes. During the optimization process, unnecessary holes and components can then disappear or join together. See, for instance, [3,28].
For more systematic approaches, the concept of the topological derivative can be used; see, for instance, the methods described in [2,23]. This quantity characterizes how the cost changes when new components of the domain are created. (It is also often defined as the change of the cost when holes are punched into Ω, but the basic idea is the same. For our discussion, we concentrate, for simplicity, on the situation of new components.) This is in contrast to the shape derivative discussed so far, which describes the change of the cost when the boundary of Ω is moved. For some x ∈ ℝ^n \ Ω, a simple version of the topological derivative can be formally defined as

g(x) = lim_{r→0} (J(Ω ∪ B_r(x)) − J(Ω)) / vol(B_r(x)).

The definition for x ∈ Ω and the creation of holes can be given in a similar way. While we do not want to discuss technical details here, let us still describe, at least in an informal way, how topological derivatives can be incorporated into our gradient flow. This results quite naturally in a hybrid method that performs much better for some problems. For our situation (with only domain functionals), the topological derivative actually matches the shape derivative in some sense. In particular, if the shape derivative is given by (9), then the topological derivative at some point x is

g(x) = f(x, Ω).

A heuristic argument to see why this is the case is the following: Consider the situation that x ∈ ∂Ω and F(x) = 1. The contribution of the particular point x to the shape derivative (9) is then precisely f(x, Ω). Furthermore, F(x) > 0 with x ∈ ∂Ω means that, from a local point of view, x is being added to Ω. This, in turn, is roughly just the meaning of the topological derivative at x.

Figure 9. The self-consistent speed field and the initial geometry for the image-segmentation example discussed in Section 6.

Now, assume that we have found a self-consistent gradient flow F according to Definition 10. In this case, F(x) = −f(x, Ω(x)), so that −F(x) is precisely the topological derivative at x. Instead of evolving Ω_0 along F to get the
next iterate Ω_1 of the geometry, we can instead use the following idea: Whenever F(x) > 0, the topological derivative is negative at x. Consequently, it is beneficial for decreasing the cost to include those x in the domain. Similarly, we want to exclude all x where the topological derivative is positive. Thus, we can simply define

Ω_1 = {x ∈ ℝ^n : F(x) > 0}. (15)

This can be achieved by just interpreting −F as the level-set function for Ω_1. Depending on the problem at hand, one can build a suitable heuristic to decide when to use (15) instead of a shape evolution. For instance, whenever the descent seems to have converged, one can try a step with (15) to check whether this is really the case or whether the descent is merely stuck in a local minimum. This approach matches the suggestion made in Subsection 3.4 of [23].
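To make the topology-changing update concrete, here is a small illustrative sketch. The speed field F below is synthetic, chosen by hand rather than computed from any segmentation problem: it is positive on an annulus and negative both in the centre and far outside, so thresholding at F > 0 produces a domain with a hole that no boundary evolution of a simply connected initial domain could create in a single monotone step.

```python
import numpy as np

# Synthetic illustration: F is chosen by hand so that it is positive on an
# annulus and negative both in the centre and far outside.
n = 101
y, x = np.meshgrid(*2 * (np.linspace(0, 1, n),), indexing="ij")
r = np.hypot(x - 0.5, y - 0.5)
F = 0.15 - np.abs(r - 0.3)

# interpreting -F as the level-set function of the next iterate gives the
# new domain {F > 0}: here an annulus with a hole in the middle
omega1 = F > 0
```

The resulting boolean mask is the next geometry; in a level-set implementation one would typically reinitialize −F to a signed distance function afterwards.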
To conclude this section, let us demonstrate this idea on a simple example. We use the image-segmentation problem depicted in Figure 10, with the initial geometry Ω_0 shown in blue. For this situation, both the gradient-descent and the gradient-flow method lead to a suboptimal shape that does not exclude the centre. The corresponding speed field resulting from our fixed-point iteration is plotted in Figure 9. One can clearly see that it is negative (and thus the topological derivative is positive) where the hole needs to be created. The geometry that results from evolving Ω_0 along the speed field is shown in Figure 10(a). In contrast, the domain according to (15) can be seen in Figure 10(b). For the situation at hand, the latter is clearly the superior solution.

Boundary length and mean-curvature flow
In the last section of our paper, we would like to discuss how the idea of 'a single speed field for the whole evolution' can also be applied to other problems. In particular, an important generalization of the model problem (5) includes the perimeter of Ω as an additional regularization term. This prevents the irregular boundaries that we have observed in the examples of Figure 8. Going even further, if the cost function is just the perimeter, it is well known that the shape evolution corresponding to the gradient flow is the so-called mean-curvature flow. It has been widely studied both theoretically and from the point of view of applications; see, for instance, [12].

Figure 10. The shapes that result after taking one step based on the speed field in Figure 9. The left plot shows a shape evolution of Ω_0 according to the speed field; the new shape in the right plot is defined via (15).

The description of mean-curvature flow with level-set methods is discussed, among others, in [11,22]. In particular, the normal speed F is given by the (negative) mean curvature κ of the evolving surface. The mean curvature can be computed from a sufficiently smooth level-set function φ by the expression

κ = div(∇φ / |∇φ|). (16)

(For a derivation, see (1.4.9) on page 29 of [11].) Thus, we can modify our level-set equation (1) to read

∂_t φ = div(∇φ / |∇φ|) |∇φ|. (17)

While this equation has a singularity at ∇φ = 0, one can still establish a suitable solution theory based on viscosity solutions. This was pioneered by Evans and Spruck in a series of papers starting with [10]. It is also discussed in great detail in [11]. For numerical methods, we refer to [7]. Our discussion focuses on a new approach instead: We would like to compute the mean-curvature flow based on an Eikonal equation, similar to Sethian's Fast Marching Method [25] and the approach taken in [16]. This method, of course, has a major drawback: Since it is based on the idea described in Section 4, it only works as long as the evolution of the shape is monotone.
This is, for instance, the case for mean-convex geometries (meaning that κ is positive everywhere on the surface). However, many interesting cases are excluded by that requirement. On the other hand, we believe that our new approach also has advantages that make it, nevertheless, interesting to study: First, it may lead to more efficient methods for the computation of the mean-curvature flow if more algorithmic research is devoted to this topic. Second, a description of the mean-curvature flow in terms of an Eikonal equation allows one to apply new analytical tools, similar to the results derived in Chapter 4 of [19] based on the Hopf-Lax formula given in Theorem 1. And third, this can be used as a starting point to apply self-consistent gradient-flow methods to problems with regularization terms based on the boundary length. Here, one can hope that the restriction to monotone evolutions can be circumvented with a multi-step procedure similar to the one demonstrated in Section 5. This idea will be discussed briefly at the end of this section and is also an interesting area for further research.
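The curvature expression div(∇φ/|∇φ|) is easy to test numerically. The following sketch uses central finite differences (our illustrative choice of discretization; any consistent scheme works) and checks it against the exact value for circles: for φ = |x| − r in two dimensions, the level set through a point at radius ρ is itself a circle of radius ρ, so the expression should evaluate to 1/ρ there.

```python
import numpy as np

# Finite-difference check of kappa = div(grad phi / |grad phi|) for the
# signed-distance function phi = |x| - r of a circle.
h = 0.01
y, x = np.meshgrid(*2 * (np.arange(-1, 1 + h / 2, h),), indexing="ij")
phi = np.hypot(x, y) - 0.5

def d(f, axis):
    # central difference (np.roll wraps around, so stay away from the edges)
    return (np.roll(f, -1, axis) - np.roll(f, 1, axis)) / (2 * h)

px, py = d(phi, 1), d(phi, 0)
norm = np.sqrt(px**2 + py**2) + 1e-12      # avoid division by zero at the origin
nx, ny = px / norm, py / norm              # unit normal field
kappa = d(nx, 1) + d(ny, 0)                # div(grad phi / |grad phi|)

rho = np.hypot(x, y)
ring = (rho > 0.3) & (rho < 0.7)           # away from the origin and the edges
err = float(np.max(np.abs(kappa[ring] - 1 / rho[ring])))
```

The maximal deviation from 1/ρ on the annulus is of the order of the O(h²) truncation error of the central differences.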
The basic idea of our method is the same as described in Section 4: We want to define a speed field corresponding to the steepest descent and make sure that it is self-consistent with respect to shape-dependent quantities. For the case of mean-curvature flow, we have to set F(x) = −κ(x).
As before, this value depends on the shape Ω(x). In contrast to the situation analysed above, however, the shape dependence is local this time: The curvature depends only on the shape of the boundary in a neighbourhood of x. It does not depend on quantities like the volume of Ω or other global properties. Let us now assume that some speed field F is fixed. We use Theorem 1 to describe the shape evolution. Excluding the stationary case F(x) = 0, this implies that a level-set function for the domain evolution is given by

φ(x, t) = D(x) − t. (18)

Note specifically that the spatial dependence of φ is precisely the term D(x). Thus, the mean curvature κ(x) can be computed from D alone without any time-dependent terms. In particular, (16) implies

κ(x) = div(∇D / |∇D|)(x).

One can also show (compare Lemma 36 in [19]) that the distance D depends on the speed field F via the Eikonal equation

F |∇D| = 1. (19)

Consequently, we can conclude for a self-consistent situation:

−κ |∇D| = −div(∇D / |∇D|) |∇D| = 1. (20)
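As a quick consistency check of (20), consider the mean-curvature flow of a disc of radius R_0 in two dimensions (the shrinking-circle behaviour also visible in Figure 11); the following sketch covers this 2D case only:

```latex
\dot R(t) = -\frac{1}{R(t)}
\;\Rightarrow\;
R(t) = \sqrt{R_0^2 - 2t},
\qquad\text{so the arrival time at } x \text{ is}\qquad
D(x) = \frac{R_0^2 - |x|^2}{2}.
```

```latex
\nabla D = -x, \qquad |\nabla D| = |x|, \qquad
\operatorname{div}\frac{\nabla D}{|\nabla D|}
  = \operatorname{div}\frac{-x}{|x|} = -\frac{1}{|x|},
\;\Longrightarrow\;
-\operatorname{div}\!\left(\frac{\nabla D}{|\nabla D|}\right)|\nabla D|
  = \frac{1}{|x|}\,|x| = 1 .
```

Hence the explicit arrival time of the shrinking disc indeed solves the mean-curvature Eikonal equation (20) with D = 0 on the initial boundary.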
In other words, the distance that defines the shape evolution for the mean-curvature flow solves the mean-curvature Eikonal equation (20). Let us refer to Subsection 7.3 of [10] for a discussion of this equation. A possible way to solve (20) numerically is to introduce a pseudotime and evolve towards a stationary situation. For this, we introduce an artificial time dependence of D and propagate the parabolic equation

∂_τ D = div(∇D / |∇D|) |∇D| + 1 in Ω_0, with D = 0 on ∂Ω_0,

in time (denoted by the variable τ). We can hope that it converges towards a stationary state D_∞ for τ → ∞, which then solves (20). Note that this equation is the same as the mean-curvature level-set equation (17) except for the additional forcing term. Thus, we can apply the methods of [7] to solve it: As a first step, let us introduce ε > 0 as a small regularization parameter. In order to avoid singularities for ∇D = 0, we consider the regularized equation

∂_τ D = div(∇D / √(|∇D|² + ε²)) √(|∇D|² + ε²) + 1. (21)

We solve this equation with standard finite elements and a semi-implicit time-stepping scheme. In particular, let D_i = Σ_k c_k^i u_k be the finite-element discretization of D at some pseudotime τ_i.

Figure 11. Solution of the mean-curvature Eikonal equation (20) for an elliptic initial geometry. The resulting distance D is shown on the left. The right plot shows the relative L²-difference between consecutive time steps during the pseudotime evolution.

Here,
(u_k)_k is the finite-element basis and (c_k^i)_{i,k} are the time-dependent coefficients. With the weight Q_i := √(|∇D_i|² + ε²) and Δτ := τ_{i+1} − τ_i, we see that the discrete version of the weak form of (21),

∫_{Ω_0} ((D_{i+1} − D_i)/Δτ) v / Q_i dx + ∫_{Ω_0} (∇D_{i+1} · ∇v) / Q_i dx = ∫_{Ω_0} v / Q_i dx,

must hold for all test functions v. To simplify notation, let us assemble all coefficients (c_k^i)_k into a vector c^i and define

M_{kl} = ∫_{Ω_0} u_k u_l / Q_i dx, K_{kl} = ∫_{Ω_0} ∇u_k · ∇u_l / Q_i dx, b_k = ∫_{Ω_0} u_k / Q_i dx.

The matrices M and K can be computed by scaling the standard mass and stiffness matrices accordingly. For linear elements, ∇D_i and thus also the weights are constant on each mesh triangle. Then, each time step corresponds to solving the linear system

(M + Δτ K) c^{i+1} = M c^i + Δτ b.

Let us demonstrate the viability of this method: We use an elliptic initial domain Ω_0 and solve (20) on it with the method described above. The resulting distance D, which is the stationary state of (21), is shown (inside of Ω_0) in Figure 11(a). Its contour lines correspond to the shape at various times (as per (18)). One can clearly see that the shape first turns into a circle and later vanishes. This is the expected behaviour according to [12]. The relative difference in L²-norm between consecutive pseudotime steps is plotted in Figure 11(b). This clearly shows that the pseudotime iteration does, indeed, converge to a stationary state and thus to a solution of (20).
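A minimal numerical sketch of the pseudotime iteration, using finite differences and explicit Euler steps instead of the semi-implicit finite elements described above (grid, ε, and step sizes are our illustrative choices, not the paper's): for a disc of radius 0.8, the stationary state should approximate the exact arrival time D = (0.8² − |x|²)/2, whose value at the centre is 0.32.

```python
import numpy as np

# Explicit finite-difference pseudotime iteration for the regularized
# equation (21) on a disc of radius 0.8 (illustrative choices throughout).
h, eps, dtau = 0.05, 0.05, 2.5e-4
y, x = np.meshgrid(*2 * (np.arange(-1, 1 + h / 2, h),), indexing="ij")
inside = np.hypot(x, y) < 0.8

D = np.zeros_like(x)
for _ in range(60_000):
    Dx = (np.roll(D, -1, 1) - np.roll(D, 1, 1)) / (2 * h)
    Dy = (np.roll(D, -1, 0) - np.roll(D, 1, 0)) / (2 * h)
    Dxx = (np.roll(D, -1, 1) - 2 * D + np.roll(D, 1, 1)) / h**2
    Dyy = (np.roll(D, -1, 0) - 2 * D + np.roll(D, 1, 0)) / h**2
    Dxy = (np.roll(np.roll(D, -1, 1), -1, 0) - np.roll(np.roll(D, -1, 1), 1, 0)
           - np.roll(np.roll(D, 1, 1), -1, 0) + np.roll(np.roll(D, 1, 1), 1, 0)) / (4 * h**2)
    q2 = Dx**2 + Dy**2 + eps**2
    # div(grad D / sqrt(q2)) * sqrt(q2): the Laplacian minus the second
    # derivative in the gradient direction
    op = Dxx + Dyy - (Dx**2 * Dxx + 2 * Dx * Dy * Dxy + Dy**2 * Dyy) / q2
    D = np.where(inside, D + dtau * (op + 1.0), 0.0)   # Dirichlet: D = 0 outside

center, d04, d07 = D[20, 20], D[20, 28], D[20, 34]     # x = (0,0), (0.4,0), (0.7,0)
```

The contour lines of the converged D are (approximately) the concentric circles of the shrinking disc, mirroring the behaviour shown in Figure 11(a) for the elliptic domain.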
To conclude this discussion, let us give a brief outlook on how the shape-optimization idea of Section 4 can be combined with the mean-curvature approach: A combined method promises a way to solve shape-optimization problems that include perimeter terms as regularization. Let us, again, consider a base problem whose shape derivative is of the form (9). If we add P(Ω) as an additional term to the cost, the new shape derivative is

dJ(Ω; F) = ∫_∂Ω F (f(x, Ω) + κ(x)) dσ.
Thus, the steepest-descent direction is

F(x) = −(f(x, Ω) + κ(x)).

As before, κ depends on the distance D, which, in turn, can be computed from the initial speed field F_0 by solving (19). In order to find a self-consistent gradient flow, we now have to find a solution (F, D) of the mixed algebraic-differential equation

F = ψ(F) − div(∇D / |∇D|),

coupled with the Eikonal equation (19) for D. Here, ψ denotes the fixed-point iteration mapping without perimeter regularization, as discussed above.