Estimating parameters with pre-specified accuracies in distributed parameter systems using optimal experiment design

ABSTRACT Estimation of physical parameters in dynamical systems driven by linear partial differential equations is an important problem. In this paper, we introduce the least costly experiment design framework for these systems. It enables parameter estimation with an accuracy that is specified by the experimenter prior to the identification experiment, while at the same time minimising the cost of the experiment. We show how to adapt the classical framework for these systems and take into account scaling and stability issues. We also introduce a progressive subdivision algorithm that further generalises the experiment design framework in the sense that it returns the lowest cost by finding the optimal input signal, and optimal sensor and actuator locations. Our methodology is then applied to a relevant problem in heat transfer studies: estimation of conductivity and diffusivity parameters in front-face experiments. We find good correspondence between numerical and theoretical results.


Introduction
Accurate estimation of key physical parameters in a system is an important problem. We mention some examples: a material can be characterised by its conductivity and diffusivity constants in heat transfer studies (Gabano & Poinot, 2009), realistic groundwater contamination simulations require accurate estimates of diffusivity and advection constants (Wagner & Harvey, 1997; Yeh, 1986), permeability and porosity of rock aid in oil extraction from subsurface reservoirs (Mansoori, Van den Hof, Janssen, & Rashtchian, 2014), etc. In this context, we consider in this paper the problem of optimally designing the identification experiment leading to the estimates of these physical parameters. More particularly, we design the least-intrusive excitation signal that nevertheless leads to parameter estimates with variances that do not exceed certain given (user-chosen) limits. Physical systems can have different structures. In this paper, we are particularly interested in those systems that can be described by linear partial differential equations (PDEs) with spatially independent coefficients.
Such systems are characterised by equations that contain not only time derivatives but also spatial ones. In the System Identification literature, they are usually referred to as distributed systems. The phenomena described by such equations are quite pervasive in the physical world (convection, diffusion, diffusion-advection-reaction, wave phenomena). Consequently, it is of importance to be able to design experiments that will allow identification of physical parameters in those systems in an accurate manner. Unfortunately, as their dynamics are described by PDEs, the classical optimal experiment design 1 techniques that have been developed for systems described by ordinary differential equations (ODEs) cannot be directly applied (see e.g. Bombois, Scorletti, Van den Hof, & Hildebrand, 2006; Jansson & Hjalmarsson, 2005). The classical approaches will, therefore, have to be adapted. This is one of the contributions of the present paper. Moreover, the particular structure of the systems described by PDEs allows us to analyse an additional design aspect that is usually not considered in optimal (least costly) experiment design: the location of the actuator that will excite the system and the location of the sensor that will measure the output of the system for the purpose of identification. Indeed, as mentioned in the recent book of Uciński (2004), most literature on optimal sensor and actuator location in distributed systems deals with state estimation, but few works actually address parameter identification. Yet, finding such locations can greatly improve the accuracy of the estimates, as shown in Rensfelt, Mousavi, Mossberg, and Söderström (2008). This paper addresses the problem of finding the optimal sensor and actuator locations as well as finding the optimal spectrum of the input signal.

CONTACT M. G. Potters m.g.potters@tudelft.nl
Before addressing optimal experiment design for systems described by PDEs, let us first discuss how we will perform the identification of the physical parameter vector θ_0 of such a system. Like all physical systems, systems described by PDEs are continuous-time systems. Since we assume linearity, the relation between the continuous-time input and output is given by a continuous-time transfer function G(s, θ_0) in the Laplace variable s (θ_0 appears explicitly in G(s, θ_0)). However, for systems described by PDEs, this continuous-time transfer function is not rational in s (it can be, for example, G(s) = cosh(√s)). A closed-form expression of G(s, θ_0) can be derived if the PDE is analytically tractable, although this is in general not possible for complicated (high-order, coupled) systems. Because the data that will be used for the identification are discrete, we need a discrete-time representation of G(s, θ_0) that is also explicit in θ_0. However, such a representation does not exist in practice (it would be of infinite order). To circumvent this problem, spatio-temporal discretisation is generally applied, which yields a finite-order approximation G(z, θ_0) of the discrete-time transfer function between the discrete-time input and output data. The approximation consists of dividing the spatial dimension into a finite number of intervals in which the states of the system are assumed constant. The order of G(z, θ_0) is then related to the number of intervals in the grid. This spatio-temporal discretisation yields a transfer function that is still explicit in θ_0. Different discretisation schemes exist. In this paper, we propose to use the Crank-Nicolson stencil, which is unconditionally stable and also ensures that the finite-order approximation G(z, θ_0) is stable.
Once we have the description of the system in the form of the transfer function G(z, θ_0), it is straightforward to use the input-output data to identify the parameter vector θ_0 using prediction-error techniques.
A second method, explicit in θ, exists to simulate/identify the system. When the PDE is analytically tractable, we can make use of the linearity of the system to calculate the system response (Ljung, 1999). However, this method is only applicable when the input signal is a superposition of sines.
These two approaches are not the only ones possible to identify the physical parameter vector θ_0. Rational or fractional black-box models can also be identified first, after which the physical parameters are deduced from the parameters of the black-box model (see, for instance, Aoun, Malti, Levron, & Oustaloup, 2004; Gabano & Poinot, 2001; Poinot & Trigeassou, 2003; Poinot, Trigeassou, & Lin, 2002). However, these approaches require models with many parameters that are implicitly coupled to the physical ones. As such, the identification procedure will be numerically heavy. If the continuous-time transfer function G(s, θ_0) can be expressed in closed form, frequency-domain approaches can also be used to identify θ_0 from the collected data (see, for instance, Pintelon, Schoukens, Pauwels, & van Gheem, 2005). Recently, a nice instrumental variable method has also been proposed in Schorsch, Garnier, Gilson, and Young (2013). However, we have chosen the approach via G(z, θ_0) since it is the most general, the most straightforward, and necessary for optimal experiment design. Now that we have defined our identification method and have an expression for G(z, θ_0) (which in general is also a function of the sensor and actuator locations, or other design variables), we can use classical optimal experiment design techniques (Bombois et al., 2006; Jansson & Hjalmarsson, 2005) to optimally design the input signal for the identification of the physical parameter vector θ_0. The to-be-designed optimal signal has to be parameterised. Generally, it is parameterised as a superposition of sinusoids (e.g. a multi-sine) or as filtered white noise. These parameterisations make the optimal experiment design problem convex and finite-dimensional. The transfer function G(z, θ_0) being generally of large order, it is more practical to parameterise the to-be-designed optimal input signal as a multi-sine (with fixed frequencies, but free amplitudes).
Indeed, in this particular case, only the frequency response of the gradient of G(z, θ_0) with respect to θ_0 is required for optimal experiment design. (Note that to calculate this gradient, we indeed need G(z, θ_0) to be explicit in θ_0.) In the case where a closed-form expression of G(s, θ_0) exists, we also propose a simpler approach. We use the property that the frequency responses of G(s, θ_0) and of G(z, θ_0) are equal in the limit as the sampling time goes to zero, and almost equal for small sampling times. The frequency response of the gradient of the usually simpler continuous-time transfer function G(s, θ_0) can then be used in the optimal experiment design procedure. The approach above can be applied for each sensor/actuator location in a very easy way, and the optimal experiments for each location can be compared, allowing us to determine the optimal locations.
We apply our methodology to one-dimensional (1D), second-order linear PDEs with spatially independent coefficients. Diffusion-advection-reaction processes in real life can be modelled with this family of equations. We stress that our methodology is applicable to higher-dimensional, higher-order PDE systems with different boundary conditions (as long as a discrete-time transfer function between input and output can be determined). We introduce and scale the continuous-time physical models in Sections 2 and 2.1. The unscaled physical model represents the (continuous-time) true system, which will be used to identify the physical parameters θ_0 with the use of our optimal input signal. The scaled model will be used for optimal input signal design and in the identification procedure. This procedure, together with the generation of discrete-time input and output signals, is explained in Section 3. The identification procedure requires simulation of the output as a function of θ and is introduced in Section 3.2.1. The experiment design framework is explained in Section 4 and shows how to generate the optimal input signal for a given choice of sensor and actuator locations. We generalise the optimal experiment design framework in Section 4.2, where now also optimal sensor and actuator locations are computed. In Section 5, we apply our methodology to a diffusion process in which two material properties are identified with a front-face experiment.

Diffusion-advection-reaction processes
The diffusion-advection-reaction equation typically contains only a few key physical parameters, the most important one being the so-called diffusivity parameter, i.e. the hydraulic diffusivity parameter in flow through porous media, the conductivity coefficient in conductive heat transfer, the diffusion parameter in mass transfer, etc. Although this lumped parameter is a function of microscopic properties of the system, it characterises the observed macroscopic dynamic behaviour of the system effectively. Hence, using macroscopic measurements of the system, it is possible to estimate such parameters. We shall use the family of diffusion-advection-reaction processes as a showcase of our methodology, but we remind the reader that it is applicable to higher order linear PDE processes. Furthermore, we make a particular choice of boundary conditions, but many others exist that can also be applied within our framework. However, it is important to note that we restrict attention to systems with physical parameters that are not spatially dependent.
Diffusion-advection-reaction processes are described by the following family of second-order linear PDEs: 2

$$\frac{\partial f(x,t)}{\partial t}=\theta_1\frac{\partial^2 f(x,t)}{\partial x^2}-\theta_2\frac{\partial f(x,t)}{\partial x}-\theta_3 f(x,t),\qquad(1)$$

where f(x, t) represents a macroscopic physical quantity at continuous time t and continuous position x. The coefficients θ_1 > 0, θ_2, θ_3 are physical parameters. The spatial domain is defined by D = [0, L], where L is the total considered length. We assume zero initial conditions at t = 0.
The boundary conditions are

$$-\theta_4\,\frac{\partial f(x,t)}{\partial x}\bigg|_{x=x_u}=u(t),\qquad y_{\mathrm{nf}}(t)=f(x_y,t),\qquad f(L,t)=0.\qquad(2)$$

The physical parameters are collected in the vector θ = (θ_1, θ_2, θ_3, θ_4)^T. The first boundary condition in (2) is of the second kind, known as the Neumann boundary condition. It expresses the flow rate across the boundary at position x = x_u induced by the influx u(t). We define u(t) as the user-imposed (known) input signal to the physical system and, therefore, call x_u ∈ D the input location. The second boundary condition defines the noise-free output y_nf(t) as being equal to the physical quantity f(x, t) at the output measurement location x = x_y ∈ D. We thus consider a single-input, single-output system. The third boundary condition states that the physical quantity f(x, t) at location x = L is equal to zero at all times.
Definition 2.1: The data-generating system is defined by Equations (1) and (2) and setting θ = θ 0 , where θ 0 are the true physical parameter values.
If the data-generating system is analytically tractable, then a Laplace transform of Equations (1) and (2) allows us to relate the input u(t) and output y_nf(t) of the data-generating system through

$$Y_{\mathrm{nf}}(s)=G_{x_u,x_y}(s,\theta_0)\,U(s),\qquad(3)$$

where s is the Laplace variable, Y_nf(s) = L{f(x_y, t)} the Laplace transform of y_nf(t), U(s) = L{u(t)} the Laplace transform of u(t), and G_{x_u,x_y}(s, θ_0) is defined as their transfer function. The subscripts x_u, x_y indicate that the transfer function depends on the input and output locations. Hence, the above relation shows that the physical system may be interpreted as a linear, time-invariant system defined through input U(s), output Y_nf(s) and transfer function G_{x_u,x_y}(s, θ_0). As mentioned in Section 1, this transfer function will be irrational and of infinite order in s.

Non-dimensionalisation
An inherent feature of physical systems is the order-of-magnitude difference between the input and output variables and the physical parameters. Numerical simulation of the unscaled system (1)-(2) is prone to numerical difficulties, especially when considering the optimal experiment design algorithm, which uses a covariance matrix expression of the parameters. Without scaling, this matrix is usually ill-conditioned and, consequently, the algorithm can usually not find a solution to the optimisation problem.
To avoid these difficulties, we non-dimensionalise Equations (1) and (2) as follows:

(1) Scale the parameter vector θ: θ̃ = Θ_s^{-1} θ, where Θ_s = diag(θ_{s,1}, …, θ_{s,4}) is a diagonal matrix containing the scaling factors for each element θ_i in the vector θ.

(2) Non-dimensionalise all variables:

$$\tilde x=\frac{x}{x_s},\qquad\tilde t=\frac{t}{t_s},\qquad\tilde f(\tilde x,\tilde t)=\frac{f(x,t)}{f_s},\qquad\tilde u(\tilde t)=\frac{u(t)}{u_s},\qquad(4)$$

where f_s, u_s, x_s and t_s are as-of-yet undecided scaling values.

(3) Rewrite Equations (1) and (2) in terms of the non-dimensional parameters and variables defined in steps (1) and (2):

$$\frac{\partial\tilde f}{\partial\tilde t}=\frac{\theta_{s,1}t_s}{x_s^2}\,\tilde\theta_1\frac{\partial^2\tilde f}{\partial\tilde x^2}-\frac{\theta_{s,2}t_s}{x_s}\,\tilde\theta_2\frac{\partial\tilde f}{\partial\tilde x}-\theta_{s,3}t_s\,\tilde\theta_3\tilde f,\qquad(5)$$

with boundary conditions

$$-\frac{\theta_{s,4}f_s}{u_s x_s}\,\tilde\theta_4\frac{\partial\tilde f}{\partial\tilde x}\bigg|_{\tilde x=\tilde x_u}=\tilde u(\tilde t),\qquad\tilde y_{\mathrm{nf}}(\tilde t)=\tilde f(\tilde x_y,\tilde t),\qquad\tilde f(L/x_s,\tilde t)=0.\qquad(6)$$

(4) Select f_s, u_s, x_s and t_s such that as many terms as possible in Equations (5) and (6) are solely a function of θ̃, and therefore of O(1). The selection is not unique. One possible choice is to freely choose f_s and determine x_s, t_s and u_s as the solution of the following three equations: t_s = x_s²/θ_{s,1}, x_s = θ_{s,2} t_s and u_s = θ_{s,4} f_s/x_s. This leads to x_s = θ_{s,1}/θ_{s,2}, t_s = θ_{s,1}/θ_{s,2}² and u_s = θ_{s,4} θ_{s,2} f_s/θ_{s,1}. If some θ̃_i are zero, more freedom is available.
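The choice of scales in step (4) can be sketched numerically. The parameter magnitudes below are illustrative placeholders, not values from the paper's example; the asserts confirm that the scaled diffusion and advection coefficients in Equation (5) indeed become O(1).

```python
import numpy as np

# Hypothetical order-of-magnitude guesses theta_s,1..4 for the physical
# parameters (diffusion, advection, reaction, boundary); illustrative only.
theta_s = np.array([1e-7, 1e-3, 0.0, 1e2])
f_s = 1.0  # the field scale f_s may be chosen freely

# Step (4): solve t_s = x_s^2/theta_s1, x_s = theta_s2*t_s, u_s = theta_s4*f_s/x_s.
x_s = theta_s[0] / theta_s[1]              # = theta_s1/theta_s2
t_s = theta_s[0] / theta_s[1] ** 2         # = theta_s1/theta_s2^2
u_s = theta_s[3] * theta_s[1] * f_s / theta_s[0]

# With these scales, the prefactors of the diffusion and advection terms
# in the scaled PDE (5) equal one:
assert np.isclose(theta_s[0] * t_s / x_s ** 2, 1.0)
assert np.isclose(theta_s[1] * t_s / x_s, 1.0)
```

For these illustrative magnitudes, the resulting scales are x_s = 1e-4, t_s = 0.1 and u_s = 1e6, which shows how strongly the unscaled quantities can differ in magnitude.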
Step (1) ensures that the dimensionless parameters are O(1) (i.e. have a value of order unity), a necessary step in order to apply experiment design. Although we do not know the actual values of θ_0, we do know their order of magnitude. Consequently, each element in the scaled vector θ̃ can be made of order one.
Step (4) simplifies the non-dimensional system and shows which processes (i.e. diffusion, advection or reaction) are dominant. 3 Substitution of θ = θ_0 in step (1) and following the scaling procedure then defines the scaled equivalent of the data-generating system as detailed in Definition 2.1.
The relation between the scaled output ỹ_nf(t̃) and the scaled input ũ(t̃) for the family of scaled physical systems reads

$$\tilde Y_{\mathrm{nf}}(\tilde s)=\tilde G_{\tilde x_u,\tilde x_y}(\tilde s,\tilde\theta)\,\tilde U(\tilde s),\qquad(7)$$

where now Ỹ_nf(s̃) = L{f̃(x̃_y, t̃)}, Ũ(s̃) = L{ũ(t̃)} and s̃ = s t_s. This equation is the scaled equivalent of Equation (3) for θ̃ = θ̃_0 ≡ Θ_s^{-1} θ_0.

Data generation
In the previous sections, we have defined the continuous-time data-generating system (see Definition 2.1). This system represents the true physical process of which we want to identify the physical parameters θ_0. To accomplish this, we apply a continuous-time (analogue) input signal u(t) to the data-generating system, leading to the continuous noise-free output y_nf(t). This output is measured with a sampling time T_s and corrupted by zero-mean white noise e[n] with variance σ_e². The measurements are thus given by

$$y_D[n]=y_{\mathrm{nf}}(nT_s)+e[n],\qquad n=1,\ldots,N.\qquad(8)$$

We will generally consider sinusoidal input signals (and thus sinusoidal output signals). The sampling time is generally chosen in such a way that the Nyquist frequency π/T_s lies a decade above all dynamics of the system (i.e. the system's bandwidth). We shall denote Z^N = {u_D[n], y_D[n]}_{n=1,…,N} as the set containing the sampled input and output data. The values u_D[n] will not always be used in the identification procedure.

Identification procedure
We identify the physical parameter vector θ_0 using the collected data. First, we scale the data-set Z^N using Equation (4), where now our data points are shifted in time to t̃ = jT_s/t_s. The scaled continuous-time noise-free output ỹ_nf(t̃) is depicted in the right plot in Figure 1 in black, whereas the scaled measured data points y_D[j]/f_s are shown in red. Observe that, due to the time scaling, the temporal distance between the data points has become T̃_s = T_s/t_s.
The scaled true physical parameter θ̃_0 = Θ_s^{-1} θ_0 can now be estimated with the least-squares method:

$$\tilde\theta_N=\arg\min_{\tilde\theta}\ \frac{1}{N}\sum_{j=1}^{N}\left(\frac{y_D[j]}{f_s}-\tilde y_{\mathrm{sim}}(\tilde\theta)[j]\right)^2,\qquad(9)$$

where ỹ_sim(θ̃)[j] is the sampled version of the output ỹ_nf(t̃) of the scaled system (5)-(6) for an arbitrary θ̃, and y_D[j]/f_s is the scaled measured output from the data-generating system. The unscaled estimate can then easily be retrieved by calculating θ_N = Θ_s θ̃_N (cf. Equation (4)). It is apparent from Equation (9) that we require an expression for ỹ_sim(θ̃) for estimation.
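The least-squares criterion (9) can be sketched with a toy stand-in simulator; the exponential model below is purely illustrative (the paper's actual ỹ_sim comes from the Crank-Nicolson scheme of Section 3.2.1), but the fitting step is the same.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy stand-in for y_sim(theta)[j]; NOT the paper's PDE simulator.
def y_sim(theta, t):
    return theta[1] * np.exp(-theta[0] * t)

rng = np.random.default_rng(0)
t = np.arange(1, 101) * 0.05               # scaled sampling instants j*Ts
theta0 = np.array([0.8, 1.5])              # "true" scaled parameters
y_meas = y_sim(theta0, t) + 0.01 * rng.standard_normal(t.size)

# Criterion (9): minimise the sum of squared output residuals over theta.
res = least_squares(lambda th: y_sim(th, t) - y_meas, x0=[0.5, 1.0])
theta_hat = res.x
```

The estimate `theta_hat` lands close to `theta0`; in the paper's setting, the residual function would instead call the finite-difference simulator for each candidate θ̃.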
To simulate the noise-free scaled outputỹ nf (t ) (cf Equation (6)) for arbitrary values of the physical parametersθ, we can use two methods.
(1) If the input signal is chosen to be a superposition of sinusoids, its scaled form being

$$\tilde u(\tilde t)=\sum_{l=1}^{Q}\tilde A_l\sin(\tilde\omega_l\tilde t),\qquad(10)$$

where ω̃_l = ω_l t_s and Ã_l = A_l/u_s, then the continuous-time simulated output reads (Ljung, 1999)

$$\tilde y_{\mathrm{sim}}(\tilde\theta)(\tilde t)=\sum_{l=1}^{Q}\tilde A_l\left|\tilde G_{\tilde x_u,\tilde x_y}(i\tilde\omega_l,\tilde\theta)\right|\sin(\tilde\omega_l\tilde t+\alpha_l),\qquad(11)$$

where α_l = ∠G̃_{x̃_u,x̃_y}(iω̃_l, θ̃), the transfer function G̃_{x̃_u,x̃_y} is defined in Equation (7) and Q is a positive integer. Sampling this signal with the scaled sampling time T_s/t_s generates ỹ_sim(θ̃)[j], which is used in Equation (9).
(2) Discretise Equations (5) and (6) using a finite-difference method, which is detailed in Section 3.2.1. This method discretises time and space at intervals of Δt̃ and Δx̃, respectively. The constant Δt̃ is called the time integration step. Let us choose Δt̃ = T̃_s = T_s/t_s. Then, we can apply an arbitrary input signal u(t) to the data-generating system. The sampled scaled input ũ_D from Z̃^N can then be used to simulate the output ỹ_sim(θ̃)[j], of which the samples are separated by an interval of T_s/t_s. The simulated points are shown in blue in Figure 1 for θ̃ = Θ_s^{-1} θ_0 and, as can be observed, they occur at the same time instances as the scaled measured output data from Z̃^N.

Method (1) can only be used for sinusoidal input signals and if a closed-form expression of G̃_{x̃_u,x̃_y} exists. Method (2) is the most generic one, as it can be used when G̃_{x̃_u,x̃_y} does not have a closed-form expression and/or the input signal is not a sum of sinusoids. From now on, we will consider method (2). We now show how to generate ỹ_sim(θ̃)[j] for this method.
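The sinusoidal steady-state relation used in method (1) can be sketched as follows. The rational transfer function here is an illustrative stand-in for G̃_{x̃_u,x̃_y} (the paper's front-face example involves an irrational G̃); each sine passes through with gain |G̃| and phase shift ∠G̃, as in Equation (11).

```python
import numpy as np

# Illustrative stand-in for G~(s~, theta~); NOT the paper's PDE transfer function.
def G(s):
    return 1.0 / (s + 1.0)

omega = np.array([0.5, 1.0, 2.0])   # scaled multi-sine frequencies (illustrative)
A = np.array([1.0, 0.5, 0.25])      # scaled amplitudes (illustrative)

def y_sim(t):
    # Steady-state response of a linear system to a multi-sine input:
    # each component is scaled by |G(i w_l)| and shifted by angle(G(i w_l)).
    Gw = G(1j * omega)
    return sum(A[l] * np.abs(Gw[l]) * np.sin(omega[l] * t + np.angle(Gw[l]))
               for l in range(len(omega)))

t = np.linspace(0.0, 10.0, 201)
y = y_sim(t)
```

Sampling `y_sim` at the scaled sampling instants then yields the ỹ_sim(θ̃)[j] entering criterion (9).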

Simulation of the data-generating system using a finite-difference scheme
In this section, we show how we generate the data points of ỹ_sim(θ̃)[j] in Equation (9). To this end, we discretise the scaled PDE equations (5) and (6), which serves two purposes. On the one hand, it provides us with a way to generate ỹ_sim. On the other hand, the discretisation delivers a state-space model explicit in the physical parameters, which in turn can be converted into a discrete-time transfer function that is required for optimal experiment design. PDEs like (5) and (6) are sometimes referred to as stiff equations. Applying the wrong integration scheme can result in exponential growth of numerical errors. Most explicit methods, such as the forward Euler method, will only provide a stable solution under restrictive conditions on the spatial and temporal integration steps. To avoid such issues, we have adopted the implicit Crank-Nicolson algorithm, which is known to be unconditionally stable regardless of the temporal and spatial integration steps. A second benefit of this method is that the temporal truncation error is of O((Δt̃)²) instead of O(Δt̃) for the Euler methods.
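The stability contrast can be made concrete with the standard von Neumann amplification factors for the pure-diffusion part of the equation (a textbook analysis, with r = θ̃_1 Δt̃/Δx̃² and Fourier wavenumber k): forward Euler amplifies some modes once r > 1/2, while the Crank-Nicolson factor never exceeds one in magnitude.

```python
import numpy as np

# Von Neumann amplification factors for pure diffusion (standard analysis);
# r = theta1 * dt / dx^2, k the spatial Fourier wavenumber.
k = np.linspace(0.01, np.pi, 200)

def g_fe(r):  # forward (explicit) Euler
    return 1.0 - 2.0 * r * (1.0 - np.cos(k))

def g_cn(r):  # Crank-Nicolson (average of forward and backward Euler)
    a = r * (1.0 - np.cos(k))
    return (1.0 - a) / (1.0 + a)

# Forward Euler is unstable for r > 1/2; Crank-Nicolson never amplifies.
assert np.max(np.abs(g_fe(2.0))) > 1.0
assert np.max(np.abs(g_cn(2.0))) <= 1.0
assert np.max(np.abs(g_cn(100.0))) <= 1.0
```

This is why the time step Δt̃ = T_s/t_s can be tied to the sampling time without a stability restriction on Δx̃.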
We recall that we will simulate a scaled version of the data-generating system in Definition 2.1. The conversion between the continuous-time physical models (1)-(2) and (5)-(6) is defined through Equation (4). Using these definitions, the scaled spatial domain becomes D̃ = [0, L/x_s], which we discretise in M parts of size Δx̃. This results in a spatial resolution of Δx̃ = L/(x_s M). The time integration step is chosen equal to Δt̃ = T_s/t_s. 4 The scaled time is then represented by t̃ = jΔt̃, where j ∈ N⁺. We use the index i ∈ N⁺ to denote the position on the lattice, i.e. x̃_i = iΔx̃. At location iΔx̃ and at time jΔt̃, the input and macroscopic field are denoted ũ_i^j and f̃_i^j, respectively. Using these definitions and the scaling steps in Section 2.1, discretisation of Equations (5) and (6) yields

$$\frac{\tilde f_i^{\,j+1}-\tilde f_i^{\,j}}{\Delta\tilde t}=\frac{1}{2}\sum_{m\in\{j,j+1\}}\left[\tilde\theta_1\frac{\tilde f_{i+1}^{\,m}-2\tilde f_i^{\,m}+\tilde f_{i-1}^{\,m}}{\Delta\tilde x^2}-\tilde\theta_2\frac{\tilde f_{i+1}^{\,m}-\tilde f_{i-1}^{\,m}}{2\Delta\tilde x}-\tilde\theta_3\tilde f_i^{\,m}\right],\qquad(12)$$

with discretised boundary conditions

$$-\tilde\theta_4\,\frac{\tilde f_{i_u+1}^{\,j}-\tilde f_{i_u-1}^{\,j}}{2\Delta\tilde x}=\tilde u_{i_u}^{\,j},\qquad\tilde y^{\,j}=\tilde f_{i_y}^{\,j},\qquad\tilde f_{M}^{\,j}=0,\qquad(13)$$

where on the right-hand side (rhs) of Equation (12) we took the average of the forward and backward Euler methods to ensure stability of the simulation for any Δt̃ and Δx̃ (it will also ensure stability of the transfer function G̃ that we will derive shortly). Lastly, t_s is defined in step (4) in Section 2.1. This discretisation method is known as the Crank-Nicolson method. We remark that the actuator and sensor positions x̃_u and x̃_y in Equation (6) determine the resolution Δx̃: it must be chosen such that i_u = x̃_u/Δx̃ and i_y = x̃_y/Δx̃ in Equation (13) are integers.
We rewrite the first of the above equations as

$$-\tilde\lambda_1\tilde f_{i-1}^{\,j+1}+(1+\tilde\lambda_2)\tilde f_i^{\,j+1}-\tilde\lambda_3\tilde f_{i+1}^{\,j+1}=\tilde\lambda_1\tilde f_{i-1}^{\,j}+(1-\tilde\lambda_2)\tilde f_i^{\,j}+\tilde\lambda_3\tilde f_{i+1}^{\,j},\qquad(14)$$

where the λ̃'s are defined as

$$\tilde\lambda_1=\frac{\tilde\theta_1\Delta\tilde t}{2\Delta\tilde x^2}+\frac{\tilde\theta_2\Delta\tilde t}{4\Delta\tilde x},\qquad\tilde\lambda_2=\frac{\tilde\theta_1\Delta\tilde t}{\Delta\tilde x^2}+\frac{\tilde\theta_3\Delta\tilde t}{2},\qquad\tilde\lambda_3=\frac{\tilde\theta_1\Delta\tilde t}{2\Delta\tilde x^2}-\frac{\tilde\theta_2\Delta\tilde t}{4\Delta\tilde x},\qquad(15)$$

and where we recall that t_s is defined in step (4) in Section 2.1. With these expressions, we will now show how to approximate ỹ_nf(t̃) at the discrete-time instances t̃ = jT_s/t_s using the input ũ_{i_u}^j that we applied to the data-generating system. To this end, let us denote the vector f̃[j] = (f̃_0^j, f̃_1^j, …, f̃_M^j)^T. Imposing the boundary conditions (13) and grouping all terms at time j + 1 in Equation (14) on the left-hand side (lhs) and all terms at time j on the rhs results in the descriptor state-space form:

$$\tilde E\,\tilde f[j+1]=\tilde A\,\tilde f[j]+\tilde B\,\tilde u_{i_u}^{\,j+1},\qquad(16)$$
$$\tilde y_{\mathrm{sim}}(\tilde\theta)[j]=\tilde C\,\tilde f[j],\qquad(17)$$

in which Ẽ and Ã are two (M + 1) × (M + 1) matrices, B̃ an (M + 1) column vector and C̃ an (M + 1) row vector, whose entries follow from Equations (13)-(15). In the expressions of w_{0,i}, w_{2,i} and w_{3,i}, the symbol δ_{i,i_u} is the Kronecker delta function, defined by δ_{kl} = 1 if k = l and δ_{kl} = 0 for k ≠ l. It means that the values of w_{0,i}, w_{2,i} and w_{3,i} differ at the row index i = i_u, which is a consequence of the boundary conditions. We remark that the matrices Ẽ and Ã are tri-diagonal since we are dealing with a second-order PDE (cf. Equation (1)). Consequently, we can compute the vector f̃[j] with O(M) complexity with Thomas' algorithm (Thomas, 1949), a simplified version of Gaussian elimination that can solve tri-diagonal systems of equations.
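Thomas' algorithm can be sketched as follows; each Crank-Nicolson time step amounts to one such tridiagonal solve of Ẽ f̃[j+1] = (rhs), at O(M) cost.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a (a[0] unused),
    diagonal b, super-diagonal c (c[-1] unused) and right-hand side d,
    via the Thomas (1949) forward-elimination / back-substitution sweep."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Small symmetric example: [[4,1,0],[1,4,1],[0,1,4]] x = [5,6,5] has x = [1,1,1].
a = np.array([0.0, 1.0, 1.0])
b = np.array([4.0, 4.0, 4.0])
c = np.array([1.0, 1.0, 0.0])
d = np.array([5.0, 6.0, 5.0])
x = thomas(a, b, c, d)
```

The sweep costs O(M) per time step, versus O(M³) for a dense solve, which is what makes fine spatial grids affordable in the least-squares loop.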
We now return to the problem of identification in Section 3. To simulate the scaled continuous-time output ỹ_nf(t̃) defined in Equation (6), we first compute the vectors f̃[j] using Equations (16)-(21) and the scaled input data points from Z̃^N defined in Section 3, i.e. we compute the macroscopic field f̃(x̃, t̃) at the discrete spatial locations iΔx̃ for i = 0, …, M, and times jΔt̃ = jT_s/t_s for j = 1, …, N. Equation (17) then takes the element of this vector corresponding to the sensor location i_y, yielding the simulated output ỹ_sim(θ̃)[j]. We can now identify the physical parameters using Equation (9), the scaled outputs from Z̃^N and our simulated output ỹ_sim(θ̃)[j]. This way of simulating generates the entire macroscopic field f̃(x̃, t̃) at the discrete positions iΔx̃. The computational time scales linearly with M due to the tri-diagonal algorithm, which allows for very high spatial resolution in the least-squares identification method.
Lastly, we show how the descriptor state-space form can be converted into a discrete-time transfer function. First, we rewrite system (16)-(17) in state-space form:

$$\tilde f[j+1]=\tilde E^{-1}\tilde A\,\tilde f[j]+\tilde E^{-1}\tilde B\,\tilde u_{i_u}^{\,j+1},\qquad\tilde y_{\mathrm{sim}}(\tilde\theta)[j]=\tilde C\,\tilde f[j].\qquad(22)$$

From this state-space form, we can trivially compute the discrete-time transfer function (the discrete-time equivalent of Equation (7)), being

$$\tilde G_{i_u,i_y}(\tilde z,\tilde\theta)=\tilde C\left(\tilde zI-\tilde E^{-1}\tilde A\right)^{-1}\tilde E^{-1}\tilde B\,\tilde z.\qquad(23)$$

In this equation, I is the (M + 1) × (M + 1) identity matrix, z̃ = e^{iω̃T̃_s} and ω̃ = ωt_s the scaled frequency. Note that this transfer function is not causal. However, since we only need its frequency response later, and u(t) is fully known, this is not an issue. The discrete-time scaled input and output signals are then related by

$$\tilde Y_{\mathrm{sim}}(\tilde z)=\tilde G_{i_u,i_y}(\tilde z,\tilde\theta)\,\tilde U(\tilde z).\qquad(24)$$

We recapitulate what we have done so far. We have defined the data-generating system to which we apply an analogue input signal, usually a superposition of sinusoids, and measure the noise-corrupted output at an interval of T_s seconds. We have shown how to identify the physical parameters by scaling the data-set Z^N and simulating the scaled continuous-time model of the data-generating system as defined in Definition 2.1. What we have not yet defined is how to design the input signal that minimises the cost of the experiment while guaranteeing user-imposed constraints on the variances of the physical parameters. This question will be addressed in the next section.
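In practice, the frequency response of such a descriptor model is evaluated with one linear solve per frequency rather than by forming the inverse; the sketch below does this for a tiny scalar example (the matrices here are illustrative, not the PDE matrices, and the trailing z̃ factor follows the transfer-function form used above).

```python
import numpy as np

def freq_response(E, A, B, C, w, Ts):
    """Frequency response of the descriptor model E f[j+1] = A f[j] + B u,
    y = C f, at z = exp(i*w*Ts), via a linear solve (no explicit inverse).
    The trailing z factor matches the transfer-function form used here."""
    z = np.exp(1j * w * Ts)
    return (C @ np.linalg.solve(z * E - A, B)) * z

# Tiny illustrative example (scalar "field", NOT the (M+1)-dimensional PDE case):
E = np.array([[1.0]])
A = np.array([[0.5]])
B = np.array([1.0])
C = np.array([1.0])
g = freq_response(E, A, B, C, w=0.3, Ts=1.0)   # = z / (z - 0.5) at z = e^{0.3i}
```

For the tri-diagonal Ẽ and Ã of Section 3.2.1, the same solve would again cost only O(M) per frequency with a banded solver.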

Least costly optimal experiment design
We recall that we wish to estimate the true, κ-dimensional parameter vector θ_0 of the data-generating system (see Definition 2.1) in such a way that the cost of the experiment is minimal, while at the same time guaranteeing with high probability that the variances of the elements of our scaled estimate θ̃_N remain below certain user-defined constraints. The cost and the constraints need to be a function of the to-be-designed input signal in order to find the optimal one. We first assume that the sensor and actuator locations i_y and i_u are fixed. In all that follows, we consider the scaled system; conversion to the unscaled system is done with Equation (4). We restrict our attention to a multi-sine input signal.

Fixed sensor and actuator locations
We start by defining the constraints. The joint confidence region containing an estimate θ̃_N with a user-defined probability α is described by the ellipsoid

$$U_{\tilde\theta}=\left\{\tilde\theta:\left(\tilde\theta-\tilde\theta_N\right)^T\tilde P_{\tilde\theta}^{-1}\left(\tilde\theta-\tilde\theta_N\right)\le\chi^2_\alpha(\kappa)\right\},\qquad(25)$$

in which χ²_α(κ) is the quantile of the chi-squared distribution function for a probability α and κ = dim(θ̃) degrees of freedom. Furthermore, the inverse of the covariance matrix P̃_θ̃ when using the scaled input signal ũ_D[n] with spectrum Φ̃_ũ reads (Ljung, 1999)

$$\tilde P_{\tilde\theta}^{-1}=\frac{N}{\tilde\sigma_e^2}\,\frac{\tilde T_s}{2\pi}\int_{-\pi/\tilde T_s}^{\pi/\tilde T_s}\frac{\partial\tilde G_{i_u,i_y}(e^{i\tilde\omega\tilde T_s},\tilde\theta_0)}{\partial\tilde\theta}\left(\frac{\partial\tilde G_{i_u,i_y}(e^{i\tilde\omega\tilde T_s},\tilde\theta_0)}{\partial\tilde\theta}\right)^{\!H}\tilde\Phi_{\tilde u}(\tilde\omega)\,d\tilde\omega,\qquad(26)$$

where σ̃_e² = σ_e²/f_s² is the scaled noise variance, G̃_{i_u,i_y}(e^{iω̃T̃_s}, θ̃) the discrete-time transfer function defined in Equation (23), T̃_s the scaled sampling time and Φ̃_ũ(ω̃) the spectrum of the input signal.
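The quantile χ²_α(κ) is readily obtained numerically; the sketch below evaluates it for an illustrative two-parameter design at 95% confidence.

```python
from scipy.stats import chi2

alpha, kappa = 0.95, 2            # confidence level and number of parameters
chi2_val = chi2.ppf(alpha, kappa) # quantile of the chi-squared distribution
# For kappa = 2 at alpha = 0.95, chi2_val ≈ 5.99.
```

This scalar multiplies the covariance ellipsoid and therefore enters the variance constraints directly.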
We now wish to limit the size of this ellipsoid by containing it in a κ-dimensional box defined by the user-defined constraints ∀i ∈ [1, …, κ]: θ̃_{N,i} ∈ [θ̃_{0,i} − Δθ̃_i, θ̃_{0,i} + Δθ̃_i] to ensure a particular accuracy. These constraints translate into a required variance of the estimates. Let the variance of element i in θ̃_N be σ̃_i² = e_i^T P̃_θ̃ e_i, where e_i is the ith unit vector; then the constraints can be written as

$$\chi^2_\alpha(\kappa)\,\tilde\sigma_i^2\le\left(\Delta\tilde\theta_i\right)^2,\qquad\forall i\in[1,\ldots,\kappa].\qquad(27)$$

The second component needed to formulate the least costly experiment design problem is the cost of the experiment. We define the scaled cost of the experiment, denoted J̃_cost, as the power of the as-of-yet undetermined scaled input signal:

$$\tilde J_{\mathrm{cost}}=\frac{\tilde T_s}{2\pi}\int_{-\pi/\tilde T_s}^{\pi/\tilde T_s}\tilde\Phi_{\tilde u}(\tilde\omega)\,d\tilde\omega.\qquad(28)$$

The least costly experiment design problem is thus formulated as

$$\min_{\tilde\Phi_{\tilde u}}\ \tilde J_{\mathrm{cost}},\qquad(29)$$

subject to the constraints (27), rewritten as

$$\forall i:\quad\begin{pmatrix}\left(\Delta\tilde\theta_i\right)^2/\chi^2_\alpha(\kappa)&e_i^T\\ e_i&\tilde P_{\tilde\theta}^{-1}\left(\tilde\Phi_{\tilde u}\right)\end{pmatrix}\succeq0.\qquad(30)$$

Observe that we have rewritten constraint (27) into (30) by invoking Schur's complement in order to ensure that all constraints are linear in the spectrum Φ̃_ũ, i.e. we have linear matrix inequalities (LMIs). Consequently, since the cost is also linear in the spectrum, the optimisation problem (29)-(30) is linear in the design variable Φ̃_ũ. The optimisation problem is thus convex. Its solution, denoted Φ̃_{ũ,opt}, is the spectrum that minimises the cost while honouring the constraints. In order to solve this problem numerically, we have to parameterise the input spectrum Φ̃_ũ(ω̃). To this end, we discretise the frequency domain into Q ∈ N⁺ parts. Defining ω̃_f = π/(QT̃_s) as the fundamental frequency, we have that ω̃_l = lω̃_f, for l = 1, …, Q_ex, where Q_ex ≤ Q is the number of sinusoids used in experiment design. 5 Furthermore, we choose the spectrum to be of the following form:

$$\tilde\Phi_{\tilde u}(\tilde\omega)=\sum_{l=1}^{Q_{\mathrm{ex}}}\frac{\pi\tilde A_l^2}{2\tilde T_s}\left[\delta(\tilde\omega-\tilde\omega_l)+\delta(\tilde\omega+\tilde\omega_l)\right],\qquad(31)$$

corresponding to the 2QT̃_s-periodic discrete-time multi-sine

$$\tilde u[j]=\sum_{l=1}^{Q_{\mathrm{ex}}}\tilde A_l\sin(\tilde\omega_l j\tilde T_s).\qquad(32)$$

Substitution of Equation (31) into the cost (28) and the expression of the covariance matrix (26) gives

$$\tilde J_{\mathrm{cost}}=\frac{1}{2}\sum_{l=1}^{Q_{\mathrm{ex}}}\tilde A_l^2,\qquad(33)$$

$$\tilde P_{\tilde\theta}^{-1}=\frac{N}{\tilde\sigma_e^2}\sum_{l=1}^{Q_{\mathrm{ex}}}\frac{\tilde A_l^2}{2}\,\mathrm{Re}\!\left[\tilde F(\tilde\omega_l,\tilde\theta_0)\tilde F^{H}(\tilde\omega_l,\tilde\theta_0)\right],\qquad(34)$$

where F̃(ω̃, θ̃) = ∂G̃_{i_u,i_y}(e^{iω̃T̃_s}, θ̃)/∂θ̃. The above two equations show that the cost and covariance matrix are now linear in the amplitudes Ã_l².
Substitution into the optimisation problem (29)-(30) then yields a convex, finite-dimensional problem in the Ã_l². The integer Q determines the accuracy of the solution.
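For a single parameter (κ = 1), the LMI constraint collapses to a scalar inequality that is linear in the decision variables a_l = Ã_l², so the design reduces to a linear program; the sketch below solves this special case with scipy (the sensitivities F_l and noise figures are illustrative placeholders, and the general multi-parameter case requires an SDP solver instead).

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical gradient magnitudes |dG/dtheta| on the frequency grid w_l
# (illustrative placeholders, not values from the paper's example).
F = np.array([0.2, 0.9, 0.4, 0.1])
N, sigma2 = 1000, 0.01                 # number of samples, noise variance
var_max = 1e-4                         # required variance bound on theta

# Scalar case of (34): P^{-1} = (N/sigma2) * sum_l (a_l/2) |F_l|^2 must
# exceed 1/var_max, with a_l = A_l^2 >= 0; the cost (33) is sum_l a_l / 2.
coeff = (N / sigma2) * 0.5 * F ** 2
res = linprog(c=0.5 * np.ones(F.size),
              A_ub=[-coeff], b_ub=[-1.0 / var_max],
              bounds=[(0, None)] * F.size)
a_opt = res.x   # all power concentrates at the most informative frequency
```

The LP solution puts the entire input power on the frequency with the largest sensitivity |F_l|, which matches the intuition that, for one parameter, a single well-chosen sine is least costly.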
In Appendix 3, we show how to compute the gradient ∂G̃_{i_u,i_y}/∂θ̃ efficiently. The solution of the optimisation problem generates the set {Ã_{l,opt}}_{l=1,…,Q_ex}. Substitution of these amplitudes into Equation (32) then delivers the scaled optimal input signal ũ_opt[j]. The unscaled optimal input signal is easily retrieved via Equation (4), i.e. A_{l,opt} = Ã_{l,opt} u_s and ω_l = ω̃_l/t_s. The resulting unscaled signal is the analogue equivalent of Equation (32).
We finish this part with some remarks. First, observe that the constraint (27) and the inverse of the covariance matrix (26) depend on the unknown true parameter values θ̃_0. Stated otherwise: to design the optimal input signal that identifies the parameters, we already need to know them. This so-called chicken-and-egg problem is in practice circumvented by replacing θ̃_0 in these equations by an initial guess, θ̃_g. Admittedly, this will by definition result in different experiment costs and different parameter variances. Nevertheless, under equal input power, the framework can deliver signals that result in lower parameter variances than those obtained with an arbitrary input signal. An illustration will follow in Section 5.
Second, we mentioned in the introduction that scaling is of importance in the least costly experiment design framework for physical systems. We comment further on this now. Suppose we want to identify two physical parameters, denoted θ_1 and θ_2. Their values can easily differ by 10 orders of magnitude, resulting in variances (on the diagonal of P_θ) that differ by 20 orders of magnitude. Consequently, the matrix P_θ^{-1} is ill-conditioned, and the convex methods can no longer solve such problems. With scaling, however, the parameters are of the same order, which resolves the ill-conditioning.
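This effect is easy to reproduce numerically. The sketch below builds an information matrix from two sensitivity columns whose magnitudes differ by ten orders (illustrative numbers, not from the paper's example) and shows that the chain-rule rescaling dG/dθ̃ = (dG/dθ)Θ_s restores a usable condition number.

```python
import numpy as np

# Two sensitivity columns dG/dtheta_i with magnitudes ten orders apart
# (illustrative): a tiny parameter yields a huge sensitivity and vice versa.
J = np.array([[2e5, 1e-5],
              [1e5, 3e-5]])
P_inv = J.T @ J                        # unscaled information matrix

Theta_s = np.diag([1e-5, 1e5])         # order-of-magnitude scaling factors
J_tilde = J @ Theta_s                  # chain rule: dG/dtheta~ = dG/dtheta * Theta_s
P_inv_tilde = J_tilde.T @ J_tilde      # scaled information matrix

assert np.linalg.cond(P_inv) > 1e15        # numerically unusable
assert np.linalg.cond(P_inv_tilde) < 1e3   # well conditioned
```

The unscaled matrix has a condition number around 10²⁰, far beyond double precision, while the scaled one is benign.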
Third, the LMI optimisation problem can be solved in polynomial time. The number of decision variables Q_ex is small, as is the number of to-be-identified parameters. This leads to rather quick solutions.
Lastly, we mention that in the numerical procedure we require an expression for ∂G̃_{i_u,i_y}(z̃, θ̃_0)/∂θ̃. The transfer function G̃_{i_u,i_y}(z̃, θ̃_0) being of high order for fine spatial grids, this can lead to a heavy computational load. Note that, nevertheless, as previously mentioned, this load can be eased by the method described in Appendix 3. Moreover, this gradient computation can be performed prior to the resolution of the LMI problem. Finally, the load becomes negligible if an explicit continuous-time expression for G̃_{i_u,i_y}(iω̃, θ̃_0) exists. Indeed, the discrete-time transfer function in Equation (26) can then be replaced by its much simpler continuous-time equivalent. We give an example in Section 5.

Actuator and sensor locations as design variables
In the previous section, we formulated the least-costly experiment design (LCED) framework but assumed that the actuator and sensor locations i_u and i_y were given.
Since the derivatives of G_{i_u,i_y} in Equation (23) depend explicitly on the actuator and sensor locations, we can attempt to decrease the cost even further by choosing these locations optimally. Because of this explicit dependence, the optimal frequencies change with the locations. Consequently, we have to solve the LCED optimisation problem formulated in the previous section for many combinations of i_u and i_y.
Set N_sim as the total number of iterations;
Set α and the Δθ_i's to set the constraints;
Set Q, determining the lowest frequency ω_f = π/(Q T_s);
Set array Ã_opt = [Ã_l], l = 1, …, Q_ex;
Set array x_opt = [x̃_u, x̃_y];
Set cost J_opt = 1 × 10⁸ (a high value);
k = 1;
while k ≤ N_sim do
  divide the current square into four equally sized sub-squares;
  for each sub-square, with centre coordinates (x̃_u, x̃_y), do
    solve the LCED problem (29) and (30) at (x̃_u, x̃_y), yielding amplitudes Ã and cost J_cost;
    if J_cost < J_opt then
      x_u,sub = x̃_u; x_y,sub = x̃_y; Ã_opt = Ã; J_opt = J_cost;
    end
  end
  x_opt = [x_u,sub, x_y,sub];
  k = k + 1;
end
Algorithm 1: Progressive subdivision algorithm that finds the minimal experiment cost by designing the optimal input spectrum and optimal sensor and actuator locations.
The solution of this algorithm is the set {x_opt, Ã_opt, J_opt}, containing the optimal actuator and sensor locations, the optimal amplitudes and the optimal cost. Conversion to the unscaled signal is described in the previous section.
The algorithm makes use of progressive subdivision. It starts by dividing the (x̃_u, x̃_y)-plane into four equally sized squares. For each square, the optimisation problem is solved at the coordinates of the centre of that square. From these four solutions, the one that delivers the smallest cost J_cost is selected. At step k + 1, the corresponding square is subsequently divided into four equally sized squares, for which we again find the least costly solution. This procedure is repeated until N_sim divisions have taken place. This subdivision strategy becomes important as the number of spatial variables such as x̃_u and x̃_y increases. The algorithm is easily adapted if only one spatial degree of freedom is considered (only the input or the output location).
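A minimal Python sketch of this subdivision loop follows; the quadratic surrogate cost (and its minimiser at (0.2, 0.6)) is made up and stands in for the actual LMI solve of problem (29)–(30):

```python
import numpy as np

def lced_cost(xu, xy):
    # Placeholder for solving the LCED problem at actuator location xu and
    # sensor location xy; a smooth surrogate with minimum at (0.2, 0.6).
    # In the real algorithm, this call runs the LMI solver.
    return (xu - 0.2) ** 2 + (xy - 0.6) ** 2 + 1.0

def progressive_subdivision(cost, n_sim=8, lo=(0.0, 0.0), hi=(1.0, 1.0)):
    """Repeatedly split the current square into four sub-squares, solve the
    design problem at each centre, and recurse into the cheapest one."""
    (x0, y0), (x1, y1) = lo, hi
    best = None
    for _ in range(n_sim):
        mx, my = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        quads = [((x0, y0), (mx, my)), ((mx, y0), (x1, my)),
                 ((x0, my), (mx, y1)), ((mx, my), (x1, y1))]
        centres = [((a + c) / 2, (b + d) / 2) for (a, b), (c, d) in quads]
        costs = [cost(cx, cy) for cx, cy in centres]
        k = int(np.argmin(costs))
        best = (centres[k], costs[k])
        (x0, y0), (x1, y1) = quads[k]      # recurse into the cheapest square
    return best

x_opt, J_opt = progressive_subdivision(lced_cost)
print(x_opt, J_opt)
```

After N_sim = 8 subdivisions the evaluated centre is within a cell of side 2⁻⁸ of the surrogate's minimiser.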
The algorithm can be sped up drastically when dim(θ) ≤ 2, for which we derived analytical solutions in Potters, Forgione, Bombois, and van den Hof (2015) and a novel analytical solution in Appendix 2. A properly chosen frequency grid can further improve the speed.

Example: estimating the dispersion coefficient and reaction rate in a tracer experiment
As the first numerical illustration of our result, let us consider a 1D river containing a homogeneous fluid (Didierjean, Maillet, & Moyne, 2004). For the sake of brevity, we directly consider the scaled system. The river is modelled as an infinite medium with a constant flow speed ṽ_0 = 3. Following Didierjean et al. (2004), we consider the following experiment: we inject a tracer with a rate q̃(x̃, t̃) into the river (from, for example, a boat) at a stationary position x̃ = x̃_u and measure the tracer concentration c̃(x̃, t̃) (with, for example, another boat) downstream at position x̃ = x̃_y ≥ x̃_u. The input is ũ(t̃) = q̃(x̃_u, t̃) and the output is ỹ(t̃) = c̃(x̃_y, t̃).
The dynamics are governed by Equation (5), where the profile f̃(x̃, t̃) is here the concentration c̃(x̃, t̃); parameter θ̃_1 is the so-called dispersion coefficient η̃, θ̃_2 the flow speed ṽ, and θ̃_3 the reaction rate ξ̃. The boundary conditions involve the tracer injection rate q̃(x̃_u, t̃) and differ slightly from those in Equation (6) (see Didierjean et al., 2004). Among the three parameters, the flow speed ṽ is generally not identified from data but measured. Consequently, our aim is to identify the two remaining parameters: the dispersion coefficient and the reaction rate of the tracer. Note that the dispersion coefficient represents the combined effect of molecular diffusion and hydrodynamic transport.
As shown in Didierjean et al. (2004), one can deduce a closed form of the continuous-time transfer function between the input and output. After scaling, we obtain the transfer function in Equation (36), where l̃ = x̃_y − x̃_u is the distance between the measurement and actuator locations, and

C(s̃, θ̃) = (s̃/k) sinh(k l̃),  D(s̃, θ̃) = cosh(k l̃) + (ṽ/(2kη̃)) sinh(k l̃).  (37)

Observe that the transfer function depends only on the relative distance between the actuator and the sensor. This makes sense, as it should not matter where we inject the tracer in an infinitely long river. Let us define the data-generating system with the parameter vector θ̃_0 = [η̃_0, ξ̃_0] = [1.0, −0.1] and assume that ṽ_0 = 3. Our objective is to estimate θ̃_0. The scaled measurement noise variance is assumed to be σ̃²_e = 0.01; the experiment length is set to N = 9000 samples. We wish to identify θ̃ with the least powerful input signal, while ensuring that the parameter variance constraints σ̃²_η ≤ (0.01/3)² and σ̃²_ξ ≤ (0.02/3)² are satisfied. To this end, we also optimise the distance l̃ between the actuator and the sensor.
The distance l̃ is a scalar variable, hence Algorithm 1 need not be used to determine the optimal distance. We instead solve the optimisation problems (28)–(30) for the values l̃ = 0, 0.05, 0.1, …, 1.5. Since only two parameters are identified, we can use the analytical solution in Appendix 2 to solve the problem for all these values of l̃. However, we could of course also have used the LMI optimisation.
Using this procedure, the least powerful excitation signal is found for a distance l̃_opt = 1.0. For this value, the least powerful excitation signal is given by ũ_opt(t̃) = Ã_opt sin(ω̃_opt t̃), where Ã_opt = 7.54 and ω̃_opt = 0.31. The optimal (scaled) amplitude is thus equal to 7.54 for l̃_opt. For l̃ = 0, the amplitude required to fulfil the variance constraints is 15.16, i.e. twice as much. For distances exceeding l̃_opt, the amplitude also increases significantly. This shows the advantage of optimising the distance between the sensor and the actuator in this experiment. In the next section, we consider a more detailed example and compare our results with existing ones in the literature.

Case study: estimation of diffusivity and conductivity parameters in front-face experiments
In this section, we apply the optimal experiment design framework to the identification of thermal parameters in a front-face experiment. The experimental set-up is inspired by the work of Gabano and Poinot (2009) and is simulated numerically. We first introduce the data-generating system and its scaled equivalent in Section 5.1. We then set up the experiment, define the constraints on the variances of the estimates, and compute the optimal input signal in Section 5.2. We solve the optimisation problem using CVX (Grant & Boyd, 2013). We also show what the optimal actuator and sensor locations are. In Section 5.3, the optimal input signal is applied to the data-generating system (in a simulation environment) and the physical parameters are identified from the collected data. To test whether the variance constraints are honoured, we simulate 2 × 10⁴ experiments. We also analyse what happens to the optimal input signal when we replace θ_0 by an initial guess θ_g in Equation (34).

Data-generating system
We consider a homogeneous rod of length L = 0.05 m oriented along the spatial coordinate x (see Figure 2). We place the left side of the rod at x = 0, so that the spatial domain is D = [0, L]. During the experiment, we heat the cross-sectional area of the rod uniformly at x = x_u with a heat flux u(t) and keep the temperature at the right boundary (x = L) constant, here equal to zero.6 We measure the temperature T(x, t) at position x = x_y ∈ D. The optimal actuator and sensor positions are determined with optimal experiment design. We assume zero initial conditions. The dynamics are governed by Equations (40) and (41), in which λ_0 = 111 W m⁻¹ °C⁻¹ is the thermal conductivity and α_0 = 3.38 × 10⁻⁵ m² s⁻¹ the thermal diffusivity. We collect the physical parameters in the vector θ_0 = [α_0, λ_0]. This data-generating system corresponds to the continuous-time second-order PDEs (1) and (2) for the macroscopic field f(x, t) = T(x, t), input location x = x_u, θ_{1,0} = α_0 and θ_{4,0} = λ_0. Our goal is to identify the physical parameters θ_0 = [α_0, λ_0] = [3.38 × 10⁻⁵, 111].
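A minimal simulation sketch of this set-up follows, assuming the standard heat equation T_t = α T_xx with a flux (Neumann) boundary at x = 0 and a zero-temperature (Dirichlet) boundary at x = L, discretised with the Crank–Nicolson stencil used elsewhere in the paper (grid sizes are illustrative choices):

```python
import numpy as np

# Rod parameters as in the case study; M spatial intervals, sampling time Ts.
alpha, lam, L = 3.38e-5, 111.0, 0.05
M, Ts = 100, 0.1
dx = L / M
r = alpha * Ts / (2 * dx ** 2)          # Crank-Nicolson coefficient

n = M                                    # unknowns T_0 .. T_{M-1}; T_M = 0
A = np.eye(n) * (1 + 2 * r)
B = np.eye(n) * (1 - 2 * r)
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = -r
    B[i, i + 1] = B[i + 1, i] = r
# Flux boundary at node 0 via a mirror node: T_{-1} = T_1 + 2*dx*u/lam.
A[0, 1] -= r
B[0, 1] += r
Ainv = np.linalg.inv(A)

def simulate(u, steps):
    """Front-face temperature T(0, k*Ts) under the heat-flux input u(t)."""
    T = np.zeros(n)
    y = np.empty(steps)
    for k in range(steps):
        rhs = B @ T
        # Flux source term; both time levels lumped (exact for constant u).
        rhs[0] += 2 * r * (2 * dx / lam) * u(k * Ts)
        T = Ainv @ rhs
        y[k] = T[0]
    return y

y = simulate(lambda t: 1.0, 2000)        # unit step in the heat flux
print(y[-1])                             # approaches the steady state u*L/lam
```

For a constant unit flux, the discrete steady state is the linear profile T(x) = u(L − x)/λ, so the front-face temperature converges to uL/λ ≈ 4.5 × 10⁻⁴, which the simulation reproduces.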

Non-dimensionalisation
Following Section 2.1, we introduce the non-dimensional variables T̃(x̃, t̃) = T(x, t)/y_s, x̃ = x/x_s, ũ(t̃) = u(t)/u_s, t̃ = t/t_s, and the non-dimensional parameters α̃_0 = α_0/α_s, λ̃_0 = λ_0/λ_s. Choosing y_s = 1, x_s = L = 0.05, t_s = L²/α_0 = 73.96, u_s = L/(λ_0 y_s) = 4.5 × 10⁻⁴, α_s = α_0 and λ_s = λ_0, and substituting the non-dimensional variables into Equations (40) and (41), results in the non-dimensional model of Equations (42) and (43). Note that we have used the initial guess θ_g = θ_0 for convenience. This results in a scaled true parameter vector, θ̃_0, that is of O(1). The non-dimensional continuous-time transfer function G̃_{x̃_u,x̃_y}(s̃, θ̃_0) that couples ũ(t̃) to the output ỹ_nf(t̃) (cf. Equation (7)) is derived in Appendix 1 and given in Equation (44). In that equation, the Laplace variable has also been scaled according to Equation (4), i.e. s̃ = t_s s = sL²/α_0. We will use this transfer function in the experiment design procedure explained in the next section.

Experiment preliminaries
In this section, we define the experiment. We choose the same parameters as in Gabano and Poinot (2009), mainly to compare the excitation frequencies. We remark that we did not have an actual physical set-up to generate data; instead, the noise-corrupted output data are generated numerically. We set the experiment length at N = 2000 + 9000 samples, where the first 2000 samples are not used in the identification, i.e. we wait until the transients have died out. The sampling time is set at T_s = 0.1 s. The optimal input signal (which we compute shortly) generates the measured output of the data-generating system given by Equation (45), where the output y_nf(nT_s) of the data-generating system is corrupted by zero-mean Gaussian white noise with variance σ²_e = 0.05 (see also Equation (8)).

Optimal experiment design
In Gabano and Poinot (2009), a collection of a thousand estimates α_N and λ_N was distributed around the respective true values α_0 = 3.38 × 10⁻⁵ and λ_0 = 111 (identical to the parameters used in this section) with σ_α = 0.02α_0/3 and σ_λ = 0.01λ_0/3. Following Section 4, we cast these values into the scaled variance constraints (46), where it is understood that the probability α = 0.99, so that χ²_{0.99}(2) ≈ 9. The optimal experiment design problem is formulated by Equations (29) and (30). Choosing Equation (31) to represent the scaled spectrum, and imposing the above constraints, the scaled optimisation problem for this experiment reads as in Equations (47) and (48). The expression for P̃_θ⁻¹ is given by Equation (34), in which we substituted for G_{i_u,i_y}(z, θ) the continuous-time transfer function G̃_{x̃_u,x̃_y}(s̃, θ̃) of Equation (44), and used as initial guess θ_g = θ_0, i.e. θ̃_0 = [1.0, 1.0].
We use Algorithm 1 to find the optimal locations and the optimal input signal for those locations. To this end, we solve the problem (29)–(30) with the LMI approach, taking Q = Q_ex = 200. For each combination (x̃_u, x̃_y), it turns out that the optimal input signal is a single sinusoid. Interestingly, we find that the lowest cost, J_cost = ½Ã²_opt, is obtained at (x̃_u, x̃_y) = (0, 0.12). The optimal amplitude Ã_opt at x̃_u = 0 as a function of x̃_y is depicted in Figure 3(a). In unscaled length, this corresponds to x_y = 0.12L. In practice, front-face experiments (x_u = x_y = 0) are common. However, this study suggests that this is not the best practice, as a cost (proportional to Ã²) that is about 6% lower can be obtained at x̃_y = 0.12. Equivalently, for the same cost, the variances of the parameters will be about 6% lower, since P̃_θ⁻¹ is proportional to Ã². Furthermore, observe that the curve increases rapidly as x̃_y increases. Although not shown in the figure, when x̃_y → 1, the optimal amplitude Ã_opt → ∞. This is a consequence of the boundary condition T̃(1, t̃) = 0 ∀t̃ (cf. Equation (43)): the informativeness of the data at this location is zero, at any frequency.
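Since the optimal spectrum here is a single sinusoid, the design at each candidate location reduces to a one-dimensional frequency search: compute the information per unit squared amplitude at each frequency and take the smallest amplitude that meets both variance constraints. The sketch below uses a hypothetical first-order model G(s, θ) = θ_1/(s + θ_2) as a stand-in for the rod transfer function of Equation (44); all numbers are illustrative:

```python
import numpy as np

N, sigma2 = 9000, 0.05
theta = np.array([1.0, 1.0])
var_bounds = np.array([(0.02 / 3) ** 2, (0.01 / 3) ** 2])   # cf. constraints (46)

def grad_G(w, th):
    # Analytic parameter gradient of the toy model G(s, th) = th1/(s + th2).
    s = 1j * w
    return np.array([1.0 / (s + th[1]), -th[0] / (s + th[1]) ** 2])

def required_amplitude(w):
    # Information per unit A^2 delivered by u(t) = A sin(w t).
    g = grad_G(w, theta)
    M0 = (N / (2 * sigma2)) * np.real(np.outer(g, np.conj(g)))
    P0 = np.linalg.inv(M0)                 # parameter covariance times A^2
    # Smallest A such that both diagonal variance bounds hold.
    return np.sqrt(np.max(np.diag(P0) / var_bounds))

freqs = np.linspace(0.05, 5.0, 500)
amps = np.array([required_amplitude(w) for w in freqs])
k = int(np.argmin(amps))
print(freqs[k], amps[k])                   # single-sine design (frequency, amplitude)
```

As in the paper, the required amplitude blows up at both frequency extremes (the information matrix degenerates), so the optimum lies at an interior frequency.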
Although the best estimate can be obtained at (x̃_u, x̃_y) = (0, 0.12), we shall nevertheless use the optimal input signal for x̃_u = 0 and sensor location x̃_y = 0 to compare with previous works. In this case, the optimal input signal is computed to be ũ_opt(t̃) = 1.7067 sin(1.5666 t̃), which in unscaled variables translates (using the conversion defined in Section 5.1) into the optimal input signal7 u_D(t) = 3.789 × 10³ sin(0.0212 t).
Observe from Figure 3(b) that the scaled optimal frequency ω̃ = 1.5666 lies in between the two maxima of the derivatives of G̃_{x̃_u,x̃_y}. This is an intuitively pleasing result, as large derivatives lead to high certainty (see Equation (34)). We refer the reader to Appendix 2 for a more detailed analysis and interpretation of the optimal input signal.

Chicken-and-egg problem
The optimal input signal in the previous section was designed using the true parameter vector θ̃_0. In practice, however, we obviously do not know this vector, since it is precisely what we want to estimate. As mentioned in Section 4, finding the optimal signal to identify the parameter vector requires the parameter vector itself. This so-called chicken-and-egg problem can be circumvented by replacing θ̃_0 in Equation (34) with a previous estimate or guess θ̃_g. This inevitably leads to a designed input signal that is not optimal. Optimal input design can, however, still be used, and will generally lead to better estimates than arbitrary signals of the same experiment cost. A central question is how sensitive the cost of the experiment is to the initial guess θ̃_g, and whether or not the constraints will still be honoured. To this end, we computed the optimal amplitude and frequency for many values of θ̃_g using Equations (47) and (48). The range over which θ̃_g is varied is larger than the desired accuracy of the estimates from the identification experiment. In Figure 4(a), the optimal amplitude is shown for values of λ̃ and α̃ within 10% of λ̃_0 = 1 and α̃_0 = 1 (the case θ_g = θ_0). Observe that within this range the optimal amplitudes can differ by up to 30% from the one obtained with θ̃_0 = θ̃_g, i.e. Ã_opt = 1.7067 (cf. Equation (49)). The cost of the experiment is thus rather sensitive to the guess θ̃_g. In Figure 4(b), the optimal frequencies as a function of θ̃_g are shown. Observe that, for a given guess λ̃_g, the optimal frequency is not sensitive to a wrong guess α̃_g.
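This sensitivity scan can be sketched as follows, again with the hypothetical first-order model G(s, θ) = θ_1/(s + θ_2) standing in for the rod transfer function (all numbers illustrative):

```python
import numpy as np

N, sigma2 = 9000, 0.05
var_bounds = np.array([(0.02 / 3) ** 2, (0.01 / 3) ** 2])

def design(th):
    """Single-sine design (amplitude, frequency) at the parameter guess th."""
    def required_amplitude(w):
        s = 1j * w
        g = np.array([1.0 / (s + th[1]), -th[0] / (s + th[1]) ** 2])
        M0 = (N / (2 * sigma2)) * np.real(np.outer(g, np.conj(g)))
        return np.sqrt(np.max(np.diag(np.linalg.inv(M0)) / var_bounds))
    freqs = np.linspace(0.05, 5.0, 300)
    amps = [required_amplitude(w) for w in freqs]
    k = int(np.argmin(amps))
    return amps[k], freqs[k]

A0, w0 = design(np.array([1.0, 1.0]))      # design at the "true" parameters
for d in (-0.1, 0.0, 0.1):                 # +/-10% perturbed initial guesses
    Ag, wg = design(np.array([1.0 + d, 1.0 + d]))
    print(d, Ag / A0, wg / w0)             # amplitude and frequency ratios
```

Repeating `design` over a grid of guesses produces exactly the kind of amplitude and frequency surfaces shown in Figures 4(a) and 4(b).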
To test whether the constraints will still be honoured, and how large the error in the estimates is when the optimal input signal is designed with θ_g ≠ θ_0, we proceed as follows. For each guess θ̃_g, we take the corresponding optimal amplitude and frequency from Figures 4(a) and 4(b) and apply this input signal to the true system. We then obtain the variances of the estimates α̃_N and λ̃_N as a function of θ_g, and quantify the mismatch with a relative error measure. We found that the relative error in the considered interval lies between 0% and 30%. Moreover, a strong correlation with Figure 4(a) is apparent: if the optimal amplitude is larger than or equal to Ã_opt = 1.7067 (the value at (α̃, λ̃) = (1, 1)), we obtain variances that are smaller than or equal to those of the case θ_g = θ_0. Conversely, the constraints are not satisfied if the optimal amplitude is smaller than Ã_opt = 1.7067.

Identification results
In this section, we identify the physical parameters θ̃_0 with the optimal input signal (49) computed in the previous section. We use method (2) as detailed in Section 3. The unscaled data-set Z_N is defined by the scaling (45) and the sampled equivalent of the scaled input (50). We remind the reader that we consider the case x_u = x_y = 0. The scaled data-set is Z̃_N = {u_D[j]/u_s, y_D[j]/y_s}, j = 2001, …, N, where u_s and y_s are defined in Section 5.1. Note that we discarded the first 2000 samples to remove transients. Simulation of the scaled noise-free output ỹ_nf(t̃) (43) is done according to Section 3.2.1, where we chose Δt̃ = T_s/t_s and M = 200. The simulated noise-free output ỹ_sim[j] for j = 2001, …, N is then used, together with the scaled measured data y_D[j]/y_s in Z̃_N, in the least-squares procedure (9). For one experiment, this procedure yielded the scaled estimates α̃_N = 1.01 and λ̃_N = 1.005, corresponding to the unscaled estimates α_N = 3.414 × 10⁻⁵ m² s⁻¹ and λ_N = 111.56 W m⁻¹ °C⁻¹. These estimates fall within the respective intervals [α_0 − 0.02α_0, α_0 + 0.02α_0] and [λ_0 − 0.01λ_0, λ_0 + 0.01λ_0] that we set in Section 5.2.
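The output-error identification step can be sketched as follows, with a hypothetical first-order stand-in model and a coarse grid search in place of the actual least-squares procedure (9); the amplitude and frequency are those of the optimal signal (49):

```python
import numpy as np

rng = np.random.default_rng(0)

A, w, Ts, N = 1.7067, 1.5666, 0.1, 9000
theta0 = np.array([1.0, 1.0])

def sine_response(th, t):
    # Steady-state response of the toy model G = th1/(s + th2) to A*sin(w t).
    G = th[0] / (1j * w + th[1])
    return A * np.abs(G) * np.sin(w * t + np.angle(G))

t = np.arange(N) * Ts
y_meas = sine_response(theta0, t) + np.sqrt(0.05) * rng.standard_normal(N)

# Output-error fit: simulate the noise-free response for each candidate
# parameter pair and keep the candidate with the smallest squared error.
grid = np.linspace(0.9, 1.1, 41)
best, best_sse = None, np.inf
for th1 in grid:
    for th2 in grid:
        e = y_meas - sine_response(np.array([th1, th2]), t)
        sse = e @ e
        if sse < best_sse:
            best, best_sse = (th1, th2), sse
print(best)
```

With 9000 samples at this signal-to-noise ratio, the grid minimiser lands within a few grid steps of the true (1, 1).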

.. Monte Carlo simulations experiment : validating the variance constraints
To validate whether the variance constraints are honoured, we ran 2 × 10⁴ Monte Carlo simulations to identify the scaled physical parameters α̃_0 = 1 and λ̃_0 = 1 with the optimal signal (49). (In other words, 20,000 data-sets Z_N were generated and the identification procedure was applied to each.) The identified parameters α̃_N and λ̃_N for all experiments are shown in Figure 5. The mean value of the estimates θ̃_N = [α̃_N, λ̃_N] is indicated by the red cross. The constraints set in Section 5.2 are visualised by the square resulting from the intersection of the horizontal and vertical dashed black lines.
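The Monte Carlo validation loop can be sketched as follows; a toy single-sine set-up with a first-order stand-in model is used instead of the rod simulator, and an in-phase/quadrature decomposition replaces the full least-squares procedure (9):

```python
import numpy as np

rng = np.random.default_rng(1)

A, w, Ts, N, sigma2 = 1.7067, 1.5666, 0.1, 9000, 0.05
theta0 = np.array([1.0, 1.0])
t = np.arange(N) * Ts
G0 = theta0[0] / (1j * w + theta0[1])
y_nf = A * np.abs(G0) * np.sin(w * t + np.angle(G0))

# Linear regressors for the in-phase/quadrature parts of the output.
Phi = np.column_stack([np.sin(w * t), np.cos(w * t)])

est = []
for _ in range(2000):
    y = y_nf + np.sqrt(sigma2) * rng.standard_normal(N)
    a, b = np.linalg.lstsq(Phi, y, rcond=None)[0]
    phi = np.arctan2(b, a)                 # estimated phase of the response
    th2 = w / np.tan(-phi)                 # invert angle(G) = -arctan(w/th2)
    th1 = (np.hypot(a, b) / A) * np.hypot(w, th2)
    est.append((th1, th2))
est = np.array(est)
print(est.mean(axis=0), est.var(axis=0))   # empirical means and variances
```

Plotting the cloud `est` against the constraint box reproduces, for the toy model, the kind of scatter shown in Figure 5.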
Observe that almost none of the estimates θ̃_N lie outside the constraint region. The computed variances of α̃_N and λ̃_N are var(α̃_N) = 3.615 × 10⁻⁵ and var(λ̃_N) = 1.1108 × 10⁻⁵, respectively. Clearly, the optimal input signal designed in the previous section honours the constraints. The experiment design procedure ensured that the confidence ellipse 'touches' the horizontal constraints, whereas the variance of α̃_N is in fact a bit smaller than required. This is not surprising, as explained in Appendix 2.
A slightly lower accuracy of the estimates was obtained in Gabano and Poinot (2009) for the same experiment length N, parameters α_0 and λ_0, and rod length L. The input signal they considered was a pseudo-random binary excitation signal with its power distributed over the higher frequencies, up to 20 rad s⁻¹. The amplitudes are of the same order, but the noise variances might be different, so a direct comparison is difficult. Independently of this difference, however, our result suggests that one should instead use a very low excitation frequency, i.e. 0.02 rad s⁻¹, to obtain the most accurate estimates. As shown in Figure 4(b), choosing high frequencies leads to a matrix P̃_θ⁻¹ that is much smaller than the one obtained at frequencies close to the maxima of the derivatives. Intuitively, this means that the system is highly insensitive to high-frequency input signals.
Our results also suggest that a higher accuracy can be obtained by measuring at x̃_y = 0.12, as shown in Figure 4(a). The ratio of the optimal amplitudes between x̃_y = 0 and x̃_y = 0.12 is 1.03. As P̃_θ⁻¹ is proportional to Ã², this means that a 1.03² times higher accuracy can be obtained with the same input power.

.. Monte Carlo simulations experiment : chicken-and-egg problem revisited
In Section 4, we mentioned that OED suffers from the chicken-and-egg problem. In this section, we show that estimates honouring the user-imposed constraints can still be found, even if θ̃_0 is not known exactly.
To this end, suppose that we start without any prior knowledge of θ_0. We run an experiment of length N/2 and apply a white-noise input signal8 with high variance σ̃²_wn = 25. This delivers an estimate θ̃_wn. At this point, we compare two scenarios: (1) we apply optimal experiment design to find the optimal input signal that guarantees the constraints (46) based on the initial guess θ̃_g = θ̃_wn, for an experiment of length N/2; or (2) we continue with a white-noise signal that has the same power as the optimal input signal and the same experiment length. Both scenarios thus use equally powered input signals and equally long experiments.
Using Monte Carlo simulations, we first generate 500 white-noise realisations with variance σ̃²_wn = 25, which produce 500 estimates θ̃_wn that we collect in the set {θ̃_wn}. These estimates are shown in red in Figure 6. Next, for each of the estimates, we run scenarios (1) and (2). Figure 6 shows the results: the red, blue and green circles correspond, respectively, to the set of initial guesses {θ̃_wn}, the estimates generated by the optimal input signals, and the estimates from white-noise input signals with power equal to that of their respective optimal input signals.
Since both scenarios generate signals that are equally long and equally powered, these Monte Carlo simulations clearly illustrate the advantage of optimal experiment design.
The above approach is the classical way to tackle the chicken-and-egg problem. More involved approaches exist, in which the initial guess and the optimal spectrum are adapted throughout the experiment (see e.g. Forgione, Bombois, & Van den Hof, 2013; Gerencsèr, Hjalmarsson, & Mårtensson, 2009; Larsson et al., 2013).

Conclusions
The main novelty of this paper is the introduction of a systematic way to identify physical parameters in linear physical systems in a least intrusive manner while guaranteeing accuracy on the to-be-identified parameters. We have in particular shown how to apply the theory of least costly experiment design to diffusion-advection processes.
To this end, we made use of a discretisation based on the Crank–Nicolson stencil to truncate the continuous-time model and obtain a discrete-time transfer function that couples the input and output. This transfer function is then also guaranteed to be stable. The Crank–Nicolson scheme is not only unconditionally stable, but also more accurate than the Euler stencil. The resulting truncated model is a state-space realisation that is explicit in the physical parameters. We then showed how optimal experiment design can be applied.
The second novelty of this paper is the generalisation of the experiment design framework to find not only optimal amplitudes and frequencies, but also optimal actuator and sensor locations.
We applied our methodology to the estimation of two thermal parameters in a front-face experiment. This study showed that the current practice of placing both the sensor and the actuator at position x = 0 does not in fact deliver the best possible estimates. Our study suggests that placing the sensor at a distance of 12% of the total rod length from the actuator position yields estimates that are 6% better. Applying the optimal input signals designed in the case study furthermore shows that the input power can be reduced considerably in comparison with previous experiments.
Extending the methodology to physical systems with spatially dependent parameters is an interesting direction, which we consider future work.

Notes
1. For a general introduction to optimal experiment design, we refer the reader to the nice historical review of Mehra (1974).
2. For simplicity, we considered only one spatial dimension, but more can be incorporated easily.
3. The scaling procedure explained here is classical, except that usually in the literature the parameters are (assumed to be) known. In such cases, one can make most terms equal to unity in step (4), rather than of O(1).
4. The simulation accuracy is thus O(Δt̃²). If Δt̃ turns out to be large, define the integer γ such that the time integration step becomes Δt̃/γ. This generates γ times more points in the considered simulation time interval. In the identification procedure, one then has to downsample the simulated output by a factor γ.
5. Since the Nyquist frequency is chosen a decade above the system's bandwidth, it is generally not necessary to cover the whole frequency range [0, π/T_s].
6. If x_u = 0, we can easily heat the cross-sectional area. If x_u ≠ 0, the rod can be heated locally with a thin thermal band wrapped around the rod.
7. This optimal input signal has a different optimal frequency than the case x̃_y = 0.12.
8. Note that we can now only use identification method (2) of Section 3, because the input signal is no longer periodic, whereas in Experiment 1 we could also have opted for method (1), since there the input signal is periodic and a closed-form transfer function exists.
We can now combine Equations (A8) and (A9) to find the expression for F(x, s). Next, we write the output as Y_nf(s) = F(x_y, s), from which the transfer function between input and output follows, and can be further simplified. Substituting f(x, t) = T(x, t) in Equation (A1) and following the same calculations yields the equation above. Finally, setting θ_1 = α_0 and θ_2 = λ_0 results in G_{x_u,x_y}(s, θ_0), in which θ_0 = [α_0, λ_0]. In a similar fashion, the continuous-time transfer function between the scaled input and scaled output can be derived for Equations (42) and (43); the resulting expression equals (44) after substitution of θ̃ = θ̃_0 = [α̃_0, λ̃_0].

Appendix 2. An analytical solution to the 2D least costly experiment design problem
This appendix derives the analytical solution of the optimal input signal that is found numerically in Section 5. The derivation is not specific to that example, but is a general solution for systems in which two parameters are to be identified. In this appendix, we have dropped all tildes on variables and parameters to simplify notation. Our starting point is the problem definition introduced in Section 4. In the particular case of identifying two parameters θ_0, we require that the joint confidence region defined by the ellipse in Equation (B1), in which χ²_α(2) is the α-quantile of the chi-squared distribution, lies inside the intervals −Δθ_i + θ_{0,i} ≤ θ_i ≤ Δθ_i + θ_{0,i}, i = 1, 2.
Furthermore, we not only require these constraints to hold, but also seek the spectrum in Equation (B1) that minimises the cost as given in Equation (29). This constraint is equivalent to the variance constraint (27). (Figure B1: the blue ellipse E_1 corresponds to the boundary of the confidence region Δθᵀ P_θ⁻¹ Δθ ≤ χ²_α(2); the dashed and dash-dotted grey line sets correspond to the constraints on parameters θ_1 and θ_2, respectively.)
The situation is sketched in Figure B1. The blue ellipse E_1 is described by Equation (B1); the sets of horizontal and vertical lines l_1 and l_2 indicate the intervals given in Equation (B2). The goal is to find the spectrum Φ_u(ω) that minimises the cost while ensuring that the ellipse E_1 lies inside the box defined by the four intersection points of the two sets of lines. To keep the notation simple, we translated the centre of the ellipse from θ_0 to (0, 0), without loss of generality.
We recall that we parameterise the spectrum as a multi-sine. It is well known in system identification that two parameters can be identified with a single sine. In this appendix, we therefore give the solution to the optimisation problem defined in Section 4 under the assumption that the solution lies in the family of single sines. Solving the resulting optimisation problem yields an expression for the amplitudes A_i(ω) such that the ellipse E_1 touches the line l*_i. The optimal amplitude is then obtained by minimising over frequency, where i_opt denotes the index corresponding to the function A_i with the largest minimum between A_1(ω) and A_2(ω). This is quite a remarkable result! It shows that the minimum amplitude corresponds to the situation in which the sum of the amplification factors d_1/D_{i,1} and d_2/D_{i,2} is minimal. Stated differently, the lengths indicated by the magenta lines in Figure B2 should be made as small as possible.

Appendix 3. Computation of the gradient ∂G_{i_u,i_y}/∂θ
In this appendix, we show how to compute the gradient ∂G_{i_u,i_y}/∂θ in Equation (34), evaluated at θ = θ_0, for given i_u and i_y. We start from Equation (34), recalled in Equation (C1) for convenience. In this equation, I is the (M + 1) × (M + 1) identity matrix, Ā(θ) = E⁻¹A, B̄(θ) = E⁻¹B(1 + z), and C̄ = C. Here, z = e^{iωT_s} and E, A, B and C are given by Equations (18) and (19). Making use of the identities ∂U⁻¹/∂x = −U⁻¹(∂U/∂x)U⁻¹ and ∂(UV)/∂x = U ∂V/∂x + (∂U/∂x)V, where U and V are equally sized matrices and x is a scalar, we find the derivative of Equation (C1) with respect to parameter θ_i in Equation (C2). The derivatives of Ā(θ) = E⁻¹(θ)A(θ) and B̄(θ) = E⁻¹(θ)B(θ)[1 + z] with respect to θ_i are given in Equations (C3) and (C4). Substituting Equations (C3) and (C4) into Equation (C2) finally gives Equation (C5), in which the derivatives can be found analytically using Equations (18) and (19). To evaluate this expression at θ = θ_0 for all parameters θ_i ∈ θ, it is most efficient to follow these steps:
(1) calculate all terms in Equation (C5) that are independent of z once, for θ = θ_0, and store them;
(2) evaluate for θ_i the expression (C5) at frequency ω by substituting z = exp(iωT_s);
(3) repeat step (2) for all other i = 1, …, dim(θ);
(4) repeat steps (2) and (3) until the gradient is computed for all required frequencies ω.
Combining the derivatives with respect to each element of the parameter vector then gives the gradient.
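As a sanity check, the matrix identities above can be exercised on a small state-space example and verified against a finite difference; the matrices here are arbitrary stand-ins, not those of Equations (18) and (19):

```python
import numpy as np

# For G(z, th) = C (zI - A(th))^{-1} B(th), the identity
# d(U^{-1})/dth = -U^{-1} (dU/dth) U^{-1} together with the product rule gives
# dG/dth = C R (dA/dth) R B + C R (dB/dth),  with  R = (zI - A)^{-1}.
def make_system(th):
    A = np.array([[-1.0 - th, 0.5], [0.2, -2.0]])
    B = np.array([[th], [1.0]])
    C = np.array([[1.0, 0.0]])
    return A, B, C

def G(z, th):
    A, B, C = make_system(th)
    return (C @ np.linalg.inv(z * np.eye(2) - A) @ B)[0, 0]

def dG_analytic(z, th):
    A, B, C = make_system(th)
    R = np.linalg.inv(z * np.eye(2) - A)       # (zI - A)^{-1}
    dA = np.array([[-1.0, 0.0], [0.0, 0.0]])   # dA/dth
    dB = np.array([[1.0], [0.0]])              # dB/dth
    return (C @ (R @ dA @ R @ B + R @ dB))[0, 0]

z = np.exp(1j * 0.3)                           # z = exp(i*omega*Ts)
th = 0.7
fd = (G(z, th + 1e-6) - G(z, th - 1e-6)) / 2e-6
print(dG_analytic(z, th), fd)                  # the two values agree
```

As in step (1) above, R and the z-independent derivative matrices would be computed once per frequency grid in a real implementation.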