High order concentrated matrix-exponential distributions

Abstract This paper presents matrix-exponential (ME) distributions, whose squared coefficient of variation (SCV) is very low. Currently, there is no symbolic construction available to obtain the most concentrated ME distributions, and the numerical optimization-based approaches to construct them have many pitfalls. We present a numerical optimization-based procedure which avoids numerical issues.


Introduction
Highly concentrated matrix-exponential functions play an important role in many research fields, for example, they turned out to be essential for numerical inverse Laplace transform methods as well. [8] The least varying phase type (PH) distribution of order N is known to be the Erlang distribution [1] with SCV¼1=N (defined as l 0 l 2 l 2 1 À1, where l i , i 2 f0, 1, 2g, are the moments of the distribution). Much less is known about the least varying ME distribution for order N. It is known that for order 2 the class of ME distributions is identical to the class of PH distributions, and it is also known that there exists an order 3 ME distribution with SCV ¼ 0:200902<1=3, but it is still only a conjecture that this is the least varying order 3 ME distribution. Concentrated ME distributions are provided in Eltet} o et al. [4] up to order 17 and in Horv ath et al. [7] up to order 47. These preliminary results indicate that, as N increases, the minimal SCV of order N ME distributions tends to be less than 2=N 2 for odd N, and for even N, the minimal SCV is close to the minimal SCV of order N -1 ME distributions. In this paper, we propose numerical procedures by which much higher odd order concentrated ME distributions can be computed, and based on that, we refine the dependence of the minimal SCV on the order. The rest of the paper is organized as follows. In Section 2, we provide the basic definition of the considered set of functions and the SCV. Section 3 presents an algorithm for efficient SCV computation of exponential cosinesquare functions, while Section 4 introduces the numerical optimization procedure to obtain ME functions with low SCV. A heuristic approach with only 3 parameters is proposed in Section 5 and Section 6 concludes the paper.

Concentrated ME distributions
In this paper, we focus on real functions and a distribution on R þ : Definition 2.1. Order N ME functions (referred to as ME(N)) are given by where a is a real row vector of size N, A is a real matrix of size N Â N and 1 is the column vector of ones of size N, a is such that a1>0 and f ðtÞ ! 0 for t ! 0: According to (1), vector a and matrix A define the ME function. We refer to the pair ða, AÞ as a matrix representation in the sequel. The matrix representation is not unique. Applying a similarity transformation with a nonsingular matrix T, for which T1 ¼ 1 holds, we obtain vector b ¼ aT À1 Definition 2.2. The SCV of the real function f(t) is defined as where l i ¼ Ð 1 t¼0 t i f ðtÞdt for i 2 f0, 1, 2g: According to Definition 2.2, the SCV is insensitive to multiplication and scaling, because Definition 2.3. [Asmussen and Bladt [2] and Asmussen and O'Cinneide [3] ] The probability density function of an order N ME distribution is an ME(N) function with the following properties: f ðtÞ ! 0 for t ! 0 and Ð 1 t¼0 f ðtÞdt ¼ 1: If f(t) is a density function of an ME distribution, then l i denotes the ith moment and SCVðf ðtÞÞ the squared coefficient of variation (SCV) of the ME distribution. An ME distribution with density function f(t) is said to be concentrated if SCVðf ðtÞÞ is low.
Although ME functions have been used for many decades, there are still many questions open regarding their properties. Such an important question is how to decide efficiently if an ME function is non-negative for all t > 0. In general, f ðtÞ ! 0 does not hold for given ða, AÞ matrix representation for all t > 0, unless it has been constructed to be non-negative. In this paper, we are going to restrict our attention to such a special construction, the exponential-cosine square functions.
For the least varying ME(N) distributions only conjectures are available for N ! 3: [4] According to the current conjecture, for odd N, the density function of the most concentrated ME(N) distribution belongs to a special class characterized by f ðtÞ ¼ c 1 f þ ðc 2 tÞ, where c 1 and c 2 are positive constants and f þ ðtÞ is an exponential cosine-square function defined below.
Definition 2.4. The set of exponential cosine-square functions of order n has the form where x ! 0 and 0 / i <p for i 2 f1, :::, ng: Since the SCV is insensitive to multiplication and scaling, in the rest of the paper we focus on SCVðf þ ðtÞÞ with f þ ðtÞ as defined in (5) (that is, without normalization and scaling).
An exponential cosine-square function is defined by n þ 1 parameters: x and / i for i 2 f1, :::, ng: f þ ðtÞ is an ME(2n þ 1) function. Although the representation in (5), which we refer to as the cosine-square representation, is not a matrix representation, [Horv ath et al., [7] Appendix A] presents the associated matrix representation of size N ¼ 2n þ 1: Consequently, the set of exponential cosine-square functions of order n define a special subset of ME(2n þ 1) since f þ ðtÞ, by construction, is non-negative. The SCV of f þ ðtÞ is a complicated function of the parameters, whose minimum does not exhibit a closed analytic form. That is why we have resorted to the following numerical problem. For a given odd order N ¼ 2n þ 1, we are looking for efficient numerical methods for finding the x and / i (i 2 f1, :::, ng) parameters which result in a low SCVðf þ ðtÞÞ: For efficient numerical minimization of the SCV for N > 47 (i.e., n > 23) we need (i) an accurate computation of the SCV based on the parameters with low computational cost and (ii) an efficient optimization procedure with low computational cost.
In this paper we present a method that addresses i) in Section 3, and one that addresses ii) in Section 4.

Efficient computation of the squared coefficient of variation
To evaluate the objective function of the optimization problem, namely the SCV, we need efficient methods to compute l 0 , l 1 and l 2 . Deriving the l i parameters based on (5) is difficult for large N. Hence we propose to compute them based on a different representation.

The hyper-trigonometric representation
The following theorem defines the hyper-trigonometric form of the exponential cosine-square functions and provides a recursive procedure to obtain its parameters from x and / i , i 2 f1, :::, ng: for n ¼ 1: for k ¼ 0, n ! 1: for 1 k n, n ! 2 Proof. The theorem is proved by induction. For the order N ¼ 2n þ 1 cosinesquare function we introduce the notation f ðnÞ ðtÞ ¼ Q n i¼1 cos 2 xtÀ/ i 2 : For n ¼ 1, and the theorem holds. Assuming that the theorem holds for n -1, we are going to show that it holds for n as well.
We have that  The hyper-trigonometric representation makes it possible to express the Laplace transform (LT) and the l i moments for i 2 f0, 1, 2g in a simple and compact way.
Corollary 3.2. The LT and the l i , i 2 f0, 1, 2g, moments of the exponential cosine-square function, for s 2 C, are given by and Proof. Since (6) is linear, it can be Laplace transformed term-by-term using the relations Based on f Ã ðsÞ, the l i moments can be computed using the LT moment relation In order to compute the matrix representation of the exponential cosinesquare function, we introduce one more representation, and the associated transformation.

The spectral representation
The hyper-trigonometric representation makes it easy to obtain the spectral form of f þ ðtÞ as where i denotes the imaginary unit. We refer to (14) as the spectral representation because these exponential coefficients are the eigenvalues of matrix A in the matrix representation.

The matrix representation
Corollary 3.3. The f þ ðtÞ ¼ e Àt Q n i¼1 cos 2 xtÀ/ i 2 function has a size N ¼ 2n þ 1 matrix representation f þ ðtÞ ¼ ae At ðÀAÞ1, where the row vector a is composed by the elements and the matrix A is given by where D k ¼ À1 Àkx kx À1 , for k 2 f1, :::, ng: Proof. According to (18), e Àt cos ðkxtÞ Àe Àt sin ðkxtÞ e Àt sin ðkxtÞ e Àt cos ðkxtÞ : Based on (19) and (20), we can write ½ a 2k a 2kþ1 e Àt cos ðkxtÞ Àe Àt sin ðkxtÞ e Àt sinðkxtÞ e Àt cos ðkxtÞ which is identical with (6). There is one numerical issue that has to be taken care of when applying this numerical procedure with floating point arithmetic for large values of n. To evaluate the SCV, coefficients a ðnÞ k , b ðnÞ k , c ðnÞ need to be obtained from the x and / i , i 2 f1, :::, ng parameters. The recursion defined in Theorem 3.1 involves multiplications between bounded numbers (sine and cosine always fall into ½À1, þ 1), which is beneficial from the numerical stability point of view, but subtractions are unfortunately also present, leading to a loss of precision. To overcome this loss of precision, we introduce increased precision floating point arithmetic both in our Mathematica and Cþþ implementations 1 . [9,10] Mathematica can quantify the precision loss, enabling us to investigate this issue experimentally. According to Figure 1, the number of accurate decimal digits lost when evaluating the SCV from the x, / i parameters (computed by the Precision function of Mathematica), denoted by L n , is nearly linear and can be approximated by L n % 1:487 þ 0:647n: In the forthcoming numerical experiments we have set the floating point precision to L n þ 16 decimal digits to obtain an accuracy of results up to 16 decimal digits, and this precision setting eliminated all numerical issues. It is important to note that the high precision is needed only to calculate the a ðnÞ k , b ðnÞ k , c ðnÞ coefficients and the SCV itself. Representing parameters x, / i , i 2 f1, :::, ng themselves does not need extra precision, and the resulting exponential cosine-square function f þ ðtÞ can be evaluated with machine precision as well (in the range of our interest, n 1000).
A basic pseudo-code of the computation of the SCV with the indications where high precision is needed is provided by Algorithm 1.

Minimizing the squared coefficient of variation
Given the size of the representation N ¼ 2n þ 1, the f þ ðtÞ function providing the minimal SCV is obtained by minimizing (3) subject to x and / i , i 2 f1, :::, ng: The form of the SCV does not allow a symbolic solution, and its numerical optimization is challenging, too. The surface to optimize has many local optima, hence simple gradient descent procedures failed to find the global optimum and are sensitive to the initial guess.

Optimizing the parameters
In the numerical optimization of the parameters, we had success with evolutionary optimization methods, in particular with evolution strategies. The results introduced in Horv ath et al. [7] were obtained by one of the simplest evolution strategies, the Rechenberg method. [11] . In Horv ath et al., [7] it was the high computational demand of the numerical integration needed to obtain the SCV and its reduced accuracy that prevented the optimization for N > 47 (n > 23).
However, computing the SCV based on the hyper-trigonometric representation using the results of Section 3.1 allows us to evaluate the moments orders of magnitudes faster and more accurately, enabling the optimization for higher n values. With the Rechenberg method (Rechenberg [11] , also referred to as (1 þ 1)-ES in the literature) it is possible to obtain low SCV values relatively quickly for orders as high as n ¼ 125, but these values are suboptimal in the majority of cases.
With more advanced evolution strategies the optimal SCV can be approached better. Our implementation [9] supports the covariance matrix adoption evolution strategy (CMA-ES [5] ), and one of its variants, the BIPOP-CMA-ES with restarts. [6] . Starting from a random initial guess, we got very low SCV values much quicker with the CMA-ES than with the (1 þ 1)-ES with similar suboptimal minimum values (cf. Figure 2). The limit of applicability of CMA-ES is about n ¼ 180. The best solution (lowest SCV for the given order), however, was always provided by the BIPOP-CMA-ES method, although it was by far the slowest among the three methods we studied. In fact, we believe that BIPOP-CMA-ES returned the global optimum for n 2 f1, :::, 74g, and we investigate the properties of those solutions in the next sections. For n > 74, we can still compute low SCV functions with the BIPOP-CMA-ES method, but its computation time gets to be prohibitive, and we are less confident about the global minimality of the results.
For our particular problem, the running time, T, and the quality of the minimum, Q (how low the SCV is), obtained by the different optimization methods can be summarized as follows

Properties of the minimal SCV solutions
The minimal SCV values obtained by the BIPOP-CMA-ES optimization, which we conjecture to be optimal for n 2 f1, :::, 74g, are depicted in Figure 3. Apart from the minimal SCV values of the exponential cosinesquare functions, Figure 3 also plots 1=ð2n þ 1Þ and 2=ð2n þ 1Þ 2 , for comparison. The SCV ¼ 1=N ¼ 1=ð2n þ 1Þ is known to be the minimal SCV value for PH distributions of order N, [1] which form a subset in the set of order N ME distributions. [2] The 2=N 2 ¼ 2=ð2n þ 1Þ 2 curve is reported to be the approximate decay rate in Horv ath [7] , up to n ¼ 23 (N ¼ 47). Figure 3 indicates that the SCV decreases much faster than 1=N and a bit faster than 2=N 2 : Indeed, 2=N 2 is a good approximation up to n ¼ 23, but the decay seems to decrease below 2=N 2 for n > 23.
We suspect that the decrease is asymptotically polynomial (at n ! 1), that we checked by plotting the SCVn 2:14 function in Figure 4. While the exponent is determined empirically and might be slightly off, it is visible that the convergence is faster than 1=n 2 for the examined range (n < 1000).

The parameters providing the minimal SCV
In this section, we investigate the parameters corresponding to the minimal SCV and provide an intuitive explanation for the observed behavior. Figure  5 depicts the optimal x parameter as a function of n. It shows a slow decrease with some inhomogeneity around n 2 f14, 28, 43, 60g: Figure 5 suggests that x tends to 0 as n ! 0, and the inhomogeneity is related to the behavior of the / k parameters, as it is detailed below.
The visual appearance of the optimal / k parameters in Figure 6 reveals many more interesting properties. First, since the period of the cosinesquare function is p, the / k parameters in (5) are 2p periodic. That is, adding an integer times 2p to any of the / k parameters does not change the f þ ðtÞ function. In Figure 6 we transformed all / k parameters to the ðÀp, pÞ range and depicted / k þ p instead of / k because / k þ p indicates the location where the cosine-square term with / k is zero in the ð0, 2pÞ  cycle, which is the ð0, 2p=xÞ interval of f þ ðtÞ: The nth row in Figure 6 depicts n points, which are the optimal / k þ p values for k ¼ 1, :::, n: In these n points of the ð0, 2pÞ cycle f þ ðtÞ equals zero. In between these zeros f þ ðtÞ has humps. Figures 7 and 8 demonstrate the effect of / k on f þ ðtÞ for x ¼ 1, n ¼ 3, / i þ p 2 f0:1, 1, 2g for i 2 f1, 2, 3g: In the ð0, 2pÞ interval, both, f þ ðtÞ and Q n i¼1 cos 2 tÀ/ i 2 , have zeros at 0.1, 1 and 2. Figure 7 depicts f þ ðtÞ with the exponential attenuation, while Figure 8 depicts Q n i¼1 cos 2 tÀ/ i 2 without the exponential attenuation. In Figure 8 the sizes of the humps depend on the distance of the neighboring zeros. The hump between 0.1 and 1 is smaller than the one between 1 and 2, which indicates that the closer the neighboring zeros are the smaller the humps are. In Figure 7, the  exponential attenuation also affects the sizes of the humps. With the exponential attenuation the hump between 0.1 and 1 is larger than the one between 1 and 2. Additionally, the SCV is more sensitive to the humps further from the main peak, which motivates the fact that the / k þ p parameters are more concentrated around 0 to make the function as flat as possible near t ¼ 0, where the exponential attenuation is rather weak. In Figure 7, right of the main peak, f þ ðtÞ is suppressed by the exponential attenuation, while left of the peak the zeros of the cosine square terms keep the function low (c.f. Figure 8). In Figure 6, for n < 14, the zeros are located between 0 and 5, which means that in this range of n f þ ðtÞ is kept close to zero by the cosinesquare functions in the interval (0, 5). It has a peak between 5 and 2p, and the next cycles are suppressed by the exponential attenuation.
For n ! 14, there is a gap in the sequence of / k parameters, indicating the location of a spike of f þ ðtÞ: It means that the exponential attenuation is not strong enough for suppressing f þ ðtÞ right after the spike, and the minimal SCV is obtained when some cosine-square terms are used to enforce the suppression beyond the spike.
The number of cosine-square terms used to suppress f þ ðtÞ beyond the spike changes at n 2 f14, 28, 43, 60g: These changes result in the small inhomogeneity in the x values at n 2 f14, 28, 43, 60g in Figure 5.
According to our intuitive understanding x tends to 0 as n ! 1, because the cosine squared terms are more efficient in suppressing f þ ðtÞ than the exponential attenuation, and for large n, the cosine squared terms Figure 7. f þ ðtÞ with x ¼ 1, n ¼ 3, / i þ p 2 f0:1, 1, 2g for i 2 f1, 2, 3g: Figure 8. The initial part of Q n i¼1 cos 2 xtÀ/ i create a sharp spike inside the ð0, 2pÞ cycle (which is the ð0, 2p=xÞ interval), such that the zeros are located on both sides of the spike. The cosine squared terms are 2p=x periodic and consequently, there is a spike also in the ð2p=x, 4p=xÞ interval which is suppressed by exponential attenuation. The exponential attenuation in a 2p=x long interval is e 2p=x : In order to efficiently suppress the spike in the ð2p=x, 4p=xÞ interval, x has to be small.

Heuristic optimization with 3 parameters
According to the previously discussed approach the number of parameters to optimize increases with n. This drawback limits the applicability of the general optimization procedures to about n 74 in the case of BIPOP-CMA-ES and about n 180 in the case of the basic CMA-ES. Using these n values the optimization procedure takes several days to terminate on an average PC clocked at 3.4 GHz. While the f þ ðtÞ function obtained this way for n ¼ 180 has an extremely low SCV ( % 10 À5 ) already, some applications might benefit from ME distributions with even lower SCV. To overcome this limitation we developed a suboptimal heuristic procedure, that aims to obtain low SCV for a given large order n.
Our heuristic procedure has to optimize only three parameters, independent of the order n. The procedure is based on the assumption that the location of the spike in the ð0, 2pÞ cycle of the cosine-squared function plays the most important role in the SCV, and the exact values of the / k parameters are less important, the only important feature is that the cosine-squared terms characterized by the / k parameters should suppress f þ ðtÞ uniformly in the ð0, 2pÞ cycleapart from the spike (cf. Figure 9). In case of ME distributions, the spike is the dominant mode of the density function.
Based on this assumption we set the / k parameters of the cosine-squared terms equidistantly. This way the position of the spike (denoted by p) and its width (w) inside the ð0, 2pÞ interval completely define the / k values for a given order n.
The distance of the / k parameters (d) and the number of / k parameters before the spike (i) can be computed from p and w by and for k 2 f1, :::, ng the / k parameters are The obtained heuristic procedure has only 3 parameters to optimize: x, p and w (see Algorithm 2). The SCV values computed by this heuristic optimization procedure are depicted in Figure 10 for n 2 f10, :::, 74g: The figure indicates that for small n (n < 15) the procedure is inaccurate, but it is not a problem because the minimal SCV can be computed quickly in these cases. For larger n (n ! 15) the SCV provided by the heuristic procedure is less than twice the minimal SCV in the given range. Assuming that this ratio to the optimal SCV remains valid also for n > 74 the heuristic procedure which is applicable up to n ¼ 1000, is an efficient tool to compute highly concentrated ME distributions for large order n values.
Algorithm 2. The objective function of the heuristic method 1. procedure ComputeSCV-heuristic(x, p, w) 2. Obtain the distance of zeros, d, and threshold i by (23) 3. Obtain / i for i 2 f1, . . . , ng by (24) 4. Compute SCV by Algorithm 1 5. return SCV 6. end procedure   Figure 2 suggests that the heuristic optimization remains very close to the minimum also for larger n values and the SCV obtained by the heuristic optimization maintains its polynomial decay between n À2:1 and n À2:2 :

Implementation notes
The reader can validate the results of the paper based on the parameters we computed for a representative set of n values between 1 and 1000 and made available at https://github.com/ghorvath78/iltcme. [10] The file itccme.json contains (among other parameters) the a ðnÞ k , b ðnÞ k , c ðnÞ parameters for the computed set of n values from which f þ ðtÞ can be plotted using (14) (to check the non-negativity of f þ ðtÞ), and the SCV can be computed using (13) in any general mathematical programing environments. Some related functions processing the parameters in itccme.json are also available at https://github.com/ghorvath78/iltcme [10] in various programing environments. Additionally, the implementation of the optimization procedure at https://github.com/ghorvath78/cmefinder [9] allows the reader to reproduce the parameters of the CME distributions presented in itccme.json.
The parameter setting of the code in https://github.com/ghorvath78/cmefinder [9] resulted from the trends of the parameters depicted in Figures 5  and 6, which can also be projected to unexplored orders with some confidence. Our numerical experiments suggest that there are several local optima with similar SCV in the range of initial guesses and outside that range the obtained SCV is several orders of magnitude larger. That is, the optimization is sensitive to the initial guess, but with a proper initial guess it is insensitive to implementation details like the stopping criteria.

Conclusion
The paper presents a method to generate ME distributions with very low SCV. The hyper-trigonometric representation of the subset of ME distributions with exponential cosine-squared density function enables the efficient, explicit computation of the squared coefficient of variation. By selecting the appropriate numerical precision and a suitable numerical optimization method, we managed to create ME functions up to order n ¼ 1000 with very low SCV (<10 À6 for n ¼ 1000). Such non-negative, low-SCV matrix exponential functions are important ingredients in several numerical procedures, including the numerical inverse Laplace transform and representing deterministic delay in stochastic models.

Funding
This work is partially supported by the OTKA K-123914 and the TUDFO/51757/2019-ITM grants.