Statistical and computational intelligence approach to analytic continuation in Quantum Monte Carlo

The term analytic continuation emerges in many branches of Mathematics, Physics, and, more generally, applied Science. Generally speaking, in many situations, given some amount of information that could arise from experimental or numerical measurements, one is interested in extending the domain of such information, to infer the values of some variables which are central for the study of a given problem. For example, focusing on Condensed Matter Physics, state-of-the-art methodologies to study strongly correlated quantum physical systems are able to yield accurate estimations of dynamical correlations in imaginary time. Those functions have to be extended to the whole complex plane, via analytic continuation, in order to infer real-time properties of those physical systems. In this Review, we will present the Genetic Inversion via Falsification of Theories method, which allowed us to compute dynamical properties of strongly interacting quantum many--body systems with very high accuracy. Even though the method arose in the realm of Condensed Matter Physics, it provides a very general framework to face analytic continuation problems that could emerge in several areas of applied Science. Here we provide a pedagogical review that elucidates the approach we have developed.


Introduction
A very challenging problem emerging in pure and applied Physics, as well as in many branches of Science, is analytic continuation. Such term arises naturally, at an abstract level, in the realm of complex analysis, where it is defined as a technique to extend the domain of a given analytic function, say F : Ω ⊂ C → C. Strictly speaking, performing analytic continuation means finding an analytic functionF :Ω ⊂ C → C such thatΩ ⊃ Ω andF (z) = F (z) ∀z ∈ Ω. In other words, the key point is to use the information encoded in F (Ω) to find, or infer, the values of F on a wider set. In addition, in many important cases, the knowledge of F (Ω) itself can be affected by uncertainties, that could arise from the numerical or experimental determination of the values of the function.
Situations where analytic continuation turns out to be useful or even necessary are present in a very broad range of physical or even more generally scientific studies, encom-in a schematic way, is the following: when building up a theory, how many answers may we expect from observations?
Even if, at first sight, this seems to have nothing to do with analytic continuation, a connection can be foreseen if one considers the imaginary-time properties as the observations and the real-time properties as the theory. This interpretation, although somehow artificial, is indeed meaningful in the sense that, in fact, techniques are available to compute, i.e. to observe, imaginary-time properties, while, currently, general direct computation in real time is much harder. Therefore, real-time properties have to be guessed, like a theory is guessed from observations. Anyway, inverse problems have been well known since the earliest days of research in Physics and the relation between theory and experiment is, of course, central in Physics and in Science in general, much beyond the domain of analytic continuation.
The key question quoted above, in the more specific language of analytic continuation, sounds as follows: how can we use the observations of the values of a function F on a given domain, say imaginary time, to infer the functionF on a different domain, real time, where we cannot calculate the values directly? At a first glance, one feels that such an inverse procedure in realistic situations is unavoidably ill-posed, since any set of observations is limited and noisy, thus ruling out the possibility of finding out one and only one theory, i.e. function, whose predictions fit such data. In other words, there will exist infinite functions F defined inside the complex plane that, when restricted to imaginary-time axis, will be compatible with the estimated imaginary-time correlation functions.
Many tools have been devised to face those ill-posed inverse problems. Roughly speaking, the approaches can be divided into two highly overlapping families. Some approaches modify the problem, via a regularization technique, attempting to find a well-posed, or less ill-posed, problem. Regularizing the problem means introducing additional constraints to the problem itself; such constraints very often rely on euclidean norms and are meant to have a unique solution to the problem or, at least, to drastically reduce the number of possible solutions [5]. On the other hand, another class of approaches looks for a solution in a statistical meaning, aiming at maximizing the probability of finding a functionF , given the function F which is known [5]. This probability is built up in the realm of a Bayesian description of the analytic continuation problem. In many situations, mixed approaches including regularization techniques and statistical approaches are used.
Our approach relies on a falsification principle which we are now going do explain in some detail. Following Popper [6], Tarantola [7] put forward the proposition that observations may only falsify a theory. In order to understand the far reaching consequences of this proposition, let's formulate the problem at an abstract level. Suppose we have measured F (Ω), through some experimental or numerical tool. In principle, if we were able to perform an infinite number of measurements, we would have found a sample of results. It is thus natural to consider our data, which we will denote by d, as a point in a, possibly huge, set of all possible outcomes, that is a set of data, say D. On the other hand, we denote S the set of all the possible theories, i.e. the set of all the candidate solutions. The inverse problem would sound as follows: may we find a theory s ∈ S predicting D? That is, can we find s ∈ S which is not falsified by any of the elements of D? If the answer was positive, of course, that theory would be our solution. This is an idealized situation, since in a finite time we actually never know D but only a subset D ⋆ ⊂ D whose elements are in general used to give an estimation of the statistical uncertainties in the observations. Normally an infinite number of theories exists compatible with D ⋆ ; it is then clear that we may exclude some theories but we still remain with a set S D ⋆ ⊂ S of equivalent "solutions".
Depending on the mathematical details of the space S, a natural idea appears to be that of devising a procedure enabling to capture what the theories in S D ⋆ do have in common.
In this way, even if we won't succeed in finding out a unique theory s ∈ S, we will be able nevertheless to find out a class of features, providing physical properties, that s has to possess so that it will not be falsified by the limited set of observations.
In order to better understand what this means in a specific example, we will now focus on the typical analytic continuation problem in condensed matter physics: the estimation of spectral functions of many-body quantum systems starting from imaginary-time correlation functions.

Analytic continuation in Quantum Monte Carlo
The formal definition of a spectral function of a many body physical system is as follows: A andB being given operators acting on the Hilbert space of the system whose Hamiltonian operator isĤ. The brackets indicate the expectation value on the ground state or thermal average. We also introduce the imaginary-time correlation function: It is evident that, if we were able to extend the domain of F , computing: we would immediately be able to compute Eq. (3).
In the language of inverse problems, the function s(ω) is the theory we are looking for, while F (τ ) corresponds to the observations. In the language of analytic continuation, on the other hand, it is evident that the two functions are related by Eq. (2) and a Fourier transform, or, equivalently, a single inverse Laplace transform. As mentioned earlier, the reason why we consider F (τ ) as the observation, while s(ω) is the unknown theory, is that, using quantum Monte Carlo simulations, when the sign problem does not show up, for example for Bose fluids, it is possible to obtain exact estimations of the values of the function F [8][9][10]. In this context, exact means that, for a given statistical accuracy, every systematic error can be reduced below the noise level via a suitable tuning of the parameters.
To summarize, the situation is as follows: with QMC methods we can estimate values of F (τ ) in correspondence with a finite number of imaginary-time values depending on the discretization of the methodology. To be specific we will use the notation f ≡ {f i = F (iδτ ), 0 ≤ i < l}. In general f is obtained as an average of several QMC calculations of F (τ ), each affected by statistical noise, and which are used to estimate the statistical uncertainties {σ i } associated with {f i }. Moreover, the set of observations can be often enriched relying on sum rules, which prompt to perform additional QMC measurements providing estimations for some momenta of s(ω): c ≡ {c n = +∞ −∞ dωω n s(ω), n ∈ Z} (for example c 0 = ÂB may be easily estimated in equilibrium QMC simulations with an associated statistical uncertainty).
In this context, the inverse problem of estimating s(ω) has the formal appearance of a Fredholm integral equation where for example, at zero temperature, K(τ, ω) = θ(ω)e −τ ω , θ(ω) being the Heaviside distribution. In many cases, this equation can be complemented by some a priori knowledge that can be deduced from the formalism of quantum mechanics, such as the support, nonnegativity or some further properties. Since we start from limited and noisy data and the considered kernel corresponds to an ill-conditioned matrix when discretized, it is evident that Eq. (6) does not have a unique solution and that small variations of the data can largely affect the resulting spectra. In general, there will be an infinite number of functions s(ω) whose "predictions", namely the reconstructed dataf obtained by the integral in Eq. (6), fall inside the confidence

Genetic Inversion via Falsification of Theories
Being of paramount importance, the task of facing the problem in Eq. (6) has been investigated by many methods.
The Maximum Entropy Method (MaxEnt) and its variants [11][12][13][14][15] are the most popular approach. MaxEnt applies the maximum likelihood principle within a Bayesian framework. To be more specific, the Bayesian conditional probability of the spectrum s, given the measurements f * , is equivalent to P (s|f * ) = P (f * |s)P (s)/P (f * ), where the conditional probability P (f * |s) of the measurements, given the spectrum, is called the likelihood, while the a priori probability P (s) reflects previous belief concerning the spectrum in the absence of data. Finally P (f * ) is the normalization. In the classical formulation MaxEnt assumes P (f * |s) ∝ exp (−χ 2 /2), where χ 2 is the quadratic distance of the data f * from the reconstructed dataf . The a priori knowledge is enforced as an entropic term with respect to a default model m: P (s) ∝ exp (−α dωs(ω) ln [s(ω)/m(ω)]), and various ways are proposed to treat the parameter α and the default model m(ω). The entropic term and some suitable regularization of the necessary matrix operations are crucial in this approach in order to obtain a smooth solution.
The Average spectrum method (ASM), also called Stochastic Analytical Inference [16][17][18][19][20][21][22], drops out the need of the entropic prior by averaging over different spectra according to the likelihood P (f * |s) ∝ exp (−χ 2 /2T ). Through a Monte Carlo sampling of spectral functions, the effective temperature T is used to gradually force more adherence to the data via a simulated annealing procedure. Various prescriptions for stopping the simulated annealing have been investigated. The averaging procedure smooths the final average and retains only common features. It has been demonstrated [18] that MaxEnt can be derived as a mean-field limit of the ASM. This finding has been further explored in [21].
The Stochastic optimization with consistent constraints method (SOCC) [23][24][25] also averages over spectra which are obtained with a Monte Carlo walk aiming at minimizing the χ 2 distance of the data and reconstructed data; however, the averaging procedure consists of a linear combination of spectra, aiming at obtaining a maximally smooth result. This flexible averaging procedure allows for estimating error-bars for the spectrum, by artificially emphasizing the main spectral features up to values which would worsen the value of the χ 2 .
Without being exhaustive, we mention that other research directions involve the use of other basis functions for the spectra [26,27], the exploit of more information from tailored QMC simulations [28,29] or the consideration of different kernels [30]. We also note that, in the context of image reconstruction, different measures of distance of the reconstructed data from measurements have been proposed, such as the Kullback-Leibler or I-divergence [31], which allow for the implementation of deterministic error-reduction algorithms [32].
The GIFT approach [10], which we are going to describe in detail, follows more radically the general scheme outlined in the introduction: we need a space of models S, containing a wide collection of spectral functions consistent with any prior knowledge about s(ω), a falsification procedure relying on the QMC "measurements" d = {f, c} ∈ D, and a strategy to capture the accessible physical properties of s(ω). The introduction of the space D is meant to stress that there is nothing special in the particular set of QMC measurements d. If a new independent simulation is performed, a new set of measurements will show up and it will be completely equivalent to the original one.

The space of models
In our mathematical framework S is made of step functions, providing a compromise between the possibility of suitably approximating any model of spectral function and the feasibility of numerical operations inside it.
In the typical case (Â =B † ) when s(ω) is known to be real-valued, non-negative and the zero-momentum sum rule holds, we rely on models s of the form: We rely on a fixed partition {ω 0 , ..., ω m } of widths ∆ω j of an interval of the real line much larger than the hypothesized support of s(ω). In particular, we use equally-spaced frequencies up to an intermediate value where the spectrum is hypothesized (or verified with exploratory reconstructions) to have significantly decayed, and we then employ exponentially spaced frequencies, to ease the fulfillment of high-order sum rules. In some cases, for example for some 1D models [22,33,34], a minimal threshold frequency ω th is known, below which the spectrum is zero; this can be easily implemented by setting ω 0 = ω th . We use the characteristic function χ Ij (ω) of the intervals I j = [ω j , ω j+1 ), which takes the value 1 inside I j and 0 outside. Moreover, we introduce a discretization of the codomain, s j ∈ N ∪ {0}, to make the space finite. M provides the maximum number of quanta of spectral weight available for the ensemble of the intervals I j . Notice that s(ω) differs from the physical spectral functions by a factor c 0 , being c 0 the zero-momentum, which belongs to the set of observations. Just to stress the different kind of applications that the present framework could include, we note that the choice expressed by Eq. (7) originates from the fact that the first GIFT implementation was inspired by an application in quantitative finance related to a stochastic optimization of portfolios based on Genetic Algorithms [35]. In that application m represented the maximum number of assets included in the portfolio, M the total investment, which is naturally quantized, and s j was the number of quanta of investment for asset j. In the present context, the choice in Eq. (7) is meant to provide a sufficiently vast family of functions: the basic idea is that, apart from a priori knowledge arising from the formalism of quantum mechanics, no additional information has to be imposed in the definition of the space of model. Of course, one has to verify that the results do not change significantly when decreasing the interval widths. Other representations of the spectral function, such as a sum of delta contributions, may be implemented as well, with negligible impact on the methodology. Other stochastic approaches also optimize the widths of the intervals [23] or the positions of the delta functions [21]: this can increase efficiency.

Falsification, Fitness, and Genetic Algorithms
How can we explore S and falsify its elements? As mentioned before, the most important point is the translation into a practical algorithm of the falsification principle described in the introduction. In principle, not only the observed data d = {f, c}, but any equivalent data d ⋆ = {f ⋆ , c ⋆ }, that are the result of an independent simulation, should play an equivalent role in determining whether a model has to be falsified or not. The simplest way to achieve this in practice leads us to the definition of the fitness of models: depending on the set of data, together with the introduction of a scheme to build up d ⋆ starting from d. In principle, one could store different realizations d ⋆ directly from independent blocks in the QMC simulations. In our implementation, different random sets d ⋆ = {f ⋆ , c ⋆ } are obtained by resampling independent Gaussian distributions centered on the original QMC observations d, with variances which correspond to the estimated QMC statistical uncertainties. Generalizations can be easily conceived if the covariance matrix among the data is computed during a simulation. More precisely, suppose that the l × l matrix C ij = cov(f i , f j ) is estimated. The above-mentioned sampling of independent Gaussian distributions relies on the approximation C ij ≃ δ ij σ 2 i . If the nondiagonal elements are also computed, the equivalent data can be sampled using the formula: where ε j are realizations of independent standard normal random variables, and the lowertriangular matrix L ij satisfies LL T = C [5]. Matrix L can be obtained via standard Cholesky decomposition, since the covariance matrix of vector f is, by construction, realvalued and positive definite. Once this is done, the fitness in Eq. (8) can be modified using The results presented in this Review are obtained by ignoring covariance and considering only the variance of the data. It can be argued that the typical covariance of QMC imaginary-time data is positive, due to the kinetic term in the density matrix, so that ignoring covariance yields unnecessary fluctuations in our resampled imaginary-time data. This is overcome by pursuing lower values of the fitness function (8) before the final averaging procedure is done [17].
In the definition of Eq. (8), the free parameters γ n > 0 are adjusted in order to make the contributions to Φ d ⋆ coming from f ⋆ and from c ⋆ of comparable order of magnitude, provided that convergence of the algorithm does not slow down due to a too strong constraint coming from high values of γ n . If it happens that one c n is exactly known, no error is added by making c ⋆ n = c n . The idea, then, is as follows: we sample several independent equivalent data d ⋆ and find out the set of models which are not falsified by them. In order to achieve this, we rely on Genetic algorithms (GA), which are known to provide an extremely efficient tool to explore a sample space by a nonlocal stochastic dynamics, via a survival-to-fitness evolutionary process mimicking the natural selection we observe in the natural world. Such evolution aims at maximizing the fitness towards "good" building blocks [36] which, in our case, should recover information on physical spectral functions.
In our GA, for each resampled d ⋆ , we start randomly constructing a collection of s(ω), the initial population, consisting of N s individuals. Each s(ω) is coded by m integers, s j in Eq. (7). The genetic dynamics then consists in a succession of generations during which the initial population, consisting of N s individuals, is replaced with new ones in order to reach regions of S where high values of the fitness exist, for a given d ⋆ . In the passage between two generations a succession of "biological-like" processes takes place, given by the genetic operators described in the next subsection. The GA dynamics performs the falsification procedure: for each d ⋆ , only the s(ω) with the highest fitness in the last generation provides a model for s(ω) which has not been falsified by d ⋆ . This yields the set S D ⋆ made of the elements c ⋆ 0 s(ω). Finally, an averaging procedure of the elements of S D ⋆ appears as the most natural way to extract physical information. Presently, we also calculate the variance of the ensemble S D ⋆ as a way to estimate the variance of the spectra. However, this is only qualitative, since a more complete information would stem from considering the whole covariance matrix of the estimated spectral function. Notice also that the definition of the fitness in Eq. (8) is essentially equivalent to the log-likelihood of other Bayesian approaches, in presence of a uniform prior. However we do not attempt to find a single maximum-likelihood spectrum, which would be affected by the saw-tooth instability, but we create an ensemble of spectra for which the magnitude of the fitness is of order of less than l + 1. This is very similar to the ASM approach, with the advantage of the speed-up coming from nonlocal genetic evolution.

The genetic dynamics
The typical genetic operators are called selection, crossover and mutation. We found it useful to add also a rejection operator [33], that allows for more flexibility in the mutation moves, and constitutes a clean bridge between the genetic and the ASM approaches, yielding a hybrid genetic-ASM algorithm. We now describe each step of the genetic evolution, for a given realization of d ⋆ . The Selection and Rejection operators are always executed, while the Crossover and various Mutation operators are called with some probability, which is chosen after performing small-scale exploratory runs to increase efficiency.
• Selection. The population of the previous generation is ordered in ascending fitness; then a couple of individuals ("mom" and "dad") are selected corresponding to the indexes k = [N s int(r β )] + 1 obtained by sampling two uniform random numbers r ∈ [0, 1); the nonlinearity of k on r is such that individuals with large fitness are preferentially selected; we typically use β = 1/3. • Crossover. An amount of quanta Q is uniformly chosen in the interval [0, M/2], to be exchanged between mom and dad. Optionally, one can choose Q ≈ M/2, which accelerates convergence, but is more computationally expensive. For both mom and dad, a random set of bins {I j } is chosen whose total spectral weight is Q. Then, the selected weights of the mum spectrum are moved to the dad spectrum, keeping their original mum positions. Vice-versa, the selected dad weights are moved to the mum spectrum, keeping their original dad positions. This way, the total spectral weight is preserved, while a very nonlocal operation is performed. In particular, it is likely that the main spectral features of mum and dad are exchanged. We observe that this operator, together with the selection operator, is the main feature concretely distinguishing our approach from the ASM and SOCC methods, allowing for a significant speed-up of the evolution. This operation is indeed performed with a high probability of 30 ÷ 40%. • Mutation. Having obtained two new spectra, son 1 and son 2 , from mom and dad, we are now prompted to perform single-spectrum moves. There is large room for experimentation at this point; however, since detailed balance is not respected, a risk is present to systematically bias the final result towards specific spectral features, depending on the chosen mutation operators. For example, in the relevant cases discussed in Subsection 4.1, typical spectra should include a single narrow peak and a minor broad structure, while in Subsection 4.3 major broad structures are expected: in this case, mutation operators favoring only the identification of peaks may hamper the efficient reconstruction of almost flat spectra. It is thus important to include a variety of mutation operators to render the algorithm ergodic. We name a few examples: • Local mutation. The shift of a random fraction of spectral weight between two random neighboring intervals. With small probability, the shift is performed preferentially to intervals where spectral weight is already present, which is useful for the quick discovery of peaks [10]. • Nonlocal mutation. The shift of a random fraction of spectral weight between two random intervals. In order to avoid the worsening of the fitness due to this nonlocal move, especially in the part concerning the spectral moments c, we impose a detailed balance condition using the probability density exp (−Φ d ⋆ (s)/T ), as in the ASM method, with an effective temperature that is reduced during the simulation. • Smoothening. A short succession of neighboring intervals is randomly chosen.
The weights are convoluted with a smoothening kernel, which essentially performs a weighted average of the original spectral weights. • Error reduction. The steepest descent method, or one of its more stable variants, is applied to the maximization of the fitness in Eq. (8), using its functional derivative with respect to the spectrums. Its efficient implementation suggests the use of real-instead of integer-valued weights, which requires a trivial adjustment of the algorithm. This operator performs a deterministic optimization of the spectra; however, when used alone, it is only able to get to local maxima of the fitness. The combined use of stochastic and deterministic operators yields a so-called memetic algorithm [37][38][39]. • Flattening. A random number of neighboring interval weights is substituted by their average. This is a very nonlocal operation and has to be selected not too often. It is in particular useful to explore almost flat and broad spectra, which would otherwise be created very seldom, due to entropic reasons. After all the described operations, missing or exceeding weight is randomly redistributed among the spectral intervals, in order to respect the zero-momentum sum rule.
• Rejection. Since some of the mutations, such as flattening, could bias the population, an accept/reject operation is carried out, by performing a Metropolis check comparing the weight exp (−Φ d ⋆ (s)/T ) of son 1 with mom and of son 2 with dad. Notice that this does not implement detailed balance because of the selection and crossover operators. However, by disabling those, we essentially recover the ASM approach. The effective T is typically the same used in the nonlocal mutation move and is reduced through a simulated annealing schedule. We start from a very high T ≈ 10 6 , where the moves are usually accepted (analogously to the original version of the algorithm [10]), and we geometrically reduce it every ∼ 100 iterations, until a small value T ≈ 10 −2 is reached. We observed that the precise initial and final T have a negligible impact on the resulting final spectrum.
The above procedure is repeated until all the individuals s(ω) of the population are replaced by a new generation, except for the s(ω) with the highest fitness in the old generation which is cloned (elitism). To decrease the computational cost, the number of individuals in the new population is reduced by a small fraction at every generation till N s is equal to a given minimal value; from this point over, the number of individuals N s in the new generations is kept constant. We monitor the average fitness and χ 2 of the best s(ω) in the final ensemble S D ⋆ in order to respect the falsification principle, then we take the final average of elements in S D ⋆ , which yields the spectra that are shown in this review as an example. To choose the algorithm parameters, we perform short preliminary runs during which we monitor the fitness as a function of the generation. A slow increase of the fitness usually indicates a poor choice of the frequency parametrization, while a behavior of the fitness showing frequent abrupt increases indicates a good choice of parameters. The typical resulting initial population size is N s ≃ 5000, the typical number of resampled data is N d ⋆ ≃ 200, and the typical number of generations is N gen ≃ 10000. Longer runs and population sizes are seldom needed, but are of course beneficial if larger computational power is available, provided overfitting (|Φ d ⋆ (s)| ≪ l +1) is avoided. The typical number of frequency intervals depends on the desired resolution and quality of initial imaginary-time data, but it is of order m ≃ 500. We usually find that this is the most delicate parameter, but the choice of logarithmically spacing at high frequencies, as described in subsection 3.1, prevents the need for very large m.
We have performed several tests [10] on exactly solvable analytical models suitably discretized and "dirtied" with random noise to "simulate" real data. Having in mind the 4 He case, we have tried to reconstruct spectral functions consisting of linear combinations of Gaussians, one sharp "peak" at small ω and one broad contribution at higher ω, or, for the one-dimensional case, a combination of rectangular shapes and power-law decays [33]. We have observed that none of the parameters have a critical role, once the frequency support is identified. However, only some features of the exact solution can be consistently reproduced: we have no possibility to exactly reconstruct the shape of s(ω), especially at high frequency; on the other hand, access is granted to the identification of the presence of a sharp peak and to its position, to the position of the broad contribution, to some integral properties involving s(ω) and to its support. Moreover, we found that it is possible to estimate properties of the shape of the spectra (for example power-law decay close to thresholds [33,34]), once the low-frequency support is reliably estimated. In this case it is also crucial to analyze directly the elements in S D ⋆ and not only their final average.
We present now some applications of this approach: for the sake of conciseness we will discuss only some of our results concerning the dynamical structure factor of 4 He atoms in three-dimensions (3D) [10] and confined to two-dimensional (2D) [43] or one-dimensional (1D) [33] geometries. We mention that in the realm of bulk liquid helium, methods such as MaxEnt [8,9], modified kernels [30], and simulated annealing [59], have also been used. The dynamical structure factor S(q, ω) is a spectral function which is directly related to scattering experiments coupled to density fluctuations in linear response. It is a function of both frequency ω and momentum q and its peaks allow for the determination of the dispersion relations of coherent collective modes, if present, while broad features indicate multiple excitations or damped modes. The study of the spectrum of elementary excitations of 4 He systems in different geometries is of interest on one hand to investigate the fate of the phonon-maxon-roton spectrum upon change of the dimensionality of the system; on the other hand, it is useful for the interpretation of past or forthcoming scattering experiments involving 4 He systems in bulk [60] or confined geometries: 4 He atoms adsorbed on planar substrates [61] or confined in nano-channels [62,63].
In all the applications we are going to discuss in the following, the intermediate scattering functions F (q, τ ), i.e. the basic ingredient of the GIFT method, have been computed using the exact shadow path integral ground state (SPIGS) method [64,65]. For the technical details we invite the reader to refer to the original articles [10,33,43]. In short, SPIGS is a T = 0 K path integral ground state (PIGS) method [66], which relies on an imaginarytime projected Shadow wave function [67]. Path integral projector methods like PIGS and SPIGS allow for the calculation of exact ground-state expectation values by systematically improving an approximation of the ground-state wave function via successive small imaginary-time projections. The imaginary-time evolution cleans up the spurious overlaps of the variational wave function with excited states and introduces the missing groundstate correlations between particles. We have shown that when the total imaginary-time projection is large enough, these methods provide exact ground-state expectation values, within the statistical uncertainties of the calculations, without any bias due to the choice of the variational wave function [68].

Three-dimensional case
The first application of the GIFT method was the study of the dynamical structure factor of superfluid 4 He in 3D geometry [10]. As far as we know, this was the first analytic continuation method, different from the MaxEnt method, applied to this very peculiar condensed matter system. The use of GIFT turned out to be a major improvement with respect to previous MaxEnt studies of superfluid 4 He [8]: we were able to recover sharp quasi-particle/elementary excitations, with excitation energies in good agreement with experimental data, and spectral functions displaying also the multi-phonon branch (i.e. a branch of the spectrum corresponding to the creation of multiple elementary excitations) with the correct relative spectral weight.
These results can be observed in Fig. 1 where color maps of the dynamical structure factors at equilibrium (ρ 3D = 0.0218Å −3 ) and freezing (ρ 3D = 0.0262Å −3 ) densities, extracted with the GIFT method, are shown together with the experimental quasiparticle/elementary excitations energies [69][70][71]. As a reference, we also plot the freeparticle dispersion ε 0 (q) = 2 q 2 /2m, where m is 4 He mass. Moreover, we show the famous Feynman's dispersion relation ε FA (q) = ε 0 (q)/S(q), which can be derived using a variational argument [72], where S(q) is the static structure factor. This dispersion correctly manifests a minimum in the excitation close to momentum 2π/a, where a is the hard-core Corresponding GIFT strength of the quasi-particle peak Z(q) as a function of q and comparison to experimental data [69][70][71]. The reference free ε 0 (q) and Feynman's ε FA (q) dispersion relations are described in the text.
size of the interaction potential, and was first phenomenologically hypothesized by Landau [73]. In the same momentum region, the static structure factor features a peak, provided the average interparticle distance is of the order of a. Our reconstructed S(q, ω) exhibit an overall structure in good agreement with experimental data: a sharp quasi-particle peak and a shallow multi-phonon maximum are present. Both features appeared for the first time within an analytic continuation procedure applied to a QMC study of a many-body system in the continuum. By integrating the extracted S(q, ω), one has access to quantities like the strength of the single quasi-particle peak, Z(q), and thus also to the contribution to the static structure factor, S(q), coming from multi-phonon excitations. Remarkably, Z(q) is in close agreement with experimental data [69] (see Fig. 1), thus strongly suggesting that the broad structure in S(q, ω) at large frequency carries indeed reliable physical information on the multi-phonon branch of the spectrum. Given the assumption of a pair-wise interatomic interaction [74] and the experimental and algorithmic statistical uncertainties (the latter being estimated via multiple independent QMC simulations and GIFT reconstructions, also involving variants of the interaction potential [74][75][76]), the agreement of the extracted Z(q) is very good. This shows that, via analytic continuation, it is at least possible to extract one sharp feature in the spectral function with the correct spectral weight and a broad multi-phonon component which represents semi-quantitatively the combination of multiple quasi-particle excitations. Note that here the width of the reconstructed single quasi-particle peak is mainly a measure of the uncertainty in the statistical reconstruction of the position. Thus the exact shape of the spectral function is not accessible, given the ill-posed nature of the problem: future improvements will unavoidably require QMC simulations on more powerful computational facilities.

Two-dimensional case
Another application of the GIFT method to superfluid 4 He systems has been the study of S(q, ω) for a pure 2D geometry. As highlighted by the 2016 Nobel Prize in Physics [77], bosons in two dimensions are of great theoretical interest because the standard scenario of superfluidity associated with Bose-Einstein condensation (BEC) is not appropriate anymore. As shown by J.M. Kosterlitz and D.J. Thouless [78], the notion of long-range order has to be replaced by that of topological long-range order, characterized by a slow algebraic decay of the local order parameter correlation function. Notwithstanding a vanishing order parameter in 2D, i.e. a condensate wave function which vanishes at any finite temperature for a bulk system, a superfluid response is theoretically predicted up to a temperature where vortex and antivortex pairs unbind. The dynamical structure factors obtained for 2D 4 He [43] at different areal densities are reported in Fig. 2. We found well defined excitations in the full density range where the system is superfluid; however, significant differences are present with respect to 3D 4 He. In 2D, close to the spinodal density (ρ 2D = 0.0321Å −2 ), where the system is unstable against droplet formation, the excitation spectrum features the maxon (the maximum of the coherent dispersion) and the roton (the finite-momentum minimum) frequencies almost coalescing in a plateau. At the equilibrium density (ρ 2D = 0.043Å −2 ), the small peak in the static structure factor causes a maxon-roton structure which is rather weak, with the maxon frequency only 10% higher than the roton frequency. Above the equilibrium density, a well defined maxon-roton structure develops (see the density ρ 2D = 0.0536 A −2 ) and, finally, at freezing density (ρ 2D = 0.0658Å −2 ) the ratio between maxon and roton energies is found as large as 3. At the same time the wave vector of the roton has a strong density dependence, whereas that of the maxon is almost density independent. This strong evolution of the shape of the excitation spectrum with the density is probably due to the wider density-range of existence of the fluid phase in 2D: the freezing density is more than twice the spinodal density while in 3D it is only 60% larger. Moreover, in the maxon region for densities above equilibrium, the quasi-particle excitation peak is substantially broadened with respect to the roton region. This implies that, over an extended region of wave-vectors and of density, the elementary excitations have a finite lifetime even at T = 0 K. In fact, they can decay into other excitations, since the phonon region is characterized by a strong anomalous dispersion, featuring a positive curvature [43].

One-dimensional case
To conclude our discussion on the applications of the GIFT method to 4 He systems, we briefly review our recent study on a system of 4 He atoms in a pure 1D geometry. One-dimensional quantum systems exhibit spectacular signatures of the interplay between quantum fluctuations, interaction and geometry. The reduced dimensionality prevents the spontaneous breaking of continuous symmetries in the presence of short-range interactions [79], which results in a single Luttinger liquid phase for 4 He, with different character depending on density. Moreover, in the presence of hard-core interactions, bosonic and fermionic systems start to share common behavior, since two-body correlations must decay to zero in both cases. A color map of S(q, ω) for the 1D system is shown in Fig. 3. By increasing the density, the dynamical structure factor reveals a transition from a highly compressible liquid near the equilibrium linear density (ρ 1D = 0.036Å −2 ) to a quasi-solid regime (ρ 1D = 0.300Å −2 ). Notice that the range of considered densities is much larger than in higher dimensions, so the typical frequency and momentum scales vary considerably. In the low-frequency limit, the dynamical structure factor can be described by the quantum hydrodynamic Luttinger-liquid theory [80]: elementary excitations are unavoidably collective, i.e. phonons (this holds true even for fermionic systems). At higher energies, the GIFT analytic continuation approach provides quantitative results beyond Luttingerliquid theory. In particular, as the density increases, the interplay between dimensionality and interaction makes S(q, ω) manifest a pseudo-particle-hole continuum typical of fermionic systems. The fate of the phonon-maxon-roton spectrum, which characterizes the excitations in higher dimensions, is to merge into a pseudo-particle-hole continuum. It is interesting to note that, by increasing density, the spectral weight moves towards lower frequencies for wave-lengths of the order of the average interparticle distance, similarly to the behavior in higher dimensions. However, instead of having a neat roton excitation, a broad spectral structure bends down, and only at very high linear densities almost coherent modes are reached. However, we mention in passing that a power-law behavior close to the lower spectral support is in fact observed and indeed expected from non-linear Luttinger theories: we refer the interested reader to Refs. [33,34], where analytical efforts, motivated by the obtained GIFT spectra, yielded remarkable results.

Conclusions
We have described in detail a strategy we have developed to face the analytic continuation problem that emerges whenever real-time dynamics of strongly correlated physical systems is studied relying on estimations of imaginary-time correlation functions. We have presented it in a pedagogical way, in order to allow researchers to take full advantage of the methodology. We have enriched the presentation with figures showing the very accurate results we have obtained for systems of 4 He atoms in different geometries. In general, the family of stochastic analytic continuation methods is becoming more affordable due to the increase of available computational power, and is thus expected to have higher impact in the future, due to its accuracy. Our method combines the genetic speed-up coming from the Crossover move, to a remarkable robustness with respect to the chosen parameters, coming from the averaging procedure and the Rejection step. The only crucial input to be optimized is the frequency support and parametrization, which suggests that further improvement of the method is foreseeable. Moreover, knowledge about the typical features of the considered spectra helps in choosing the rate at which appropriate operators are preferentially called, thus increasing the rate of convergence. The problem we have faced belongs to the huge class of inverse problems, a deep topic also from a fundamental epistemological point of view [6]. The basic idea of the falsification principle [7] guided us in our particular implementation of analytic continuation: moreover, every analytic continuation problem emerging in Physics, applied Mathematics or, more generally, applied Science, can in principle be tackled by a suitable variation of the GIFT algorithm, which is per se an approach suitable for hybridization with other methods. We thus expect that the key ideas underlying our approach can be efficiently used also in other inverse/analytic continuation problems and in many research fields, like, just to cite one example, image reconstruction. In fact, one of us has recently faced the Phase Problem in Coherent Diffractive Imaging, following a similar approach: we have found that by building a memetic algorithm, i.e. hybridizing a Genetic Algorithm with standard iterative methods, it is possible to outperform the phase retrieval capabilities of the algorithms used as memes to assist the genetic stochastic search [39].