Normal mode analysis of a relaxation process with Bayesian inference

ABSTRACT Measurements of relaxation processes are essential in many fields, including nonlinear optics. Relaxation processes provide many insights into atomic/molecular structures and the kinetics and mechanisms of chemical reactions. For the analysis of these processes, the extraction of modes that are specific to the phenomenon of interest (normal modes) is unavoidable. In this study we propose a framework to systematically extract normal modes from the viewpoint of model selection with Bayesian inference. Our approach consists of a well-known method called sparsity-promoting dynamic mode decomposition, which decomposes a mixture of damped oscillations, and the Bayesian model selection framework. We numerically verify the performance of our proposed method by using coherent phonon signals of a bismuth polycrystal and virtual data as typical examples of relaxation processes. Our method succeeds in extracting the normal modes even from experimental data with strong backgrounds. Moreover, the selected set of modes is robust to observation noise, and our method can estimate the level of observation noise. From these observations, our method is applicable to normal mode analysis, especially for data with strong backgrounds.


Introduction
Relaxation processes are processes by which a perturbed system returns to equilibrium. Measurements of relaxation processes are essential in many fields, such as condensed matter physics [1,2], nonlinear optics [3], structural/reaction chemistry [4], astronomy [5] and atmospheric science [6]. Such processes provide many insights into atomic/molecular structures and the kinetics and mechanisms of chemical reactions. For example, Kim et al. [3] reported evidence of a Mott phase transition by analyzing the structure of a VO 2 metal crystal from a coherent phonon (CP) signal, which is the relaxation of phonon vibrations. Blagg et al. [7] recently reported the relaxation path and stable structure of the thermal magnetization of a lanthanide compound by analyzing X-ray diffraction patterns. Common to all these areas, the extraction of modes that are specific to the phenomenon of interest (normal modes) from the measured data is unavoidable.
In this study, we focus on the CP signal as a typical case in which the measurement data of a relaxation process include normal modes. Coherent phonons are waves of in-phase atomic vibrations over a macroscopic spatial range, and their signals can be observed by the pomp/probe method [8,9], which uses two ultrashort pulses of light. After the excitation of a substance by the pump pulse, lattice vibrations can be observed as oscillatory changes in optical constants with a delayed probe pulse; this approach has been used to reveal the dynamics of a photoinduced structural phase transition [10][11][12]. Figure 1(a) shows an example of a (111)-Bi polycrystal as a reference sample [13]. The (111)-Bi polycrystal has the A 1g and E g coherent phonon modes. Figure 1(b,c) show typical observations of the CP signal in a (111)-Bi thin film with the same experimental conditions. The difficulty in analyzing the relaxation process represented by the CP signal is that the experimental artifacts and observation noise vary greatly from measurement to measurement. Some signals are relatively easy to analyze owing to the flat background, as shown in Figure 1(b), while others have strong backgrounds and are difficult to analyze, as shown in Figure 1(c). The purpose of CP signal analysis is to extract normal modes from such signals, despite the difficulty. Conventionally, Fourier analysis and wavelet analysis have been employed for CP signal analysis [14][15][16][17][18]. Despite their widespread use, these methods cannot uniquely separate the normal modes since they use trigonometric and wavelet bases, which cannot express damped oscillations.
To address the aforementioned issue, Murata et al. [19] analyzed a CP signal using sparsity-promoting dynamic mode decomposition (SpDMD [20]) based on the fact that CP signals consist of a few damped oscillation modes. SpDMD is a method that decomposes time series data into a sum of damped oscillations and observation noise. The authors isolated candidates of the normal modes from experimental artifacts and observation noise by assuming that the noise was stationary and did not drift over time. Specifically, the authors estimated the amplitude of the noise from the negative time domain (t < 0 ðpsÞ) in Figure 1(b) since the signal was excited at t ¼ 0 ðpsÞ. However, the negative time domain of CP signals is not necessarily stationary, as shown in Figure 1(c) due to the instability of measurement instruments and lasers. The drift in such backgrounds has a significant influence on the extraction of normal modes.
In this study, we propose a data-driven framework to systematically extract normal modes, even when the background noise is unstable. The isolation of background noise and the extraction of normal modes from measured signals can be recast as a mode selection problem for CP signal estimation. We extend the previously proposed Bayesian LARS-OLS framework [21,22] to be used with SpDMD. Bayesian LARS-OLS is the mode selection framework from the viewpoint of data-driven approach in Bayesian inference. Bayesian LARS-OLS can analytically calculate the performance of a set of SpDMD modes in terms of the Bayesian free energy (FE) [23], which assumes that the measured data can be decomposed into a sum of amplified oscillations and observation noise. We numerically compare the performance of our proposed method with that of the previous method proposed by Murata et al. [19]. Our results consist of the following three points. First, our method succeeds in extracting normal modes from experimental data with strong backgrounds We then use virtual data to examine the effect of noise. The set of selected modes is robust to artificially induced Figure 1. (a) Schematic diagram of coherent lattice vibrations in a (111)-Bi thin film generated by ultrashort optical pulses. In the (111)-Bi thin film, the coherent lattice vibrations corresponding to two different symmetric point groups of A 1g and E g are observed by the pump-probe method. (b), (c) Signal data from lattice vibrations in the (111)-Bi thin film. The lattice vibrations start from the initial time 0. The signal contains two damped oscillations, A 1g and E g . We measured two experimental datasets, Exp. 1 in (b) and Exp. 2 in (c), at different times, so their backgrounds are different, but the signal contains the same damped oscillations. observation noise. Moreover, our proposed method can estimate not only the normal modes but also the level of observation noise. From these observations, our method applies to both mode selection and noise estimation, even for a strong background. These results suggest that our proposed data-driven method is applicable to the relaxation process analysis of material science. Conventionally, a lines of studies conducted Fourier analysis on the relaxation process in material science [1][2][3][4]. Our method provides a new way to decompose normal modes by incorporating the physical properties of the relaxation process with Bayesian inference.

Method
In this section, we introduce a method to extract modes from observation signals of one-dimensional time series, as shown in Figure 1. First, we introduce DMD [24] in Section 2.1. Then, SpDMD [20] is introduced in Section 2.2 as a method that can extract a normal mode expressing the lattice vibrations of a small (sparse) number of modes. Finally, we introduce a framework that extracts the normal mode from the mode extracted by Bayesian LARS-OLS [22] by optimizing the sparse parameter in SpDMD even if the value of experimental noise is unknown in Section 2.3.
CP signals are measured as one-dimensional time series ðy 0 ; :::; y N Þ with a uniform time interval δt, where N is the number of data points. To apply DMD and extract modes from observation signals of one-dimensional time series ðy 0 ; :::; y N Þ, we map the signals y t to an M-dimensional vector as ψ t ¼ ðy t ; y tþ1 ; :::; y tþMÀ1 Þ T ; (1) where the M-dimensional vector is called a state vector. We then obtain m þ 1 snapshots with a fixed time interval δt, as shown in Figure 2, and define the t-th state vector ψ t as the value at t Â δt. Assuming discrete linear dynamics between time t Â Δt and ðt À 1Þ Â Δt, we obtain ψ t ¼ Aψ tÀ1 ; t ¼ f1; :::; mg; where A 2 C MÂM . The objective is to estimate the matrix A from measured data that describe the discrete linear dynamical system. For this objective, we briefly explain the DMD algorithm according to Jovanović et al. [20]. As shown in Figure 2, we form two data matrices from the snapshot sequence Ψ 0 ¼ ðψ 0 ; :::; ψ mÀ1 Þ 2 R MÂm ; (3) where m ¼ N À M þ 1. Using Equation (2), we obtain the following relationship between Ψ 0 and Ψ 1 : The data matrix Ψ 0 is a nonsquare matrix; there is no guarantee of regularity even in the case of a square matrix, so the inverse matrix Ψ À1 0 does not exist in general. We represent the matrix A that satisfies the above equation by using the Moore-Penrose pseudoinverse matrix Ψ þ . The best-fit linear operator A that satisfies Equation 5 is the solution to the following least-squares optimization: Here, U 2 R MÂr ; AE 2 R rÂr ; V 2 R mÂr are matrices of a singular value decomposition (SVD): The eigenvalues and eigenvectors of the matrix A correspond to eigenvalues and eigenmodes of time evolution. The size of the matrix A 2 R MÂM is very large. To solve the eigenvalue problem efficiently, instead of the eigenvalue problem of the matrix A, we consider the low rank matrixÃ 2 R rÂr projected onto the singular vectors U. This SVD basis U is called the proper orthogonal decomposition (POD) [34,35].
Next, we computed the eigendecomposition ofÃ: The diagonal matrix D μ ¼ diagðμ 1 ; μ 2 ; ::; μ r Þ means the eigenvalues ofÃ, which are same in A. The corresponding eigenvectors of A is represented the follows: where W 2 C rÂr is a matrix ofÃ's eigenvectors.
Corresponding to DMD eigenvalues μ, we called the columns fϕ 1 ; ϕ 2 ; :::; ϕ r ; g ϕ 2 C M as DMD eigenmodes. Applying Equation (2) repeatedly, the state vector ψ t at step t from the initial vector ψ 0 can be represented as Here, we define α ¼ α 1 ; α 2 ; . . . ; α r ð Þ T ;Φ y ψ 0 2 C r as the mode amplitude, which corresponds to the coefficients of each mode. The vector corresponding to the We need to estimate α from the data, which is the solution to an optimization problem. From the above expressions, the data matrix Ψ 0 can be represented as Here, we define D α ¼ diag α 1 ; α 2 ; ::; α r ð Þand a r Â m Vandermonde matrix representing the latent structure in the temporal direction The coefficient α of each mode is expressed using the pseudo-inverse matrix Φ þ of the dynamic mode Φ.
Here, we obtain α by solving the least squares problem.
X k k Fro represents the Frobenius norm of the matrix X. JðαÞ is the objective function that needs to be minimized to obtain the mode amplitude α. In Appendix A, we present the correspondence between the DMD modes and the relaxation process.

Sparse estimation of DMD
We assume prior knowledge that the normal modes representing the lattice oscillations of a substance among modes extracted by DMD are few (sparse). We introduce SpDMD [20] to obtain the sparse solution of the mode amplitude. SpDMD is a combination of DMD and the least absolute shrinkage and selection operator (Lasso) [36], which automatically extracts a small number of components for the purpose of explaining data. We solve the following , 1regularized optimization problem.
Here, γ represents the weight of the , 1 regularization term. If γ takes a significant value, α becomes a more sparse solution. If SpDMD gives the regularization parameter γ, we obtain a unique mode set.

Mode selection by Bayesian LARS-OLS
We deal with mode selection in Bayesian inference by considering the prior of α. Although we optimized α with , 1 regularization as shown in above, the , 1 regularization develops a bias of the basis coefficient. In this section, we present a well-known technique for preventing the bias, called 'polishing' [37], within the framework of Bayesian inference. Such framework is known as Bayesian LARS-OLS [22] and we extend this framework to the mode selection problem in SpDMD. Bayesian LARS-OLS framework uses Bayesian free energy [23] to evaluate the goodness of model fit. We can determine the optimal sparsity γ and select the set of DMD modes based on the criteria. In Section 2.3.1, we first introduce the 'polishing' method. Next, in Section 2.3.2, we introduce the Bayesian LARS-OLS and Bayesian free energy (FE).

Polishing
We regress the mode amplitude α again in terms of the candidate modes selected by SpDMD to prevent the bias of the coefficient. The technique is called 'polishing' [36]. We deal with the 'polishing' in mathematics and introduce the prior of α condition in Bayesian inference by considering parameter cðγÞ. We define an indicator cðγÞ representing whether the modes is selected or not by SpDMD: where i is the index of the modes. In other words, the indicator cðγÞ is the set of indices of which DMD mode was estimated as nonzero. We solve the constrained optimization problem of Equation (17) using the set cðγÞ with suffix i depending on the , 1 regularization term γ in SpDMD.
This equation means that we minimize JðαÞ under the constraint α i ¼ 0, and α depends on γ only through c.

Evaluation of decomposed modes
In this section, we deal with the aforementioned 'polishing' method in a Bayesian inference. The 'polishing' has two different purposes: mode search by , 1 regularization and regression. Least angle regression and ordinary least squares (LARS-OLS) [37] is a framework of regression analysis that performs basis search and regression in a stepwise manner. Igarashi et al. considered the LARS-OLS as Bayesian inference (called Bayesian LARS-OLS) and proposed a method that achieves the basis search in the Bayesian free energy criterion [21,22]. We extend their method to be able to apply the material's dynamics such as a relaxation process. In Bayesian LARS-OLS framework, we consider DMD as a problem of linear regression and express it by a probability model. The model of DMD as described above can be written asŷ ¼ X Ã α where X Ã is a matrix determined from the eigenvalues μ and eigenvectors ψ of DMD. Please see Appendix B for the detail derivation. The optimization of , 2 loss (Equation 14) for above linear model corresponds to the maximum likelihood estimation under the assumption that the observation noise is the independent and identically distributed (i.i.d.) Gaussian. Overall, we can write the DMD estimator as follows: When we estimate the modes X and sparsity γ asX andγ, we define a combination of modes asĉ ¼ cðγÞ.
We note that cðÁÞ computesĉ through SpDMD by usingX andγ. To evaluateĉ, we consider a posterior distribution Pĉjy ð Þ. From the Bayesian theorem, the posterior is Here, PðyÞ is constant. We define the prior distribution PðĉÞ to be uniform and use the following relationship: Based on the above assumption (Equation 18), the following holds: Pðyjα; X;ĉÞ,N ðX Ã α; σ 2 Þ. We assume that the matrix X, which characterizes modes, does not depend on the combinationĉ, so we define P Xjĉ ð Þ ¼ δðXÞ: This equation means that the combination of modes does not change either the eigenvalue of the time evolution μ or the dynamic mode ϕ, where δ Á ð Þ is the Dirac delta function. Substituting Equation (21) into Equation (20), we obtain Here, the P αjX;ĉ À Á is a prior probability and the P yjα;X À Á is a distribution of data generation. The PðyjĉÞ, called the marginal likelihood, represents the likelihood of the indicatorĉ estimated from the data. The negative marginal log-likelihood, is often referred to as the Bayesian free energy (FE) [23]. The FE criterion evaluates the goodness of model fit. In our criteria, the indicator c , which means whether the DMD mode was selected or not, determines the models and FE. The indicator c with the smallest FE represents the best basis set. To calculate FE analytically, we define the prior distribution P αjX;ĉ À Á as follows in the cases of α i ¼ 0 and α i Þ0.
Pðα i jX; i 2ĉÞ ¼ CN ð0; σ 2 pri Þ (24) Here, the coefficient α is complex number, we define PðαjX;ĉÞ as a complex Gaussian distribution [38]. From the obtained FE ðĉÞ, we estimate the variance v ¼ σ 2 of the noise and the variance v pri ¼ σ 2 pri of the prior distribution as values that minimize the FE ðcÞ. By repeatedly solving the self-consistent equations until convergence, the values of v and v pri that minimize the FE ðĉÞ are obtained. In this way, we can analytically calculate the FE. The reason why we can analytically calculate the FE ðĉÞ is that the parameter X of the DMD's eigenvalues and eigenvectors has already been determined.

Results and discussion
We focus on experimental signals of a bismuth (Bi) thin film as typical examples of relaxation processes. We deposited a Bi thin film on a sapphire substrate with a thickness of 150 nm and used a reflective pump-probe spectroscopy method [16] to obtain the CP signals. The signal background varied from measurement to measurement due to experimental instabilities. We analyzed two typical signals. One signal has a background close to steady-state, as shown in Figure 1(b), and the other signal has a strong background, as shown in Figure 1(c). In Sections 3.1 and 3.2, we explain the advantage of our proposed method. Finally, in Section 3.3, by using virtual data, we describe the robustness to noise and the noise estimation performance.

Background close to steady-state
We analyzed the weak-background data in Figure 1(b). The experimental data had N = 2433 data points from t = 0.00 to 8.00 ps at an interval of Δt = 8/2433 ps. The data matrices Ψ 0 and Ψ 1 were generated according to Figure 2 with M = 1000. We prepared the weight of the sparsity term γ in Equation (15) as 600 units with equal intervals on a log scale between γ¼ 10 À9 and 10 À2 . Figure 3(a) shows the reconstruction of the original signal with the optimal modes selected by the previous method. As shown by the red and green lines in Figure 3(a), the previously proposed method successfully extracted the normal vibration modes A 1g and E g . Moreover, the other selected modes are denoted by the black dotted lines and correspond to a weak background in the experimental data [19]. By using the selected modes, the recovered signals coincide with the experimental data, as shown in the pink filled area and the blue line of Figure 3(a). The residual signal between the original signal and the reconstructed signal is close to zero, as shown by the solid bottom lines of Figure 3(a). In Murata et al. [19], assuming prior knowledge that the normal modes representing lattice oscillations of the substance of interest were sparse, the authors selected the modes extracted by DMD as explained in Section 2.1. The previous method determined the optimal number of modes to be nine, where the curve of the MSE curve crosses the estimated noise variance σ 2 est ¼ 4:00 Â 10 À14 , as shown in Figure 3(c). Note that the authors estimated the noise variance from the variance of the negative time signal [19]. On the other hand, the proposed method also succeeded in extracting the normal modes from the experimental data without using speculative noise information, as shown in Figure 3(b). In the proposed method, we estimated the noise variance in objective data using Bayesian inference and optimized the number of normal modes as 19 by minimizing the Bayesian FE, as shown in Figure 3(d). Our proposed method can obtain an estimate of the observed noise level from the Bayesian FE. We estimated that the standard deviation of the noise was σ FE ¼ 3:56 Â 10 À8 , while the value of the noise estimated in the previous study was σ est ¼ 2:00 Â 10 À7 .
To qualitatively investigate the selected normal modes, we compare the results of a frequency spectrum analysis of the selected normal modes with the results of a previous study [16]. Hase et al. [16] reported that the two normal modes of (111)-Bi, A 1g and E g , have frequencies of 2.91 and 2.07 THz, respectively. Figure 4 quantitatively shows the frequency of each mode, where the horizontal axis represents the frequency calculated from the eigenvalue of the extraction mode, and the vertical axis represents the absolute value of the mode amplitude. Figure 4(b,c) show the modes selected by the previous method [19] and the proposed method, respectively, where both methods assumed that the original signal consisted of a few modes. The figures show that both methods successfully extracted the normal modes at 2.90 and 2.09 THz. These modes correspond well to A 1g (2.91 THz) and E g (2.07 THz), respectively.

Strong background
In contrast to the previous section, the assumption regarding the noise estimation by Murata et al. [19] does not hold when the data have a strong background, which can cause the normal mode selection to fail. Let us consider the strong-background case. The experimental data had N = 9740 data points from t = 0.00 to 7.50 ps at an interval of Δt = 7.5/9740 ps. The data matrices Ψ 0 and Ψ 1 were generated according to Figure 2 with M ¼ 4970. We prepared the weight of the sparsity term γ in Equation (15) as 600 units with equal intervals on a log scale between γ¼ 10 À7 and 10 À2 . Figures 5 and 6 show the results for a strong background. The previous method determined seven modes as shown in Figure 5(c). Although the selected modes included the A 1g mode, they did not include the E g mode. On the other hand, our method selected 85 modes, as shown in Figure 5(d), which successfully included both the A 1g and E g modes. Our method also has an advantage in terms of the residuals, as shown by the solid black line in Figure 5(a,b). The residual signal extracted by our method is steady, whereas the previous method includes a vibration component in the residual signal.
As in Figure 4, we performed a frequency spectrum analysis of each mode to quantitatively evaluate the difference from the normal modes obtained in a previous study [16] and to ensure that normal modes can be extracted quantitatively ( Figure 6). As in Figure 4(a-c), Figure 6(a) shows the modes extracted from DMD, and Figure 6(b,c) show the modes selected by the previous method [19] and the proposed method, respectively. Figure 6(b) shows that the previous method successfully extracted the mode at 2.96 THz, which corresponds to the A 1g (2.91 THz) mode band; however, it failed to extract the E g (2.07 THz) mode band. On the other hand, Figure 6(c) shows that the proposed method successfully extracted the normal modes at 2:96, and 2.34 THz. These modes correspond to the A 1g (2:91 THz) and the E g (2:07 THz) modes bands, respectively. The estimated errors were 0:05 and 0:27 THz, respectively. This latter error seemed to be influenced by the intense fluctuating background components. To decompose such phonon modes, which have a weak amplitude and short decay time, we should improve the experimental conditions to reduce such experimental artifacts.
Let us consider the noise estimation results. We estimated that the standard deviation of the noise was σ FE ¼ 8:08 Â 10 À8 , while the value of the noise estimated in the previous research was σ est ¼ 5:33 Â 10 À7 . In both the strong-and weak-background data, σ est is approximately six times larger than σ FE . The error of the noise estimation has a critical effect on the normal mode   selection in the data with a strong background. The previous method failed to select the E g mode, because the noise level estimated by the previous method was larger than the noise level estimated by the proposed method. Because the amplitude of the E g mode was smaller than the A 1g mode, it was considered that the E g mode deviated from the selected mode. However, since our method estimates noise without a stationary assumption for an experimental artifact, we succeeded in selecting normal modes with small amplitudes. Our method is effective, especially for data with a strong background. By applying our method to CP analysis, we can select normal modes from measurement signals without considering the experimental situation. Since the normal modes were known for (111)-Bi, it was possible to find normal modes from candidates of selected modes obtained by mode selection.

Virtual CP data
Finally, we investigate the performance of the previous method and the proposed method against virtual CP data to evaluate the effect of noise. Let us consider the results of the virtual CP data. We created virtual data by injecting Gaussian noise (σ,8:0 Â 10 À8 ) into the reconstructed signal from nine modes chosen by the previous method in the weak-background (Figure 3(a)). We prepared 100 virtual datasets with different samples of noise. Figure 7 shows the number of selected modes. We plotted the distributions of counts as a box plot. The previous method selected 191 AE 114 modes, while our method selected 14 AE 11 modes. We note that the number of modes selected by proposed method was always greater than the correct number of modes of nine. Since there is no negative time domain in the generated virtual data, the noise level used to determine the mode in the previous method was the generated noise level (σ,8:0 Â 10 À8 ). In the result of the previous method, the selected numbers varied considerably according to the noise, and the mode number tended to be much larger than the correct number of modes. On the other hand, our method has minimal dispersion and can select a number that is closer to the correct mode number from noisy data. These results show that our method is much more robust to noise than the previous method.
Let us consider the noise estimation results. The estimated noise level by the FE was σ,9:4 AE 1:5 Â 10 À8 . Since the true noise level is σ,8:0 Â 10 À8 , our method can estimate the amplitude of the original noise from an observed signal whose noise value is unknown. This result indicates that it is possible to quantitatively evaluate the amplitude of the noise without using a heuristic technique.

Conclusions
In this study, we extended Bayesian LARS-OLS to be used in SpDMD for the selection of normal modes in a relaxation process. We numerically compared the performance of our proposed method to the previous method proposed by Murata et al. [19]. The following three results showed the effectiveness of the proposed method based on data of the CP signal of a (111)-Bi thin film. First, our method succeeded in extracting a small number of modes, including the normal modes, from experimental data even with significant background trends. Then, we used virtual data to examine the effect of noise. The selected set of modes was more robust to observation noise than the conventional method. Finally, our proposed method allowed to estimate not only normal modes but also the level of observation noise quantitatively. From these observations, our proposed method is applicable to normal mode analysis of relaxation processes, especially for data with strong backgrounds, which broadens the applicability of data-driven approach in analyzing of relaxation phenomena in material science.

Disclosure statement
No potential conflict of interest was reported by the authors.
Here, χ; χ 0 is a column vector which is a pair of state vectors ψ of DMD introduced for the reconstruction of the original time series and can be expressed using and α ¼ α 1 ; α 2 ; :::; α r ð Þ T as Therefore, it becomes χ 0 T ¼ V T and T 0 α; χ 0 M T ¼ V 0 T and T M α. The next thing to do is to display matrix with y ¼ X Ã α. By using (B6) ψ 0 corresponds to Φα 2 C M , Ã ψ N corresponds to ΦD NÀMþ1 μ α 2 C M . Eventually, it becomes Here, we note