Spectrum-adapted expectation-conditional maximization algorithm for extending a high-throughput peak separation method in XPS analysis

ABSTRACT We introduce the spectrum-adapted expectation-conditional maximization (ECM) algorithm to improve the efficiency of the peak fitting of spectral data with various fitting models. The spectrum-adapted ECM algorithm can perform the peak fitting by using the Pseudo-Voigt mixture model and the Doniach-Šunjić-Gauss mixture model, which are generally used for peak fitting in X-ray photoelectron spectroscopy. Analyses of synthetic and experimental spectral data showed that the proposed method completes the calculation quickly and estimates well-fitted curves to the spectral data. This result suggests that the spectrum-adapted ECM algorithm efficiently performs the peak fitting for a large number of spectral data sets.


Introduction
Analysis of spectral data is an essential part of spectroscopy measurements for investigating the electronic properties of materials and devices [1][2][3][4]. As the number of obtained spectra has been increasing with the improvement of spectroscopy devices, analyzing a large number of spectral data sets has become a serious problem for the efficiency of spectroscopy measurements. In the case of X-ray photoelectron spectroscopy (XPS), for example, the peak assignment of core-level spectra in compounds relies strongly on manual trial and error, because suitable parameters of the fitting curve are explored according to previous reports and the researchers' experience. As such a procedure severely affects the efficiency of the analysis, the conventional analysis procedure is a bottleneck in the cycle of new material and device development. Therefore, the development of an improved analysis method is indispensable for reducing the time-consuming procedure of peak fitting and for accelerating this cycle.
Recent works have proposed analysis procedures that employ techniques developed in the field of informatics. For example, Nagata et al. [5] introduced a spectral deconvolution method based on Bayesian estimation with the exchange Monte Carlo method and demonstrated the estimation of an appropriate number of peaks while avoiding parameter solutions trapped in local minima. Murata et al. [6] also applied Nagata's framework to time-series spectral data. Shiga et al. [7] proposed a new non-negative matrix factorization (NMF) technique to analyze spectral imaging data. This technique helps resolve problems associated with previous NMF techniques, such as the calculation not always converging and the number of separated peaks being specified by manual trial and error. Very recently, Akai et al. [8] proposed Bayesian spectroscopy for spectral analyses of optical spectra and for normal-mode analyses of coherent phonon signals in solid-state materials. This Bayesian spectroscopy demonstrated higher accuracy in the estimation of material parameters than the least-squares and Fourier transform methods. Shinotsuka et al. [9] also developed a peak fitting procedure based on the Bayesian information criterion to estimate the model parameters and the appropriate number of peaks in the spectral decomposition. Methods for preprocessing, namely background subtraction, have been developed by introducing some hyperparameters into the model [10][11][12], and these methods were demonstrated on several kinds of spectral data, such as infrared and Raman spectra, chromatography, and nuclear magnetic resonance. Very recently, Ozaki et al. [13] successfully demonstrated a blackbox optimization approach [14] to the Rietveld method in order to automate the trial-and-error process of searching for the hyperparameters.
Moreover, machine learning techniques for material structural data have been actively studied, e.g. in the classification of crystal structures by using deep neural networks [15,16].
For the high-throughput analysis of spectral data, Matsumura et al. [17] proposed the spectrum-adapted EM algorithm, which focuses on performing rapid and automatic peak fitting. In the spectrum-adapted EM algorithm, the peak fitting is conducted by using the intensity of the spectrum as a weight for each measurement energy step. This method enables a reduction of the computational cost of optimizing the parameters of the fitting model. However, as the spectrum-adapted EM algorithm can only be applied to the Gaussian mixture model (GMM), it is difficult to apply it to the analysis of various kinds of spectral data. In order to realize a high-throughput analysis method for various spectral data, it is necessary to develop a method that can extend the available fitting models to, for example, the Pseudo-Voigt distribution and the Doniach-Šunjić distribution.
In this study, we extended the spectrum-adapted EM algorithm so that it can use the Pseudo-Voigt mixture model (PVMM) and Doniach-Šunjić-Gauss mixture model (DSGMM) as the fitting model in the peak separation. The extended method is derived by applying the expectation-conditional maximization (ECM) algorithm [18]. The ECM algorithm is one of the generalized EM algorithms with the iterative calculation between the expectation (E) step and conditional maximization (CM) step. We demonstrated the application of the proposed method to the synthetic and experimental spectral data from XPS analysis.

The fitting models
We introduced the PVMM and DSGMM as the fitting models for the spectral data. Each fitting model for the measurement energy steps (x_n) is constructed from the Pseudo-Voigt distribution [19] and the Doniach-Šunjić distribution [20] as follows:

PseudoVoigt(x_n | μ, σ, γ, η) = η Lorentz(x_n | μ, γ) + (1 − η) Normal(x_n | μ, σ)   (1)

and

DoniachSunjic(x_n | μ, γ, α) = Γ(1 − α) cos[πα/2 + (1 − α) arctan((x_n − μ)/γ)] / [(x_n − μ)² + γ²]^((1−α)/2),   (2)

where

Lorentz(x_n | μ, γ) = (1/π) γ / [(x_n − μ)² + γ²]   (3)

and

Normal(x_n | μ, σ) = (1/(√(2π) σ)) exp[−(1/2)((x_n − μ)/σ)²].   (4)

Here, μ is the location parameter specifying the peak position of the spectrum, γ (γ > 0) is the scale parameter specifying the half-width at half-maximum (HWHM), σ (σ > 0) is the standard deviation, η (0 ≤ η ≤ 1) is the mixing parameter between Lorentz(x_n | μ, γ) and Normal(x_n | μ, σ), and α (0 ≤ α ≤ 1) is the degree of asymmetry of the peak shape. The Pseudo-Voigt distribution is generally used as a fitting model for spectral data because it is empirically considered to fit well, e.g. [21]. Also, some fitting models based on the Pseudo-Voigt distribution have been proposed, e.g. [22,23]. The Doniach-Šunjić distribution is a peak fitting model used for spectral data with asymmetric peak shapes. Additionally, spectral data are generally considered to have a Gaussian broadening, because the energy resolution depends on the temperature of the sample, the performance of the photoelectron analyzers, spectrometers, light sources, and so on. Thus, we define the Doniach-Šunjić-Gauss distribution, which approximates such a line shape by combining a Doniach-Šunjić distribution with a Gaussian distribution through the mixing parameter η (0 ≤ η ≤ 1):

DoniachSunjicGauss(x_n | μ, σ, γ, α, η) = η DoniachSunjic(x_n | μ, γ, α) + (1 − η) Normal(x_n | μ, σ).   (5)

Such a line shape function was introduced by Joyce et al. [24], and it seems to be suitable for practical use on spectral data such as C 1s spectra, e.g. [25]. Here, the scale parameter γ specifies the HWHM of the distribution. Also, the HWHM of the Gaussian distribution can be represented as √(2 ln 2) σ.
Hence, as the relationship between σ and γ is represented as γ = √(2 ln 2) σ, γ_k can be represented in the formalism of the Pseudo-Voigt distribution by using σ_k as γ_k = √(2 ln 2) σ_k. Using this relation, Equations (1) and (5) are rewritten as

PseudoVoigt(x_n | μ, σ, η) = η Lorentz(x_n | μ, √(2 ln 2) σ) + (1 − η) Normal(x_n | μ, σ)   (6)

and

DoniachSunjicGauss(x_n | μ, σ, α, η) = η DoniachSunjic(x_n | μ, √(2 ln 2) σ, α) + (1 − η) Normal(x_n | μ, σ),   (7)

respectively. As the number of parameters is reduced, we use Equations (6) and (7) as the components of the PVMM and DSGMM, respectively.
In the peak fitting, as these distributions have broad tails and the observed measurement energy steps (x_n) are finite, each distribution (Equations 6 and 7) is expressed as a truncated distribution over the range of x_n as follows:

P_PV(x_n | μ, σ, η) = PseudoVoigt(x_n | μ, σ, η) / ∫_{x_1}^{x_N} PseudoVoigt(x | μ, σ, η) dx   (8)

and

P_DSG(x_n | μ, σ, α, η) = DoniachSunjicGauss(x_n | μ, σ, α, η) / ∫_{x_1}^{x_N} DoniachSunjicGauss(x | μ, σ, α, η) dx,   (9)

respectively. The PVMM and DSGMM are, respectively, defined as linear superpositions of the k-th normalized Pseudo-Voigt distributions (P_PV(x_n | μ_k, σ_k, η_k)) and normalized Doniach-Šunjić-Gauss distributions (P_DSG(x_n | μ_k, σ_k, α_k, η_k)):

PVMM(x_n | λ_k, μ_k, σ_k, η_k) = Σ_{k=1}^{K} λ_k P_PV(x_n | μ_k, σ_k, η_k)   (10)

and

DSGMM(x_n | λ_k, μ_k, σ_k, α_k, η_k) = Σ_{k=1}^{K} λ_k P_DSG(x_n | μ_k, σ_k, α_k, η_k),   (11)

where K is the number of components (K > 0) and λ_k is the mixing coefficient of the k-th peak component (0 ≤ λ_k ≤ 1 and Σ_{k=1}^{K} λ_k = 1). The PVMM becomes the GMM and the Lorentz mixture model (LMM) at η_k = 0 and 1, respectively. Moreover, the DSGMM becomes the GMM and the Doniach-Šunjić mixture model (DSMM) at η_k = 0 and 1, respectively. In the peak fitting based on the spectrum-adapted EM algorithm, it is difficult to analytically derive the updating rules of the parameters in the M-step for the PVMM and DSGMM. This difficulty also occurs in the case of the GMM, i.e. at η_k = 0 in the PVMM and DSGMM, because we use truncated distributions as the components of the mixture models. In this study, we introduce an optimization procedure based on the spectrum-adapted ECM algorithm to optimize the parameters of the fitting models (Equations 10 and 11).
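As a concrete illustration of Equations (6), (8) and (10), the truncated pseudo-Voigt mixture can be evaluated on a finite energy grid as follows. This is a minimal Python sketch (the paper's own implementation was written in R), and the grid-sum normalization is used here as a discrete stand-in for the truncation integral of Equation (8):

```python
import numpy as np

def lorentz(x, mu, gamma):
    # Lorentzian line shape with HWHM gamma (Eq. 3)
    return gamma / (np.pi * ((x - mu) ** 2 + gamma ** 2))

def normal(x, mu, sigma):
    # Gaussian line shape with standard deviation sigma (Eq. 4)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def pseudo_voigt(x, mu, sigma, eta):
    # Eq. (6): gamma is tied to sigma via gamma = sqrt(2 ln 2) * sigma
    gamma = np.sqrt(2 * np.log(2)) * sigma
    return eta * lorentz(x, mu, gamma) + (1 - eta) * normal(x, mu, sigma)

def truncated(p):
    # Normalize over the finite grid of measurement steps (cf. Eq. 8)
    return p / p.sum()

def pvmm(x, lam, mu, sigma, eta):
    # Eq. (10): mixture of K truncated pseudo-Voigt components
    return sum(l * truncated(pseudo_voigt(x, m, s, e))
               for l, m, s, e in zip(lam, mu, sigma, eta))
```

At η_k = 0 each component reduces to a (truncated) Gaussian and the mixture coincides with the GMM limit noted above.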

Spectrum-adapted ECM algorithms
The ECM algorithm is one of the generalized EM algorithms based on maximum-likelihood estimation, iterating between the expectation (E)-step and the conditional maximization (CM)-steps, in which several conditional optimizations are conducted [18]. When the parameters are separated into subsets and the maximization (M)-step is divided into multiple CM-steps, each CM-step optimizes one subset of the parameters while the other subsets are fixed [26]. This procedure addresses the problem of an intractable calculation in the M-step. In the spectrum-adapted EM algorithm, it is difficult to conduct the simultaneous optimization of all parameters in the M-step for general fitting models such as the PVMM and DSGMM. This difficulty is removed by optimizing the parameters individually in CM-steps instead of a single M-step. We apply the ECM algorithm to the peak fitting by estimating the parameters of the mixture model. In order to apply the ECM algorithm, we consider the intensity of the spectral data as a weight for each measurement step of energy [17]. When the spectral data consist of N measurement steps of energy (x = {x_1, ..., x_N}) with the corresponding intensities (w = {w_1, ..., w_N}), the spectral data (Dat) are represented in two dimensions: Dat = {(x_n, w_n)}_{n=1}^{N}. As the conventional EM algorithm requires converting Dat into a one-dimensional data set to apply the peak fitting, the size of the converted data becomes the sum of the intensities. This size is significantly larger than the total number of measurement steps (N), and the calculation cost greatly increases. The spectrum-adapted EM algorithm avoids this disadvantage by using the intensity (w) as a weight for each measurement step (x) [17]. We apply this framework to the ECM algorithm for the peak fitting by the PVMM and DSGMM. The flowcharts of the spectrum-adapted ECM algorithm for the PVMM and DSGMM are shown in Figure 1.
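The weighting idea can be checked numerically: treating the intensities as weights gives exactly the log-likelihood that conventional EM would obtain after expanding every energy step into w_n copies of x_n. A toy Python check (all values here are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])    # three measurement energy steps
w = np.array([3, 5, 2])          # integer count intensities at each step

# Any normalized model evaluated on the grid (hypothetical values).
p = np.array([0.2, 0.5, 0.3])

# Weighted log-likelihood used by the spectrum-adapted algorithms.
ll_weighted = np.sum(w * np.log(p))

# Conventional EM first expands each step into w_n copies, giving a
# data set of size w.sum() = 10 instead of N = 3 points.
ll_expanded = np.sum(np.repeat(np.log(p), w))

assert np.isclose(ll_weighted, ll_expanded)
```

The two quantities agree while the weighted form keeps the data size at N; this is the cost reduction exploited in [17].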

Spectrum-adapted ECM algorithm in PVMM
At the start of the calculation, the parameters of the PVMM (λ_k, μ_k, σ_k and η_k) are initialized, and the initial value of the weighted log-likelihood is calculated as follows:

L_PVMM = Σ_{n=1}^{N} w_n ln[ Σ_{k=1}^{K} λ_k P_PV(x_n | μ_k, σ_k, η_k) ].   (12)
In the E-step, the responsibilities (γ(z_nk)) are calculated by using the current parameter values (λ_k^old, μ_k^old, σ_k^old and η_k^old) as follows:

γ(z_nk) = λ_k^old P_PV(x_n | μ_k^old, σ_k^old, η_k^old) / Σ_{j=1}^{K} λ_j^old P_PV(x_n | μ_j^old, σ_j^old, η_j^old),   (13)
where γ(z_nk) corresponds to the posterior probability when the measurement steps (x) are observed [26]. Moreover, z_nk is a latent variable associated with x_n. In the PVMM, x_n is assumed to be generated from one of the Pseudo-Voigt components, and z_nk indicates the component that generated x_n; i.e. z_nk equals 1 when x_n is generated from the k-th component and 0 otherwise. Theoretical details are described by Bishop [26] and McLachlan and Krishnan [27].
Then, in the CM-step for λ, the parameters λ_k are updated from λ_k^old to λ_k^new by using the current responsibilities (γ(z_nk)) and the intensities (w) corresponding to the measurement steps (x) as follows:

λ_k^new = N_k / N_w,   (14)

where

N_k = Σ_{n=1}^{N} w_n γ(z_nk)   (15)

and

N_w = Σ_{n=1}^{N} w_n.   (16)

In the CM-step for μ, the parameters μ_k are updated from μ_k^old to μ_k^new. μ_k^new are obtained by maximizing the expectation of the complete-data log-likelihood with the weight of the intensity:

Q = Σ_{n=1}^{N} Σ_{k=1}^{K} w_n γ(z_nk) ln[ λ_k P_PV(x_n | μ_k, σ_k, η_k) ].   (17)

Here, λ_k, σ_k and η_k are fixed to λ_k^new, σ_k^old and η_k^old, respectively. This maximization is conducted by using Brent's method [28] in this study. In the CM-step for σ, the parameters σ_k are updated from σ_k^old to σ_k^new by maximizing Equation (17), in which λ_k, μ_k and η_k are fixed to λ_k^new, μ_k^new and η_k^old, respectively. In the CM-step for η, the parameters η_k are updated from η_k^old to η_k^new by maximizing Equation (17), in which λ_k, μ_k and σ_k are fixed to λ_k^new, μ_k^new and σ_k^new, respectively. When the distance between the weighted log-likelihood value (Equation 12) after the update and that immediately before the update is more than 1 × 10^−8, the calculation is returned to the E-step. Otherwise, the calculation is determined to have converged, and the parameters at that time are adopted as the solution.
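The E-step and CM-steps above can be sketched end-to-end as follows. This is a minimal Python version for illustration (the authors' implementation was written in R); the 1-D maximizations use SciPy's bounded Brent-style optimizer, `pv` is a grid-normalized stand-in for the truncated pseudo-Voigt component of Equation (8), and the search bounds and tolerances are assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

SQ = np.sqrt(2 * np.log(2))  # gamma = sqrt(2 ln 2) * sigma

def pv(x, mu, sigma, eta):
    # Truncated pseudo-Voigt component on the measured grid (cf. Eq. 8)
    g = SQ * sigma
    lor = g / (np.pi * ((x - mu) ** 2 + g ** 2))
    gau = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    p = np.maximum(eta * lor + (1 - eta) * gau, 1e-300)  # guard log(0)
    return p / p.sum()

def ecm_pvmm(x, w, lam, mu, sigma, eta, max_iter=500, tol=1e-8):
    lam, mu = np.asarray(lam, float).copy(), np.asarray(mu, float).copy()
    sigma, eta = np.asarray(sigma, float).copy(), np.asarray(eta, float).copy()
    K = len(lam)
    comp = lambda: np.column_stack([pv(x, mu[k], sigma[k], eta[k])
                                    for k in range(K)])
    ll_old = np.sum(w * np.log(comp() @ lam))
    for _ in range(max_iter):
        dens = comp() * lam                                # E-step (Eq. 13)
        resp = dens / dens.sum(axis=1, keepdims=True)
        lam = (w[:, None] * resp).sum(axis=0) / w.sum()    # CM-step for lambda
        for k in range(K):                                 # 1-D CM-steps
            wk = w * resp[:, k]
            mu[k] = minimize_scalar(
                lambda m: -np.sum(wk * np.log(pv(x, m, sigma[k], eta[k]))),
                bounds=(x[0], x[-1]), method="bounded").x
            sigma[k] = minimize_scalar(
                lambda s: -np.sum(wk * np.log(pv(x, mu[k], s, eta[k]))),
                bounds=(1e-3, x[-1] - x[0]), method="bounded").x
            eta[k] = minimize_scalar(
                lambda e: -np.sum(wk * np.log(pv(x, mu[k], sigma[k], e))),
                bounds=(0.0, 1.0), method="bounded").x
        ll = np.sum(w * np.log(comp() @ lam))              # Eq. (12)
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return lam, mu, sigma, eta, ll
```

Each CM-step maximizes Equation (17) over a single parameter with the others held fixed, so the weighted log-likelihood is non-decreasing across iterations.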

Spectrum-adapted ECM algorithm in DSGMM
The parameters of the DSGMM (λ_k, μ_k, σ_k, α_k and η_k) are estimated by the same procedure as that for the PVMM. At first, the initial value of the weighted log-likelihood is calculated by using the initialized parameters as follows:

L_DSGMM = Σ_{n=1}^{N} w_n ln[ Σ_{k=1}^{K} λ_k P_DSG(x_n | μ_k, σ_k, α_k, η_k) ].   (18)
In the E-step, the responsibilities (γ(z_nk)) are calculated by using the current parameter values (λ_k^old, μ_k^old, σ_k^old, α_k^old and η_k^old) as follows:

γ(z_nk) = λ_k^old P_DSG(x_n | μ_k^old, σ_k^old, α_k^old, η_k^old) / Σ_{j=1}^{K} λ_j^old P_DSG(x_n | μ_j^old, σ_j^old, α_j^old, η_j^old).   (19)
In the DSGMM, x_n is assumed to be generated from one of the Doniach-Šunjić-Gauss components, and z_nk indicates the component that generated x_n. Then, in the CM-step for λ, the parameters λ_k are updated from λ_k^old to λ_k^new by using the current responsibilities and the intensities (w) corresponding to the measurement steps of energy (x) as follows:

λ_k^new = N_k / N_w,   (20)

where

N_k = Σ_{n=1}^{N} w_n γ(z_nk).   (21)

In the CM-step for μ, the parameters μ_k are updated from μ_k^old to μ_k^new. μ_k^new are obtained by maximizing the following expectation of the complete-data log-likelihood with the weight of the intensity:

Q = Σ_{n=1}^{N} Σ_{k=1}^{K} w_n γ(z_nk) ln[ λ_k P_DSG(x_n | μ_k, σ_k, α_k, η_k) ].   (22)

Here, λ_k, σ_k, α_k and η_k are fixed to λ_k^new, σ_k^old, α_k^old and η_k^old, respectively. In the CM-step for σ, the parameters σ_k are updated from σ_k^old to σ_k^new by maximizing Equation (22), in which λ_k, μ_k, α_k and η_k are fixed to λ_k^new, μ_k^new, α_k^old and η_k^old, respectively. In the CM-step for α, the parameters α_k are updated from α_k^old to α_k^new by maximizing Equation (22), in which λ_k, μ_k, σ_k and η_k are fixed to λ_k^new, μ_k^new, σ_k^new and η_k^old, respectively. In the CM-step for η, the parameters η_k are updated from η_k^old to η_k^new by maximizing Equation (22), in which λ_k, μ_k, σ_k and α_k are fixed to λ_k^new, μ_k^new, σ_k^new and α_k^new, respectively. When the distance between the weighted log-likelihood value (Equation 18) after the update and that immediately before the CM-steps is less than 1 × 10^−8, the calculation is determined to have converged, and the parameters at that time are adopted as the solution. Otherwise, the calculation is returned to the E-step.
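The DSGMM differs from the PVMM only in its component density and the extra CM-step for α. The Doniach-Šunjić-Gauss component of Equations (7) and (9) can be sketched as follows (Python for illustration; the grid-sum normalization again stands in for the truncation integral):

```python
import numpy as np
from scipy.special import gamma as Gamma

def doniach_sunjic(x, mu, g, alpha):
    # Doniach-Sunjic line shape (Eq. 2) with scale g and asymmetry alpha
    num = np.cos(np.pi * alpha / 2 + (1 - alpha) * np.arctan((x - mu) / g))
    den = ((x - mu) ** 2 + g ** 2) ** ((1 - alpha) / 2)
    return Gamma(1 - alpha) * num / den

def dsg(x, mu, sigma, alpha, eta):
    # Eq. (7): DS-Gauss mixture with gamma tied to sigma
    g = np.sqrt(2 * np.log(2)) * sigma
    gau = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    p = np.maximum(eta * doniach_sunjic(x, mu, g, alpha) + (1 - eta) * gau,
                   1e-300)
    return p / p.sum()   # truncated on the measured grid (cf. Eq. 9)
```

At α = 0 the Doniach-Šunjić line shape reduces to a Lorentzian, so the η_k = 0 and α_k = 0 limits recover the GMM case discussed above. Substituting `dsg` for the pseudo-Voigt component, and adding one more bounded 1-D maximization for α_k over [0, 1], turns the PVMM loop into the DSGMM version.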

Generation of the synthetic spectral data
We demonstrate the peak fitting by the spectrum-adapted ECM algorithm on synthetic spectral data sets. The synthetic spectral data were generated from three-component PVMM and DSGMM, respectively. The data are represented as

Dat_spv = {(x_n, w_n^(spv))}_{n=1}^{N}   (23)

and

Dat_sdsg = {(x_n, w_n^(sdsg))}_{n=1}^{N},   (24)

where w_n^(spv) and w_n^(sdsg) are the intensities of the synthetic spectral data corresponding to x_n, and spv and sdsg denote the synthetic Pseudo-Voigt distribution and the synthetic Doniach-Šunjić-Gauss distribution, respectively. In XPS, as the intensity is obtained from the count data of the number of photoelectrons at each measurement energy step, we assume that w_n^(spv) and w_n^(sdsg) are randomly generated from the Poisson distribution (Poisson(X | ν)):

Poisson(X | ν) = ν^X exp(−ν) / X!,   (25)

where X is a discrete random variable taking non-negative integer values (0, 1, 2, ...) and ν is the parameter representing the mean of the distribution. We defined the mean parameters of the Poisson distributions in Equations (26) and (27) by scaling the intensities of the three-component PVMM and DSGMM, where d specifies the magnitude of the generated counts and hence the noise level. Moreover, the noise-free synthetic data (w^(0)) were also generated. The true parameter values used for the generation are listed in Tables 1 and 2. The calculation was conducted by using our own source code developed in R (http://cran.r-project.org/) [29]. R is an open-source programming language and software environment for statistical analysis. The computer carrying out the calculations had an Intel(R) Core(TM) i7 CPU with four cores at 2.9 GHz and 16 GB of memory. Figure 2 shows the fitting curves of the PVMM to the synthetic data. The calculation required about 4 seconds to satisfy the convergence criterion for each synthetic data set (Figure 2). In this analysis, over 500 iterations were required to reach the convergence criterion (Figure S1). The parameters of the fitting curves at d = {4, 5} and for the noise-free data were estimated to be similar to the true parameters (Table 3). In contrast, for the synthetic data at d = 3, the estimated values of η deviated from the true values, although μ and λ were close to the true values.
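The Poisson-noise generation described above can be sketched as follows. This is a minimal Python example (the authors used R); the energy grid, the Gaussian-limit (η = 0) mixture parameters, and the interpretation of d as the order of magnitude of the total counts are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical energy grid and three-component mixture parameters;
# the paper's true values are listed in Tables 1 and 2.
x = np.linspace(280.0, 292.0, 241)
mus, sigmas, lams = [284.3, 285.2, 288.5], [0.4, 0.5, 0.6], [0.5, 0.3, 0.2]

dens = sum(l * np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)
           for l, m, s in zip(lams, mus, sigmas))

# Scale so the total expected count is 10**d; a larger d gives lower
# relative Poisson noise (an assumption about the role of d).
d = 4
mean_counts = 10 ** d * dens / dens.sum()

# Each observed intensity is an independent Poisson draw (Eq. 25).
w = rng.poisson(mean_counts)
```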
The parameters significantly fluctuated at the early stage of the iterative calculation, and then the values of μ, σ and λ hardly changed after about 100 iterations, whereas the values of η were still unstable at those iterations (Figures S2-S5). Thus, the values of η required more iterations than the other parameters to become stable. Figure 3 shows the fitting curves of the DSGMM to the synthetic data. The fitting curves of the DSGMM successfully captured the asymmetric shape of the synthetic data. In this analysis, the calculation required less than 9 seconds to satisfy the convergence criterion, and over 400 iterations were required to reach the convergence criterion (Figure S6). Each parameter of the DSGMM fluctuated at the beginning of the iterative calculation, and then the values of μ, σ and λ hardly changed after about 100-200 iterations, whereas the values of α and η remained unstable beyond 200 iterations (Figures S7-S10). Thus, the values of η and α required more iterations than the other parameters to become stable. Table 4 shows that the values of μ, σ and λ were estimated to be close to the true values for each synthetic data set. In contrast, the values of α and η deviated from the true values at d = {3, 4}. This result suggests that the accuracy of the estimation of α and η was affected by the noise level.

Result of the synthetic data analysis
In order to compare the proposed method with the conventional peak fitting procedure, we performed the synthetic data analysis with an adaptive non-linear least-squares algorithm based on the least-squares method [30,31]. In the analysis, the synthetic data (d = 4) and the initial values were generated by the same procedure as in the analysis by the proposed method. The peak fitting by the conventional method terminated without estimating an appropriate parameter solution at various initial values because the loss function diverged to infinity. Thus, as the conventional method becomes unstable depending on the initial values, it is necessary to search for appropriate initial values by trial and error in the analysis. Such trial and error is unsuitable for the high-throughput analysis of large amounts of spectral data. For the cases in which the calculations by the conventional method converged, the parameter solutions and calculation times of the proposed and conventional methods are summarized in Tables S1-S4. The parameter values estimated by the two methods did not show a significant difference. In contrast, the calculations by the conventional method were completed 5-18 and 11-24 times faster than those by the proposed method for the PVMM and DSGMM, respectively. The difference in calculation time is likely because the proposed method requires more calculation steps than the conventional method to conduct the conditional optimization of each parameter in the CM-steps.
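For reference, the conventional route corresponds to a standard non-linear least-squares fit of the peak model to the intensities. A minimal Python analogue with `scipy.optimize.curve_fit` (the paper used an adaptive non-linear least-squares algorithm in R [30,31]; the model, data, and starting values here are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gauss(x, a1, m1, s1, a2, m2, s2):
    # Two-peak model with free amplitudes, centers, and widths
    g = lambda a, m, s: a * np.exp(-0.5 * ((x - m) / s) ** 2)
    return g(a1, m1, s1) + g(a2, m2, s2)

x = np.linspace(-5.0, 5.0, 201)
w = two_gauss(x, 100.0, -1.5, 0.5, 60.0, 1.5, 0.7)  # noise-free toy spectrum

# With reasonable starting values the least-squares fit converges quickly;
# with poor ones it can diverge or collapse peaks, which is the
# initial-value sensitivity discussed in the text.
popt, _ = curve_fit(two_gauss, x, w, p0=[90, -1.0, 0.6, 50, 1.0, 0.6])
```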

Collection of the experimental data
The proposed method was applied to a W 4f core-level spectrum and a C 1s core-level photoelectron spectrum. The W 4f core-level spectrum was taken from a mechanically exfoliated WSe2 sheet in a 2D heterostructure tunnel field-effect transistor [32]. Also, the C 1s core-level photoelectron spectrum was taken from a graphene sheet on a 90 nm SiO2/p+-Si(100) substrate [33]. Both spectral data were acquired using a VG Scienta R3000 spectrometer in the scanning photoelectron microscopy system (3D nano-ESCA) installed at the University of Tokyo outstation beamline, BL07LSU at SPring-8. The energy resolution of the spectrometer was set to 300 meV. The photon energy of the synchrotron radiation beam used for the measurements was 1000 eV. For convenience, these spectral data are called TFET (tunnel field-effect transistor)-data and GFET (graphene field-effect transistor)-data, respectively. The backgrounds of the TFET-data and GFET-data were removed as a linear and a Shirley background, respectively. In the calculation, the TFET-data and GFET-data were analyzed by the PVMM and DSGMM, respectively. For the TFET-data, the number of peaks was K = 3, and the initial values of the parameters (μ, σ, η and λ) were randomly collected from predefined ranges (e.g. [29.35:43.35] for μ). Also, the initial values of η_1 and η_2 were set to 0 and 0.5, respectively, and η_1 was fixed to 0. Figure 4 shows the results of the peak fitting by using the PVMM and DSGMM, and Tables 5 and 6 summarize the initial and estimated values of the parameters. The fitting curves of the PVMM and DSGMM were well fitted to the spectral data. The calculation times of the peak fitting were 1.47 and 11.27 sec for the TFET-data and GFET-data, respectively. Also, the values of L_PVMM and L_DSGMM increased monotonically during the iterative calculation, and about 140 and 1000 iterations were required to satisfy the convergence criterion in the analysis of the TFET-data and GFET-data, respectively (Figure S11).
The parameters of the PVMM and DSGMM fluctuated at the beginning of the iterative calculation, and then the values became stable after about 60 and 500 iterations, respectively (Figures S12 and S13). The peaks (μ_1, μ_2 and μ_3) estimated from the TFET-data were 37.82 eV, 34.89 eV and 32.74 eV, respectively (Table 5). These peaks represent the contaminant, the W 4f5/2 component and the W 4f7/2 component, respectively. As the mixing coefficients (λ_1, λ_2 and λ_3) were 0.16, 0.37 and 0.48, respectively, the mixture ratio between the W 4f5/2 and W 4f7/2 components was 1:1.3. In the GFET-data, the peaks (μ_1 and μ_2) were 285.23 eV and 284.28 eV, respectively (Table 6). These peaks represent the contaminant and the graphene sp2 component, respectively. The contamination component would arise from lithographic processing [33,34]. The peak of the graphene sp2 component (284.28 eV) was very close to the value reported by Nagamura et al. [33] (284.23 eV). As the mixing coefficients (λ_1 and λ_2) were 0.63 and 0.38, respectively, the contaminant component occupies a larger part of the data than the graphene sp2 component. Such a relatively large contaminant component was also observed by Nagamura et al. [33]. The degree of asymmetry of the graphene sp2 component (α_1) was 0.29, whereas the asymmetry parameter of the graphene C 1s peak is generally assumed to be about 0.1 [35][36][37]. Thus, the value of α_1 would be overestimated owing to the influence of noise and the contamination component in this analysis.

Discussion and implication
The spectrum-adapted ECM algorithm was introduced to extend the spectrum-adapted EM algorithm [17] and was successfully applied to synthetic and experimental data analyses. The advantage of a peak fitting method based on the EM algorithm is its stable and fast calculation [17]. As the parameters converged to a local optimal solution through a monotonically increasing log-likelihood value in the iterative calculation (Figures S1, S6 and S11), and the calculations were completed in less than 12 sec regardless of the spectral data (Figures 2-4), the proposed method can stably and quickly conduct the peak fitting without manual trial and error to obtain the parameter solution. As the proposed method only requires non-negative intensity values to run the calculation appropriately, it can perform the peak fitting whenever the spectral data satisfy this condition, regardless of the experimental method. Therefore, the application of the proposed method is not limited to XPS. The spectrum-adapted ECM algorithm enables high-throughput analysis to be conducted with various fitting models. The synthetic and experimental data analyses showed that the spectrum-adapted ECM algorithm estimated well-fitted curves using the PVMM and DSGMM, both of which are general fitting models in XPS. As the PVMM and DSGMM comprise Gaussian and Lorentzian components, and Doniach-Šunjić and Gaussian components, respectively, the fitting curves can fit the spectral data more flexibly than the Gaussian mixture model in the spectrum-adapted EM algorithm [17]. As the experimental data analysis suggests that the calculation can be completed in several to several tens of seconds for the fitting of 2-3 peaks, the calculation cost is sufficiently low for practical application.
Moreover, as the method based on the EM algorithm depends on the initial values [17], the computational cost and accuracy of the analysis can be improved by choosing appropriate initial values of the parameters instead of randomly generated ones. An appropriate procedure for initial value generation may improve the calculation cost and accuracy in the analysis of a large number of spectral data sets. In contrast, the conventional method based on the adaptive non-linear least-squares algorithm is unsuitable for high-throughput analysis because the behaviour of the calculation can be unstable depending on the initial values. As the ECM algorithm guarantees convergence to a local optimal solution in the iterative calculation, such behaviour does not occur. Therefore, the proposed method is robust in the peak fitting of large amounts of spectral data.
The proposed method has a limitation in the analysis of spectral data containing large noise, because the accuracy of the estimation of some parameters, such as α and η, may be affected by the noise. Moreover, the number of components (K) and the distributions that constitute the fitting model need to be selected before the calculation. In contrast, Bayesian spectral deconvolution with the exchange Monte Carlo method performs a highly accurate analysis, and an appropriate fitting model is selected by calculating the marginal likelihood value [5]. However, this method requires a high computational cost, and it is not easy for a non-expert to set a suitable prior distribution and inverse temperature. As such settings are unnecessary in the proposed method, the peak fitting can be performed easily in the high-throughput analysis. Thus, these methods need to be applied selectively depending on the task.
Further development of the peak fitting method requires an automated preprocessing method for the background subtraction. In the case of XPS, the peak assignment requires the proper subtraction of the background component in the preprocessing. This procedure relies strongly on manual trial and error and is inefficient when analyzing large amounts of spectral data sets. As the proposed method in this study extends the available fitting models, the background subtraction process can be automated in a further extension of the method by defining a model that represents the background. Moreover, the blackbox optimization (BBO) approach has been demonstrated for the selection of an appropriate background function from several candidates during the optimization loop [13]. We consider that such an optimization technique is useful for automatic background subtraction. It will be possible to develop an advanced high-throughput peak fitting method by combining the proposed method and the BBO approach. Such an advanced method will enable us to conduct the optimization of the fitting and background models automatically and quickly in high-throughput spectral data analysis.
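As one ingredient of such automation, the Shirley background used for the GFET-data above can itself be computed iteratively. A minimal Python sketch of the standard iterative Shirley construction (assuming the background is anchored to the first and last points of the scan and is higher on the first-point side; tolerances are illustrative):

```python
import numpy as np

def shirley_background(y, tol=1e-8, max_iter=200):
    """Iterative Shirley background for a 1-D spectrum y.

    At each point the background is proportional to the integrated
    peak (background-subtracted) intensity from that point to the end
    of the scan, anchored to the end-point intensities y[0] and y[-1].
    """
    y = np.asarray(y, float)
    b = np.linspace(y[0], y[-1], y.size)      # start from a linear background
    for _ in range(max_iter):
        signal = y - b
        # Integrated signal from each point to the end of the scan.
        cum = np.cumsum(signal[::-1])[::-1]
        b_new = y[-1] + (y[0] - y[-1]) * cum / cum[0]
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b
```

A BBO loop as in [13] could then choose between, e.g., this Shirley form and a linear background by scoring the downstream fit.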

Conclusions
The spectrum-adapted ECM algorithm was proposed to make various fitting models, such as the PVMM and DSGMM, available in high-throughput peak fitting. The synthetic and experimental data analyses suggest that the proposed method is applicable to peak fitting problems for the investigation of the electronic properties of materials and devices at a sufficiently small computational cost. The advantage of the proposed method is that stable and quick peak fitting can be conducted to obtain the parameter solution without manual trial and error. This advantage enables the high-throughput analysis of large amounts of spectral data with various fitting models.