Bayesian estimation for XPS spectral analysis at multiple core levels

ABSTRACT X-ray photoelectron spectroscopy (XPS) is a widely used measurement technique in material surface analysis, but its analysis is subject to operator arbitrariness in the results. In a previous paper, a method based on genetic algorithms was proposed to estimate the composition ratios of compounds from XPS data using reference spectra and it was shown that it is possible to analyze them automatically from the reference spectra data. In this paper, we newly proposed a Bayesian spectral decomposition method based on the exchange Monte Carlo method and tested it on artificial data. This method provides a posterior distribution of the model parameters. This not only allows the estimation of compositional ratios for samples, but also allows statistical reliability assessment. In addition, we simulated an artificial data analysis to clarify the effect on the identification of compounds and the estimation of their compositional ratios by varying the signal-to-noise ratio of the data. Graphical abstract


Introduction
X-ray photoelectron spectroscopy (XPS) is a technique to measure the kinetic energy distribution of photoelectrons emitted by X-ray irradiation and to obtain information on the type, amount and chemical bonding state of elements present on the surface of a sample, which is used in various surface analyses of materials. In the field of spectroscopy, reference spectrum databases are being rapidly constructed, and it is necessary to develop automated spectrum analysis methods using reference spectrum data. Previous studies developed a method to estimate parameters such as composition ratios, peak position adjustments and peak width adjustments of samples using genetic algorithms [1]. Additionally, this method makes it possible to identify the combinations of compounds in samples using Bayesian information criterion. The genetic algorithms, however, do not allow us to make statistical reliability assessments about the results of the estimation.
This paper proposes a Bayesian inference using the exchange Monte Carlo (MC) method for identifying kinds of compounds and estimating their composition ratios from XPS data of samples. In the Bayesian framework, it is possible to obtain the probability distribution of model parameters given the data, which allows us to estimate the reliability of the results. Nagata et al. estimated the position, width and intensity of the spectrum of each compound in a sample using the exchange MC method with a Gaussian distribution as the basis function [2]. On the other hand, considering the distribution represented by a pseudo-Voigt function to fit XPS data [3], we propose a framework to estimate the model parameters including the composition ratio by the exchange MC method using the reference spectrum of each compound given in advance. The compound identification is statistically equivalent to the framework for estimating the number of peaks and can be solved by the model selection framework [2]. However, it is not realistic from a computational viewpoint to evaluate the Bayesian free energy for all combinations of compounds. Therefore, we introduce an indicator parameter that expresses the presence or absence of each compound and propose a framework for compound identification by considering the indicator parameter as a model parameter and obtaining its posterior distribution.
To verify the effectiveness of the proposed method, we conducted Bayesian estimation using artificial data. In particular, taking advantage of the fact that the measurement time in XPS is strongly related to the signal-to-noise ratio of the data, we generated data with various signal-to-noise ratios by introducing the measurement time in the data generation. For these data, the statistical reliabilities of compound identification and composition ratio estimation are evaluated in the framework of Bayesian inference, and the relationship between measurement time and estimation accuracy is clarified.

Composition ratio estimation
In this section, we will describe the model for estimating the composition ratio and other parameters of observed samples from XPS data. In XPS analysis, the orbitals have a one-to-one relationship with the energy range. The spectrum f l ðxÞ of the lth orbital is expressed as where Θ l ¼ θ S l ; θ B l f g is the set of parameters on orbital l, which is composed of the signal parameter set θ S l and the background parameter set θ B l . The signal spectrum function Sðx; θ S Þ and the background spectrum function Bðx; θ S l ; θ B l Þ are provided as Vðx; μ; w; rÞ ¼ ð1 À rÞ ffi ffi ffi ffi ffi ffi ffi ln 2 p ffi ffi ffi π p w exp À ln 2 where h l is the intensity parameter and K is the number of compounds. The parameter c k is the composition ratio of the kth compound in the sample and satisfies P k c k ¼ 1. The function R lk ðx;μ k ;ŵ lk Þ is the reference spectrum of the k compound in the lth orbital, which plays the role of a basis function. μ k is the adjustment parameter for the peak position and ŵ lk is the adjustment parameter for the peak width. Note that μ k does not depend on the lth orbital. This is modeled by considering the effect of charge-up, which is the main cause of the misalignment of peak positions in XPS and assumes that the misalignment does not depend on the orbital and is uniform in the whole energy region. The pseudo-Voigt function Vðx; μ; w; rÞ is used for a single peak [3]. This function is defined as the sum of a Gaussian function and a Lorentz function. The parameter A lk is the probability of the atoms of the kth compound in the lth orbital and satisfies P l A lk ¼ 1. It can be interpreted as the probability of obtaining the reference spectrum of the target atoms from the kth compound. The paramater M lk is the number of peaks obtained from the kth compound in the lth orbital. The parameter m lkj is the area ratio of each peak and the parameter μ lkj is its position. The parameter w lkj is the peak width and the parameter r lkj is the value that determines the ratio of the Gaussian and Lorentz functions. The background B is modeled on the basis of the Shirley method [4][5][6]. The parameters a l and b l determine the edge intensity of the background, where a l > b l . The set of parameters representing the reference spectrum is assumed to be given beforehand. Therefore, the set of parameters Θ l of the lth orbital necessary to estimate the spectrum f l ðxÞ ¼ f ðx; Θ l Þ is defined as, Since the XPS observation data are count data, the conditional probability of intensity y given energy x in the lth orbital and the parameter set Θ l follows a Poisson distribution: We assume that each data point is i.i.d., then the conditional probability given a dataset of observations fX; Yg ¼ ffx li ; y li g N l i¼1 g L l¼1 for L orbitals is for the set of parameters Θ ¼ fΘ l g L l¼1 . Also the cost function EðΘÞ is From Bayes' theorem, the posterior distribution of the parameter set Θ is provided as where φðΘÞ is the prior distribution corresponding to pðΘÞ. pðYÞ is a normalized constant determined by the dataset and does not depend on Θ because Equation (14) is a probability distribution.
From the posterior distribution obtained above, the parameters are estimated.
By using the posterior probability distribution of Equation (18), it is possible to estimate not only each parameter but also its confidence interval. In Experiment 1, we will clarify the estimation accuracy and confidence intervals of various parameters by calculating the posterior probability distribution for artificial data. Note that one of the features of this model is that it is not sufficient to perform the estimation for each orbital but it is necessary to consider all the orbitals. If each parameter set Θ l of the lth orbital is independent of the other, the above cost function EðΘÞ can be expressed as the sum of the cost functions E l ðΘ l Þ for the lth orbital. In this case, the posterior probability distribution should also be discussed for each orbital. However, in the model in this paper, all the orbitals must be treated as a simultaneous probability distribution because c k and μ k exist in the parameter set Θ l as parameters independent of the lth orbital.

Compound identification
Depending on the combination of compounds used for estimation, the estimation accuracy of the model in the previous section may deteriorate due to the contribution of compounds that are not included in the sample. To avoid this problem, it is necessary to find the true combination from various combinations of compounds. Therefore, we propose an extended model for model selection using the framework of Bayesian estimation in the previous section. We introduce a new parameter g 2 f0; 1g into Equation (3) and consider the following model.
g k ¼ 1 denotes that the compound k is included in the spectrum data and g k ¼ 0 denotes that it is not included in the data. It is possible not only to estimate the composition ratio of each compound but also to identify whether it is present in the sample by simultaneously estimating the indicator parameters g ¼ ðg 1 ; . . . g K Þ. When identifying combinations of compounds, it is difficult to predetermine the required set of composition ratios in advance. Therefore, in this model, instead of c k , we introduce c 0 k , which eliminates the constraint P k c 0 k ¼ 1. Since the estimated c 0 k indicates the relative amounts of the compounds, the composition ratios for the identified combination of compounds can be calculated by correcting c 0 k so that P k g k c 0 k ¼ 1, based on the obtained g k and c 0 k . Applying Bayes' theorem, as in the previous section, we can derive the following equation as the joint posterior probability pðg; ΘjYÞ of indicator g and parameter set Θ: pðg; ΘjYÞ / φðgÞφðΘÞ expðÀ Eðg; ΘÞÞ; (20) Eðg; ΘÞ ¼ À f ðx li ; g; Θ l Þ ¼ Sðx; g; θ S l Þ þ Bðx; g; θ S l ; θ B l Þ: (22) Using the above joint posterior distribution, we first estimate the indicator g from the following marginal posterior distribution: ¼ φðgÞ ð dΘφðΘÞ expðÀ Eðg; ΘÞÞ: (24) Using the integration on the right-hand side, we define the Bayesian free energy as [7,8] FðgÞ ¼ À log ð dΘφðΘÞ expðÀ Eðg; ΘÞÞ: (25) If the prior φðgÞ is uniform, maximization of the marginal posterior distribution is equivalent to minimization of the Bayesian free energy [9]. This framework makes it possible to identify combinations of compounds. After the indicator is estimated, other parameters are estimated by using the posterior pðΘjg; YÞ over the parameter set Θ.
In Experiment 2, we perform simulations on this model and discuss the estimation accuracy for compound identification and the estimation accuracy of the parameter set Θ.

Sampling algorithm
In Experiment 1, we sample parameters from Equation (18) and performe maximum a posteriori (MAP) estimation. We use the MC method as a sampling method. However, since the MC method can lead to local minima depending on the initial values, we prepare the following replicas [10]: where β is an auxiliary variable called the inverse temperature and we can see that qðΘ; β ¼ 1Þ ¼ pðΘjX; YÞ.
In this framework, we can calculate the free energy and then select an optimal model, but it is difficult to calculate the free energy for all the candidate model combinations because of their large number. In Experiment 2, we sample the indicator parameter g as well as the parameter set Θ from Equation (20). Also, the exchange MC method is used in Experiment 2 for sampling. The probability distribution of each replica is qðg; Θ; βÞ ¼ φðgÞφðΘÞ expðÀ βEðg; ΘÞÞ (29) / φðΘÞ expðÀ βEðg; ΘÞÞ ð{ φðgÞ is uniformÞ: After simultaneous sampling of the indicator parameter g and the parameter set Θ is obtained, the compound is identified using the most sampled indicator parameter as � g, and MAP estimation for Θ is performed according to the conditional posterior probability pðΘj� g; YÞ.
For each replica, the parameters are updated by the Metropolis algorithm for a certain number of steps. In particular, for the parameter c with the constraint of P k c k ¼ 1, when updating c k , we simultaneously updated c k 0 where k�k 0 . After that, we exchange replicas between adjacent inverse temperatures. By doing this for a sufficient number of steps, we can sample the parameters without falling into a local solution [10]. To update the parameters, adaptive step-size tuning is used [11,12]. By this method, the step size of continuous variables is tuned so that the acceptance rate becomes the target value set.
We used the computer which has Intel(R) Xeon(R) CPU E5-2697A v4 at 2:60 GHz CPU with 2 cores and 251GB memory.

Experiment 1: composition ratio estimation
Following the model described in Section 2.1, we generate artificial data corresponding to three orbitals, Sr3d, Ti2p and O1s, assuming a sample containing four compounds, SrCO 3 , SrO, SrTiO 3 and TiO 2 . For the parameters of the reference spectrum described in Equation (7), the peak position μ and the area ratio m are taken from the literature [13][14][15][16][17][18][19], the peak width w and Lorentz ratio r are set to 0:6 and 0:1, respectively, and A is determined according to the compound. The composition ratios are set to 0:1, 0:3, 0:4 and 0:2 for the above four compounds, respectively. The true values of the other parameters are set as Note that the true parameter set is Θ � ¼ ffa � l ; b � l ; h � l g 3 l¼1 ; fc � k ;μ � k ; fŵ � lk g 3 l¼1 g 4 k¼1 g, with l ¼ 1; 2; 3 as Sr3d, Ti2p, O1s and k ¼ 1; 2; 3; 4 as SrCO 3 , SrO, SrTiO 3 , TiO 2 , respectively. Figure 1 shows the actual generated data and the component of each compound. The horizontal axis is in descending order, following the convention of XPS.
From the given artificial data, we simulate the Bayesian estimation for the parameter Θ using the reference spectra of the four compounds, SrCO 3 , SrO, SrTiO 3 and TiO 2 , as basis functions. The prior distributions and hyperparameters are set as The number of temperatures is set to 64. For the exchange MC simulation, we performed 100,000 steps of calculations and rejected 50,000 of them as burn-in for Experiment 1. Table 1 shows the MAP estimations and Figure 2 shows the fitting results using the estimation. From Figure 2, we can see that the fitting is accurate. In addition, Table 1 shows that the MAP estimations are also reasonable. In fact, according to Figure 3(a), the composition ratio c can be estimated with high accuracy. Other parameters such as μ; a; b and h are also estimated with high accuracy (Figure 3(b-e). However, that of the peak width adjustment parameter ŵ is not high (Figure 3(f-h)). This trend is confirmed throughout the entire experiment and it can be concluded that the peak width adjustment parameter is difficult to estimate for this model. This is an advantage of the Bayesian framework, i.e. we can discuss not only the estimation value but also its estimation accuracy by comparing the prior and posterior distributions. Next, we investigate the relationship between the signal-to-noise ratio and the estimation accuracy. The observation time is an important parameter in XPS measurement for adjusting the signal-to-noise ratio. Thus, we newly introduce an observation time Figure 1. Generated data for Sr3d orbital (a), Ti2p orbital (b) and O1s orbital (c). The component of each compound is plotted as the background spectrum plus each reference spectrum, and the total is plotted as the background spectrum plus the reference spectra of all compounds. parameter T into the model described in Section 2.1 as follows: T is a latent parameter, which is consistent with the fact that the intensity of the XPS data is proportional to the observation time. If the observation time is T ¼ 1000 for the data in Figure 1, the true values of a 0 l , b 0 l and h 0 l are respectively 0:2, 0:1 and 10:0 for all l. Here, we generate new data for T ¼ 10, 100 and 10; 000 for the true values of a 0 l , b 0 l and h 0 l . For these data, we simulate the Bayesian estimation for the parameter using the reference spectra of the four compounds, SrCO 3 , SrO, SrTiO 3 and TiO 2 . Figure 4 and Table 2 respectively show the fitting results and MAP estimates of the parameters for these data. Furthermore, Figure 5 shows the posterior distributions of the composition ratios for each observation time. As can be seen from the overall results, the fitting results and the accuracy of the estimation of the composition ratios improve as the observation time T increases. Table 3 shows the mean and standard deviation of the posterior distributions of each parameter when fitted to a normal distribution as an index for discussing the estimation accuracy quantitatively. In  particular, the results in Figure 5 suggest that T ¼ 1000 is necessary to obtain sufficient estimation accuracy for composition ratio estimation. From Table 2, it can be seen that the estimation of SrCO 3 is adequate for all observation times T. This may be due to the fact that SrCO 3 is necessary to represent the leftmost peak component in the O1s data.

Experiment 2: compound identification
For the estimation in Experiment 1, we use only the reference spectra of the compounds contained in the sample. In Experiment 2, we use the reference spectra of candidate compounds according to the model described in Section 2.2 to identify the compounds contained in the observed sample and to estimate their composition ratios. In addition to the four compounds used in Experiment 1, SrCO 3 , SrO, SrTiO 3 and TiO 2 , six compounds, namely SrSO 4 , TiO, Ti 2 O 3 , Sr, TiC and Ti, are used as candidates for compound identification from the data shown in Figure 1. For the parameters of the reference spectrum of the newly added compound, the peak position μ and the area ratio m are taken from the literature [19][20][21][22][23][24][25][26], the peak width w and Lorentz ratio r are set to 0:6 and 0:1, respectively, and A is determined according to the compound. This means that the total number of candidate compounds is 10. The true values of the indicator parameters defined in Equation (19) are g � k ¼ 1 for k ¼ 1; 2; 3; 4 and g � k ¼ 0 for the others. The prior distribution of c 0 k corresponding to the composition ratio is as The number of temperatures is set to 64. For the exchange MC simulation, we performed 50,000 steps of calculations and rejected 20,000 of them as burn-in for Experiment 2. Figure 6 shows the posterior distribution of the combination of compounds. The posterior probability is calculated by using the sampling result of the indicator parameter from the exchange MC method. From this result, we can see that the true compound combination, SrCO 3 , SrO, SrTiO 3 and TiO 2 , is estimated with more than 90% probability. Table 4 shows a comparison between the true value and MAP estimation for the composition ratio. Figure 7 shows the fitting result obtained by MAP estimation for parameter Θ, which is well fitted with the given data. Figure 8 shows the posterior distribution of the composition ratios c k , which is the similar result as in Figure 3(a) and indicates that the estimation accuracy of the composition ratio is satisfactory.
Furthermore, we investigate how the accuracy of compound identification changes with the measurement time. As in Experiment 1, the observation time T is introduced as in Equation (39), and compound identification is performed for data with T ¼ 10, 100, 1000, 10; 000. Figure 9 shows the posterior distribution of compound identification at various observation times T. In addition, Figure 10 shows the conditional posterior distribution of the composition ratios for the true compound combination. As can be seen from Figures 9 and 10, as the observation time decreases, the estimation accuracy of both compound identification and composition ratio estimation deteriorates. The result in Figure 10 shows that T ¼ 1000 is necessary to obtain sufficient estimation accuracy of the composition ratios. On the other hand, in Figure 9, even at T ¼ 10, the true compound combination shows the highest posterior    probability, 43:6%; 88:8%; 93:0%; 99:5% for T ¼ 10, 100, 1000, 10; 000, respectively. This indicates that compound identification is easier than the estimation of composition ratio with sufficient accuracy. This fact was obtained by investigating the estimation accuracy from Bayesian estimation.

Conclusion
In this paper, we proposed a framework of Bayesian inference for identifying compounds and estimating their composition ratios by exploiting the reference spectra of compounds in XPS spectral analysis. Also, we introduced indicator parameters in the compound identification and realized an efficient model selection algorithm by simultaneously sampling the indicators in the MC method. In a case like the present study, we have shown that not only the parameter estimates but also their confidence intervals can be obtained from the given data by using Bayesian estimation and comparing the prior and posterior distributions. As a result, that it is possible to estimate the observation time required for composition ratio estimation by looking at the shape of the posterior probability distribution of the parameters. Furthermore, it was found that the signal-to-noise ratio required for compound identification is smaller than that required for accurate composition ratio estimation under conditions such as those in this study, where the compound has peaks that appear in multiple energy ranges. This result is consistent with that of Nagata [27]. The investigation of other cases is a future issue.