Optimization of force-field potential parameters using conditional variational autoencoder

ABSTRACT Owing to their high-speed, force-field (FF) calculations for inorganic solid-state materials using the parametric potential have been widely employed as an effective tool for high-throughput calculations such as exhaustive property evaluation of material databases and/or calculations for models containing tens of thousands of atoms or more, including amorphous and grain boundary models. However, the accuracy of such calculations strongly depends on the choice of the FF parameters. Today, compounds containing three or more elements are targeted in the research and development of functional materials, in which case the number of parameters often exceeds 10 dimensions. This significantly increases the search space, making it difficult to determine the parameters rationally. To resolve this issue, we previously reported an FF parameter determination approach based on meta-heuristics (APL Materials, 8, 081111 (2020)). In this study, we further investigated a methodology to efficiently determine the FF parameters using a conditional variational autoencoder (CVAE), a type of deep learning method, which can reduce the dimension of the input parameters and distribute them in a probabilistic manner in the latent space. As a demonstration, we applied the method to the optimization of 11 FF parameters for an argyrodite-type Li7PS6 material, which has attracted considerable attention as a solid electrolyte for all-solid-state batteries. The results confirmed that the proposed approach can generate valid FFs that are highly consistent with the results of first-principles calculations, even when using a limited set of meta-heuristics-generated FF parameters. Graphical Abstract IMPACT STATEMENT Towards the development of functional materials, we applied our previously reported force-field (FF) parameter determination approach based on meta-heuristics to develop a methodology to efficiently determine the FF parameters using a conditional variational autoencoder (CVAE). As part of a demonstration, we selected argyrodite-type Li7PS6 material and applied the proposed method to its FF optimization. We believe that our study makes a significant contribution to the literature because it can serve as a basis for FF optimization of advanced materials.


Introduction
Materials with a high ionic conductivity are essential for improving the performance of batteries and fuel cells; thus, these materials have been the subject of extensive research.For example, Y 2 O 3 -doped ZrO 2 , an oxide ionic conductor, has been used as an oxygen sensor and fuel cell solid electrolyte [1].A β-alumina solid electrolyte (Na-ion conductor) containing Na 2 O-11Al 2 O 3 has been used in stationary NaS batteries [2,3].Recently, highly conductive lithium-ion materials have attracted attention for their use in allsolid-state batteries for electric vehicles [4][5][6].Furthermore, hydride-ion conductors are expected to be used in electrochemical devices owing to their strong reducibility (Eº(H − /H 2 ) = −2.35V vs. SHE) [7][8][9][10][11].Thus, the discovery of materials with high ionic conductivity may help solve some of the world's environmental and energy-related issues.
In recent years, material simulations have been increasingly used to discover materials with desired properties and to investigate ionic conduction mechanisms [12][13][14][15][16].In the case of ion-conductive materials, molecular dynamics simulations are often conducted to evaluate diffusion coefficients [16][17][18][19][20]. First-principles molecular dynamics (FPMD) and force-field molecular dynamics (FFMD) simulations are two commonly applied molecular dynamics methods, each having their own advantages and disadvantages.The FPMD simulation is believed to be accurate without parameter optimization despite its high computational cost.In contrast, the FFMD simulation has a low computational cost and can be easily applied to a large simulation cell with high speed [21].However, the reliability of the calculation depends on the FF parameters determined by the user.Recently, many attempts have been devoted to the determination of reasonable FF parameters with reference to the results of first-principles calculations, by exploiting the abovementioned two methods.For example, neural network potentials can be used to reproduce the parameters necessary for molecular dynamics calculations, such as the energy and forces acting on particles, from crystal structure information; thus, a neural network can reproduce the results of first-principles calculations [22][23][24].However, typically, neural networks are non-parametric potentials, which suggests that the number of parameters is considerable and that they are not always efficient because a large number of firstprinciples calculation results is necessary to construct the Equation.In contrast, parametric potentials based on the physicochemical knowledge of particle-particle interactions [25], such as the Coulomb interaction equation, Buckingham potential, and Morse potential, require only 10-20 FF parameters; therefore, the amount of data required for first-principles calculations, which serve as training data, can be reduced.
Adams et al. proposed the bond valence-based force field (BVFF), which is a parametric potential, for ionic crystalline materials and its universal parameterization; it has been successfully applied to visualize lithium-ion conduction pathways and perform highthroughput calculations of conductivity [26][27][28][29].Based on the previously reported results, we infer that the BVFF is an excellent parametric potential with versatility for ionic crystals.
We proposed a method to determine the FF parameters relatively quickly by performing a Cuckoo search (CS) for the improved potential of the BVFF by adding the three-body interaction of the Stringer -Weber potential, referring to the results of first-principle calculations [20,[30][31][32][33].As mentioned above, the number of FF parameters is approximately in the range of 10-20; therefore, for example, if a numerical value set of 100 delimiters per parameter is taken for 10 FF parameters, the search space will be 100 10 , which is very large.In contrast, in the CS, optimization is empirically possible through trial and error of several thousands to 10,000, which significantly reduces the search space.However, the optimization of the FF parameters with high accuracy is the most time-consuming process in the scheme of molecular dynamics calculations used for evaluating the ion diffusion capacity of solid electrolyte materials.
One of the major challenges in the FF parameter optimization stems from the enormous high-dimensional search space.In recent years, machine learning methods, such as autoencoders in deep learning, have been proposed to reduce the dimensionality of parameters in a reasonable manner [34].Among them, the conditional variational autoencoder (CVAE) is an algorithm that can generate data corresponding to target values (labels) [35].
In this study, we examined whether the conventional scheme of the FF parameter optimization can be implemented more robustly and more quickly by optimizing the FF parameters using the CVAE, in which the target values are set such that the difference between ab initio and FF calculations (loss function, see below) is as small as possible.As a demonstration, FF parameter optimization was performed for an argyrodite-type Li 7 PS 6 solid electrolyte using the CVAE technique, because Li 7 PS 6 and its derivatives exhibit high Li + mobility, and many experimental and computational studies have been reported [36,37].Figure 1 displays the crystal structure of Li 7 PS 6 .All of the P ions are tetrahedrally coordinated with S ions, forming PS 4 tetrahedral polyanions, and PS 4 units are isolated from the other PS 4 tetrahedra.Li ions are distributed at the interstitials of S ions and coordinated with two or four S ions.Four S ions belong to the PS 4 polyanions, while the remaining two S ions are not coordinated with P ions.It is well known that it is possible to dope aliovalent halogenide ions to the two S sites which are not coordinated with P ions, showing high Li ion conductivity as a result of the introduction of Li vacancies.

FPMD simulation for supervised data
FPMD calculations were performed to obtain supervised data for FF parameter fitting.The composition of the simulation model was Li 56 P 8 S 48 , which was extracted from the Materials Project database (mp -1211324).The Vienna ab initio simulation package (VASP) [38,39] was used with the projector augmented wave method [40] and a generalized gradient approximation (GGA-PBEsol) functional [41,42] for exchange correlation.In addition, the cutoff energy was set to 350 eV, and only the Γ-point was used.The isothermal -isobaric (NPT) ensemble was employed in this calculation, and the total simulation time was 1.0 ps with 1000 ionic steps at a temperature of 300 K. To ensure that the system reached equilibration, the data for the first 500 steps were ignored.The radial distribution function (RDF), angular distribution function (ADF), and cell volume (V) were calculated by averaging the ionic configurations extracted every 10 steps and were used as supervised (training) data for FF parameter optimization.The details are described in our previous papers [30,31,33].

Optimization of FF parameters by CS and definition of loss functions
The FF potential model comprised three interactions: the screened Coulomb potential, the Morse potential, and the angular section of the Stillinger -Weber potential, as reported previously [31].The combination of the first and second terms is known as the BVFF proposed by Adams et al. [26,27].The screened Coulomb potential considers a two-body interaction between the ions with the same sign as below, with a complementary error function expressed as follows: where q l is the charge on an ion labeled l, r ij is the interatomic distance between ion i and j, and ρ ij is the screening length, which is defined as a sum of the effective radii of the ions, ρ ij ¼ r i þ r j .� 0 is the permittivity of vacuum, and erfc is a complementary error function.The Morse potential, which describes the interactions between cations and anions, has the form: Here, D and α control the 'depth' and 'width' of the potential well, respectively, and R is the equilibrium interatomic distance.The angular section of the Stillinger -Weber potential is described as follows [43,44]: where θ ijk is the angle of the three-body configuration.r cut 3b ð Þ is the cutoff distance of the bond, fixed at 3.5 Å, which corresponds to the consideration of the nearest neighbor interaction.λ and γ are the adjustable parameters.Hence, the optimizing parameters are r i, D, α, R, λ, and γ, and the bond lengths of Li-S and P-S and bond angle ∠S-P-S are referred in this study.A total of 11 FF parameters were optimized.
The CS algorithm [45], which is a metaheuristic algorithm, was used for optimizing the FF parameters, as reported previously.Individuals defined by the FF parameter sets (N dimension) are generated randomly in an N-dimensional space in the first step, and their positions are updated with steps by evaluating the deviation from the supervised data using the loss functions mentioned later.The CS algorithm enhances the optimization of the FF parameters via 'Levy flight' [46], rather than via a simple isotropic random walk.In this study, the number of individuals was set to 30, and loss function L FF was the sum of the meansquared error of RDF, ADF, and V between 1000step FPMD and FFMD using CS-suggested FF parameters.The detailed setup of the FFMD calculations for FF parameter optimization is the same as that of the FPMD calculations, where the total simulation time was 1.0 ps with 1000 steps at 300 K under the isothermal -isobaric (NPT) ensemble.The Nagoya atomistic-simulation package (nap) software was used for the FF parameter optimization by the CS and FFMD calculation [30,31].

CVAE method
Autoencoders are an application of deep learning; they are algorithms that compress (encode) the inputted multidimensional data to a lower dimension using a neural network, retain important features, and restore (decode) the data to their original form.In contrast, the variational autoencoder (VAE) method encodes data to be distributed probabilistically in the latent space by assuming a normal distribution, N(0,1), for the encoded data.The CVAE is an extension of the VAE [47]; it adds a condition value (vector) to the input values of both the encoder and decoder [35].A key feature of this model is the ability to specify the conditions of the parameters generated by the addition of condition vectors.Figure 2 shows a schematic of the CVAE used in this study, and the detailed process in this study is described below.The input data are encoded into a mean vector μ and standard deviation vector σ, and the encoded data in the latent space are sampled through a multivariate Gaussian distribution.

FF parameter optimization using CS algorithm
Figures 3(a,b) show the variation in loss function L FF as a function of generation during the optimization of the FF parameters for Li 7 PS 6 by the CS.The loss function decreased with each generation and finally reached 0.0097.Table 1 presents the optimization parameters.Figures 3(c,d) show the time-averaged RDF and ADF, respectively, used to evaluate the loss function, which showed a good agreement for FPMD and FFMD.The best FF parameter sets appeared after 240 generations (30 individuals per generation), which took approximately 14 h using a 30-core parallel computing resource.In the CS, the loss function dropped sharply in the early stages of the search, while the convergence rate tended to decrease and became inefficient as the search generation progressed.In the initial five generations, the loss function dropped to approximately 0.2; however, as shown in Figures 3(d,e), the consistency of the RDF and ADF at this time was low although the RDF and ADF could capture the rough features of the crystal structure.For example, the interatomic distance of the nearest neighbors between Li and S appears to have been overestimated when using the FF parameters at L FF = 0.2.Therefore, it is considered unsuitable for evaluating the physical properties, such as the ionic conductivity of Li by molecular dynamics calculations, using the FF parameters for L FF = 0.2.In contrast, the FF parameters at L FF = 0.2 in the initial stage of optimization were found to reproduce the structural characteristic of Li 7 PS 6 to some extent.

Dataset preparation for the CVAE approach
Here, we prepared 'Dataset 1' for preparing the FF parameters using the CVAE, which comprises the FF parameters suggested by the CS algorithm and corresponding loss function values.Dataset 1 contained 83 FF parameter sets, the loss function of which was 0.2-1.0, as shown in Figure 3(a).Hence, Dataset 1 did not contain reasonably converged FF parameter sets.We also prepared 'Dataset 2' containing 150 CS-derived FF parameter sets from 5 initial generations (Figure 3(b)).The minimum of the loss function is 0.125.Figure 4 shows the relationship between the values of FF parameters D LiÀ S and D PÀ S generated by the CS algorithm and the corresponding loss functions.The FF parameters converged to several specific values with the decrease in the loss function, indicating that the CS algorithm searched for a global minimum.For example, the FF parameters of D LiÀ S tended to be distributed around 0.95 or 1.25, when the loss function is lower than ~0.2.This corresponded to the fact that the FF parameters at L FF = 0.2 could reproduce the rough structural features of Li 7 PS 6 .In the region where the loss function was greater than 1, the FF parameters were scattered significantly from the optimal solution obtained in the CS.This result suggests that Dataset 1, 0.2 < L FF <1.0, contained a structure that not only included many FF parameter sets that were close to the optimal solution but also data that deviated further from the converged values.The CVAE approach for Dataset 1 allows us to benchmark whether reasonable FF parameters that can mimic FPMD calculations can be obtained from a dataset lacking FF parameters exhibiting a sufficiently small loss function.Dataset 2 allows us to evaluate the extent to which the optimization of FF parameters can be improved with a combination of minimal CS and the CVAE approach.

FF parameter optimization by the CVAE approach using dataset 1
Dataset 1 (83 samples) was used for evaluation by using the CVAE approach.The neural network configuration used in this study consists of the FF parameters in the form of an 11-dimensional input vector, which is compressed into 2-dimensional latent variables via a 1-layer 7-dimensional encoder and mean and variance layers.Hyperparameter tuning was performed by optimizing the batch size, initial learning rate, L2-regularization parameters, and epoch numbers.The model was then restored to an 11-dimensional vector via a 1-layer 7-dimensional decoder.Figure 5(a) and Supporting Figure S1 show the diagnostic plots of the input FF parameters (true values) and the FF parameters restored by the CVAE via a 2D latent space (predicted values).Table 2 also lists the root-mean-square error (RMSE) and the coefficients of determination (R 2 ) values between the input and CVAE-restored FF parameters.Note that the FF parameters shown in Figure 5 were standardized for comparison.Although the diagnostic plots contained some outliers, we believe that the present neural network model (CVAE) produces results with a significant reproducibility according to the R 2 scores.Further, we evaluated the RMSE and R 2 , when dimension z of the latent space was set to 5, as listed in Table 2.The reproducibility was better when increasing z from 2 to 5.However, increasing the dimension z increases the search space by a power of z.Therefore, in this study, we discuss the results at z = 2 hereinafter unless otherwise specified.
The values of the two encoded variables in the latent space were divided into 26 parts in the range from −2.5 to 2.5.Hence, 676 (= 26 2 ) FF parameter sets were generated from the latent space via the CVAE decoder.The consistency of FFMD and FPMD was evaluated using the restored FF parameters with the loss function.Figure 6(a) shows the FFMD-evaluated values of the loss function using the restored FF parameters as displayed by the plot color using Dataset 1-derived CVAE, and a minimum value of 0.022 was  obtained.The RDF and ADF profiles obtained by the FFMD calculation using the best fitted FF parameters are displayed with the FPMD-derived profiles in Supporting Figure S2(a,b), respectively, showing a good agreement.The figure shows that the loss function tends to decrease as the values of the two latent parameters get closer to (−1.25, −0.83),where the minimum loss function is indicated.The deviation from the origin (0,0) of the latent space may be due to the lower reproducibility of the CVAE decoder, as shown in Figure 4 and Table 2; nevertheless, it is at an acceptable level considering the exhaustive search for 676 candidates.Figure 6(b) shows the histogram of the loss function distribution for the CVAE inputs (Dataset 1) and CVAE-decoded FF parameters.Most of the CVAE-decoded FF parameters showed lower loss function values than the Dataset-1-derived FF parameters (0.2 < L FF <1.0), though the minimum L FF value in the present example was not lower than the CS-derived FF parameters after 300 generations (L FF ¼ 0:0098Þ.Note that we also applied the CVAE approach using CS-derived FF parameters (L FF <0.1) for 300 generations, and the best L FF value was 0.016 (Supporting Figure S3).Hence, further improvement of the best FF parameters derived from CS (L FF = 0.0097) would be difficult in principle when using the BVFF.
The FF parameter sets with the loss functions ranging from 0.2 to 1.0 appeared even at a generation of 5 in the CS; thus, the computational time required for the CS could be significantly reduced.The optimization of the FF parameters by the CS over 300 generations conducted in this study took approximately 51,000 s when verified with 32 cores in parallel using an Intel Xeon Gold 6230 CPU.The time required for the training of CVAE, generation of FF parameters by GPU, and verification of the FF parameters by CPU (676 calculations) was approximately 3000 s, a 17-fold improvement in the computational efficiency.In addition, since the loss function was evaluated exhaustively in a 2D space, it is also expected to be robust, reducing the risk of missing a parameter that gives a minimum value of the loss function.7 shows a heatmap of the distribution of the loss function using the two coded values obtained by the CVAE approach using Dataset 2. The minimum value, L FF = 0.0291, and a reasonable agreement of the RDF and ADF profiles can be indicated between the FFMD and FPMD calculations, as shown in Supporting Figures S2(c,d).Note that the best encoded variables were not found at the origin of the 2D latent space but shifted to (−1.85, −1.4).This may be due to the imperfect reproducibility between the input CVAE  and CVAE-decoded FF parameters, as listed in Table 2.However, the distribution of the loss function shows a smooth change; thus, Bayesian optimization (BOpt) using the Gaussian process regression is expected to efficiently determine the appropriate encoded vector combination.For demonstration, we let the learners search for the encoded vector that gives the classical FF parameter with the lowest loss function among the 676 datasets obtained in advance.

FF parameter optimization by the CVAE approach using dataset 2
At the beginning of the search process, the loss function data were hidden from the learner, and the performance of the optimization was checked by disclosing the data specified by the learner at each sampling.These trials were repeated 100 times.Figure 6(b) shows the results of the BOpt; the best FF parameters were found after sampling for approximately 50 and 80 when considering the discovery rate of >80% and >99%.Hence, the optimal FF parameters can be efficiently determined, even if the optimal value deviates from the encoded vector of (0,0).Finally, we performed the FPMD and FFMD calculations for Li 7 PS 6 and evaluated the diffusion coefficients of Li migration using FF parameters obtained in this study.Supporting Figure S4 presents the mean square displacement (MSD) profiles at 500 K.The diffusion coefficient calculated by the FPMD calculation was 6.8 × 10 6 cm 2 s −1 , while the result of the FFMD calculation was 4.0 × 10 −6 cm 2 s −1 based on FF parameters derived from the CS, and 7.5 × 10 −7 and 7.6 × 10 −7 cm 2 s −1 using the CVAE approach with Dataset 1 and Dataset 2, respectively.Although the FFMD calculation results were an underestimation in relation to the FPMD calculation results, considering the errors in MD calculations at relatively low temperatures, it is believed to be within a certain range of error.

Conclusions
Calculations using classical FFs, which are effective for high-speed and large-scale calculations, have a limitation in that the FF parameters are high-dimensional and difficult to determine rationally.Recently, some attempts have been devoted to address this, such as determining the FF by meta-heuristics referring to the results of first-principles calculations.In this study, with the determination of conventional FF parameters for Li 7 PS 6 solid electrolyte material as an example, we developed a scheme to learn the BVFF parameters proposed by meta-heuristics (CS) using the CVAE and perform exhaustive calculations using encoded vectors compressed into two dimensions.As a result, we proposed an FF that is lower than the loss function given by the FF set used in the training.We could reduce the total computation time to approximately 1/ 5th of that required when using meta-heuristics alone.Furthermore, the loss function changes smoothly with the changes in the encoded vector and is considered to be highly compatible with Gaussian process regression and other methods.BOpt using Gaussian process regression was shown to further improve the search speed for the optimal classical FF in practice.Since this method can be applied to various classical FFs regardless of the BVFF, it is expected to be used as a general-purpose FF optimization method.

Figure 1 .
Figure 1.Crystal structure of Li 7 PS 6 .Green and yellow spheres represent Li and S ions, respectively.Pink tetrahedra correspond to PS 4 polyanion units.Structural data are extracted from the materials Project datasets (mp -1211324).

( 1 )
Neural network model building: Eleven FF parameters x as input values and corresponding loss function L FF , i.e. condition value y, are encoded to the 2D-encoded data in the latent space.The encoded data and user specified loss function, L 0 FF , i.e. the target value (y') are decoded to 11 FF parameters x' as output values.Model optimization was conducted to minimize the sum of the mean-squared error between x' and x and Kullback -Leibler divergence using the Adam optimizer after hyperparameter tuning.(2) FF parameterization: 676 (= 26 × 26) combinations of the 2D encoded data ranging from −2.5 to 2.5 and with a condition value of y' = 0 are decoded to obtain the FF parameter sets.(3) Evaluation of FF parameters: Loss function L FF is evaluated using the thus obtained FF parameter sets.

Figure 2 .
Figure 2. Schematic of the CVAE model structure used in this study.

Figure 3 .
Figure 3. (a) Loss function of each fitted FF set generated during CS optimization and (b) magnification of the early part of CS optimization shown in panel (a).(c) and (d) Comparison of the RDF and ADF between FPMD and FFMD using the CS-optimized FF parameters (L F F = 0.0097), respectively.(e) and (f) Comparison of the RDF and ADF between FPMD and FFMD using the insufficiently optimized FF parameters (L F F ~0.2), respectively.

Figure 4 .
Figure 4. Loss function as a function of the FF parameters.Panels (a) and (b) show D Li-S and D P-S , respectively, as examples.(a) Loss function L of each fitted FF set generated during CS optimization.(b) Variation in the selected optimization parameters, De(Li-O), alpha(Li-O), and Re(Li-O) with L.

Figure 5 (
Figure 5(b) shows the diagnostic plot of the input FF parameters (true values) and the FF parameters restored by the CVAE via a 2D latent space (predicted values) using Dataset 2, indicating sufficient recovery of the FF parameters.Figure7shows a heatmap of the distribution of the loss function using the two coded values obtained by the CVAE approach using Dataset 2. The minimum value, L FF = 0.0291, and a reasonable agreement of the RDF and ADF profiles can be indicated between the FFMD and FPMD calculations, as shown in Supporting FiguresS2(c,d).Note that the best encoded variables were not found at the origin of the 2D latent space but shifted to (−1.85, −1.4).This may be due to the imperfect reproducibility between the input CVAE

Figure
Figure 5(b) shows the diagnostic plot of the input FF parameters (true values) and the FF parameters restored by the CVAE via a 2D latent space (predicted values) using Dataset 2, indicating sufficient recovery of the FF parameters.Figure7shows a heatmap of the distribution of the loss function using the two coded values obtained by the CVAE approach using Dataset 2. The minimum value, L FF = 0.0291, and a reasonable agreement of the RDF and ADF profiles can be indicated between the FFMD and FPMD calculations, as shown in Supporting FiguresS2(c,d).Note that the best encoded variables were not found at the origin of the 2D latent space but shifted to (−1.85, −1.4).This may be due to the imperfect reproducibility between the input CVAE

Figure 5 .
Figure 5. Diagnostic plots of 11-dimensional FF parameters for CVAE using (a) Dataset 1 and (b) Dataset 2. True and predicted values corresponding to the input and output of the CVAE.

Figure 6 .
Figure 6.(a) Loss function distribution using FF parameters derived from 2D-encoded variables of CVAE for Dataset 1.(b) Histogram of the loss function for input (blue bar) and output (orange bar) FF parameters for CVAE using Dataset 1.

Figure 7 .
Figure 7. (a) Loss function distribution using FF parameters derived from 2D-encoded variables of the CVAE for Dataset 2. (b) Lowest (the best) loss function as a function of the sampling number using BOpt (blue symbol) and random search (gray symbol) for Dataset 2. The discovery rate of the optimized FF parameter sets is plotted in red.Note that the BOpt and random search samplings are averaged 100 times.

Table 1 .
FF parameter set and loss function optimized by CS (300 generations) and CVAE (Datasets 1 and 2).

Table 2 .
RMSE and the coefficients of determination (R 2 ) values between the input and CVAE-restored FF parameters.The dimension of the latent space, z, was set as 2 and 5.