Prediction of viscosity behavior in oxide glass materials using cation fingerprints with artificial neural networks

ABSTRACT We propose a novel descriptor of materials, named ‘cation fingerprints’, based on the chemical formula or the concentrations of raw materials and their respective properties. To test its performance, the method was used to predict the viscosity of glass materials using the experimental database INTERGLAD. Using artificial neural network models, we succeeded in predicting the temperature required for glass to have a specific viscosity within a root-mean-square error of 33.0°C. We were also able to evaluate the effect of a particular raw material using a model trained without data containing that raw material. The results show that cation fingerprints with a neural network model can predict some unseen combinations of raw materials. In addition, we propose a method for estimating the prediction accuracy by calculating the cosine similarity between the input features of the material to be predicted and those of the training data.

Table S1. Optimized hyperparameters and prediction accuracy of five regression models considered in this study (in Section 3.1). Training of prediction models other than the neural network was done by Scikit-learn [1].
In this study, we tried to predict the isokom temperature of glass materials using a neural network.
We also investigated the prediction results obtained using the other prediction models shown in Table S1. We tested four prediction models provided by the Scikit-learn package. A brief introduction of each model follows.
Linear ridge regression (LRR) is a linear regression model trained with L2 regularization, so the trained model has the same functional form as ordinary linear regression. Kernel ridge regression (KRR) is a nonlinear regression model that estimates the conditional expectation using a kernel function [2]; regularization is applied when optimizing the kernel coefficients. A support vector regression machine (SVR) is a model consisting of a set of hyperplanes [3]. In this study, a kernel function was used to optimize the margins of the hyperplanes efficiently. Random forest (RF) is a model that consists of an ensemble of decision trees [4,5]. The ensemble gives more flexible and accurate predictions than a single decision tree.
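For illustration, the four Scikit-learn models could be instantiated as in the sketch below; the hyperparameter values shown are placeholders, not the optimized settings reported in Table S1.

```python
# A minimal sketch of the four Scikit-learn models compared in Table S1.
# All hyperparameter values below are illustrative placeholders.
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

models = {
    "LRR": Ridge(alpha=1.0),                                   # L2-regularized linear regression
    "KRR": KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1),    # kernel ridge regression
    "SVR": SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.1),  # support vector regression
    "RF":  RandomForestRegressor(n_estimators=100),            # ensemble of decision trees
}
```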
Each hyperparameter of these prediction models was optimized in the same way as for the ANN model. Table S1 shows the optimized hyperparameter settings for the respective models and the corresponding prediction errors.
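The tuning procedure is not restated here; assuming a cross-validated grid search (our assumption about the procedure, with a hypothetical grid and placeholder data), the optimization might look like:

```python
# A sketch of cross-validated hyperparameter tuning for one of the models.
# Grid search is our assumption about the procedure; the grid values and the
# placeholder data are hypothetical.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X, y = rng.random((200, 20)), rng.random(200) * 1000.0  # placeholder data

param_grid = {"alpha": [1e-3, 1e-2, 1e-1, 1.0], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_)
```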
Table S2. Total data and R² scores from Fluegel's paper [6] (with data offset) and from the dataset in this research, without and with data preprocessing (RANSAC).
We compare the prediction performance of our model with that of the Fluegel equation. As mentioned in the main text, Fluegel preprocessed the data. Because our database and data references differ from those Fluegel used, we compared the representation of the Fluegel equation using different data. Table S2 shows our training results for linear regressions with exactly the same terms as suggested by Fluegel, together with Fluegel's original results [6]. We did not preprocess the data, although the compositions were limited to the same range as Fluegel's data. The prediction accuracy is evaluated by the R² score, i.e., the coefficient of determination. The total number of data records is 2.5 times larger than that of Fluegel's data. The prediction accuracy is lower than that of the Fluegel equation, as expected.
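A minimal sketch of this comparison, assuming a design matrix of Fluegel-style composition terms (the variable names and placeholder data are hypothetical):

```python
# Fit a linear regression on Fluegel-style composition terms and report the
# coefficient of determination. X_terms and y are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_terms = rng.random((100, 10))         # composition terms per data record
y = X_terms @ rng.random(10) * 1000.0   # isokom temperatures (placeholder)

reg = LinearRegression().fit(X_terms, y)
print("R2 =", r2_score(y, reg.predict(X_terms)))
```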
Random sample consensus (RANSAC) iteratively generates regression models from randomly selected data subsets. At each step, we evaluate the prediction errors and count the number of inliers, i.e., data records with error smaller than a predetermined threshold value. Then, we update the model when it has more inliers than the model at the previous step. In the lower right part of Table S2, we show the results obtained using RANSAC with one of the typical outlier thresholds, $n^{-1/5} \times m$, where $n$ is the number of data records and $m$ is the median absolute deviation [7,8].
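A minimal sketch of this preprocessing step, assuming the threshold enters Scikit-learn's RANSACRegressor through its residual_threshold argument:

```python
# RANSAC with the outlier threshold n^(-1/5) * m described above, where m is
# the median absolute deviation of the targets. Passing the threshold via
# residual_threshold is our assumption about the implementation.
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

def ransac_fit(X, y):
    n = len(y)
    m = np.median(np.abs(y - np.median(y)))  # median absolute deviation
    ransac = RANSACRegressor(LinearRegression(),
                             residual_threshold=n ** (-1.0 / 5.0) * m)
    ransac.fit(X, y)
    return ransac  # ransac.inlier_mask_ flags the retained data records
```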
Figure S1. Dependence of the prediction error of ANN models on the number of bins. Two parallel dashed lines represent the prediction errors from the elemental attributes set and the molar ratio, respectively.

In the extreme case of prediction in Subsection 3.4, we examined the dependence on the number of bins, one of the hyperparameters of the cation fingerprints. The results are shown in Figure S1. The lowest prediction error was obtained when the same number of bins (20 bins) as in the case of Figure 2 in the main manuscript was used. The variation of the prediction error with the number of bins is wider than that in Figure 2, and a sharp dip appears at 20 bins.
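For reference, the sketch below shows one way a bin count can enter a composition-based descriptor, as a concentration-weighted histogram of a cation attribute; this construction is our illustrative assumption, not the exact definition of cation fingerprints given in the main text.

```python
# Illustrative binned descriptor: a concentration-weighted histogram of one
# normalized cation attribute. The construction is an assumption made for
# illustration; see the main text for the actual fingerprint definition.
import numpy as np

def binned_descriptor(concentrations, attribute_values, n_bins=20):
    hist, _ = np.histogram(attribute_values, bins=n_bins, range=(0.0, 1.0),
                           weights=concentrations)
    return hist

# Example: three cations with molar fractions and a normalized attribute.
fp = binned_descriptor(np.array([0.6, 0.3, 0.1]),
                       np.array([0.20, 0.55, 0.80]))
```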
Although there is a difference in the shape of the graph as mentioned above, two characteristics seen in Figure 2 are also seen in Figure S1. First, the prediction error increases when too many bins are used. Second, the prediction error seems to converge to the prediction error of the molar ratio in the limit of a large number of bins.

Table S3. Optimized hyperparameters and prediction accuracy of five regression models considered in this study (in Subsection 3.4). Training of the prediction models other than the neural network was done by Scikit-learn [1].

Note that the hyperparameters of all models have changed to prevent overfitting. Compared with Table S1, an increase in the optimal regularization scales is seen in all models using a regularization scheme. In SVR and KRR, the prediction models using the kernel trick, the optimal value of gamma decreases. Since a decrease of the gamma value is known to suppress overfitting, the decrease of the gamma value seen in Table S3 can be interpreted in the same way.

Figure S2 plots the relationship between the maximum cosine similarity and the prediction error for the training results in the cases of (a) elemental attributes and (b) molar concentrations. Each dot represents a test material, and the solid black line represents the mean absolute error. It can be seen that the prediction accuracy is high if similar materials are included in the training set, as in the case of fingerprints. Since the same pattern is seen for the three descriptors, this trend between similarity and predictability may be common to machine learning using compositional descriptors.
However, the similarity-accuracy distribution patterns and the similarity scales required to secure a target accuracy all differ. If we want the mean absolute error of a prediction to be below 50°C, a maximum cosine similarity of at least 0.984, 0.9994, or 0.9999996 is required for cation fingerprints, elemental attributes, and molar concentrations, respectively.
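A minimal sketch of this similarity-based accuracy estimate, with hypothetical descriptor matrices (rows are materials):

```python
# For each test material, take the maximum cosine similarity of its descriptor
# to the training set and compare it with the threshold read off Figure S2.
# X_train and X_test are hypothetical placeholders; the 0.984 value is the
# fingerprint threshold quoted above.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
X_train = rng.random((500, 20))  # e.g., 20-bin cation fingerprints
X_test = rng.random((50, 20))

max_sim = cosine_similarity(X_test, X_train).max(axis=1)
reliable = max_sim >= 0.984  # expected MAE below 50 °C for fingerprints
```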