Discriminating among tectonic settings of spinel based on multiple machine learning algorithms

ABSTRACT In geochemistry, researchers usually discriminate among tectonic settings by analyzing the chemical compositions of minerals. Previous studies have generally taken spinel and clinopyroxene as subjects; in this research, we took spinel as our subject. In total, 1898 spinel samples with 14 dimensions of chemical composition were collected from three different tectonic settings: ocean island, convergent margin, and spreading center. In the experiment, 20 classification algorithms were evaluated in the classification learner application of MATLAB. The validation accuracies, receiver operating characteristic (ROC) curves, and areas under the ROC curve (AUCs) show that the Bag Ensemble Classifier performs best on this problem: its validation accuracy is 86.3%, and its average AUC is 0.957. For further analysis, we studied the importance of the different major elements in discrimination. We found that TiO2 contributes the most to discrimination, whereas FeOT, Al2O3, Cr2O3, MgO, MnO, and ZnO are of less importance. Based on the Bag Ensemble Classifier, a MATLAB plug-in application named Discriminator of Spinel Tectonic Settings (DSTS) has been developed to promote the use of machine learning in geochemistry and to make our results convenient for other researchers to use.


Introduction
Discrimination diagrams have long been used to discriminate among different tectonic settings. In the 1970s, Pearce and Cann (1971) put forward the first discrimination diagram based on the chemical compositions of a suite of basalt samples and acquired good results. After that, more and more discrimination diagrams were designed, and they greatly enriched the theory of magmatic petrology and the geotectonic hypothesis (Pearce & Cann, 1973; Pearce & Norry, 1979; Wang et al., 2017). Like the work of Pearce, these diagrams were generally designed with basalt samples and were all based on the derivation of causality (Wang, 2001; Wang et al., 2016).
Spinel is a common accessory mineral in mantle-derived xenoliths. Studies of spinel not only illustrate the mineral's own characteristics but also reveal the genesis of its host mantle-derived rock (Arai et al., 2011). In previous studies that used minerals to discriminate tectonic settings, spinel was often taken as the subject (Arai, 1994; Ishwar-Kumar et al., 2016; Zhao, 2016; Zhou et al., 2011).
With the development of geochemistry, researchers have found many limitations in using discrimination diagrams (Li, Arndt, Tang, & Ripley, 2015; Luo et al., 2018; Zhang, 1990): (1) the samples used in the earlier discrimination diagrams were usually from a single typical area, and their number was small; (2) a discrimination diagram contains only 2-3 elements, so it can reflect only a small amount of information; (3) discrimination diagrams are designed empirically, and the design process is subjective; and (4) oversimplifying the discrimination process works against the in-depth study of magmatism.
With the rapid evolution of computer techniques, machine learning algorithms have begun to make their presence felt in many fields (Zhou et al., 2018; Zhou, Pan, Wang, & Vasilakos, 2017). Generally, machine learning can be divided into two categories: supervised learning and unsupervised learning (Ang et al., 2016). One important task that can be solved by supervised learning is classification. Its main idea is to train a classifier with a set of labeled samples and then use the classifier to recognize unlabeled samples. In geochemistry, the process of discriminating among different tectonic settings is essentially a classification task. Some researchers have already focused on using classification algorithms to discriminate among tectonic settings. Vermeesch (2006) put forward a set of decision trees to discriminate among tectonic settings of volcanic rocks. Petrelli and Perugini (2016) proposed using Support Vector Machines (SVM) to solve the same problem and acquired high classification scores. Ueki, Hino, and Kuwatani (2018) compared the performances of SVM, Random Forest (RF), and the Sparse Multinomial Regression (SMR) algorithm in discriminating different basalt tectonic settings, pointing out that SVM and RF could reach high accuracy while SMR could provide more interpretability. Their results show the effectiveness of intelligent algorithms in geochemistry. However, these studies were all based on basalt samples, each compared only one to three algorithms, and the publications did not discuss the technical details of the algorithms in sufficient depth. Moreover, they all only proposed a solution to discrimination, not a real tool, like the diagrams, that can be used by others.
In this study, we focused on the use of classification algorithms in discriminating among tectonic settings, with spinel samples taken as the subjects. To achieve a comprehensive study, 20 different classification algorithms were adopted for comparison. The results showed that the Bag Ensemble Classifier had the best performance, and they also indicate that using the chemistry of spinel to discriminate among tectonic settings is feasible. In addition, based on this research, a MATLAB application was developed, which can serve as a convenient tool for geochemical analysis.

Data collection and pre-processing
The spinel samples used in this research were obtained from the GEOROC (http://georoc.mpch-mainz.gwdg.de/georoc/) and PetDB (https://search.earthchem.org/) databases. In total, 1898 spinel samples were collected. Some of them were derived from the in situ mantle, and the others are mantle rock inclusions. All the spinel samples had been labeled with their tectonic settings: 515 ocean island spinel (OIS) samples, 881 convergent margin spinel (CMS) samples, and 502 spreading center spinel (SCS) samples. The samples were collected from all over the world (Figure 1).

Figure 1. Distribution of spinel samples (ocean island, convergent margin, spreading center).

Classification algorithms
In this study, five types of currently popular classification algorithms were adopted: Decision Trees, Discriminant Analysis, Support Vector Machines, Nearest Neighbor Classifiers, and Ensemble Classifiers. They are described as follows:
(1) Decision Trees (Gavankar & Sawarkar, 2015): a decision tree contains a root node, several internal nodes, and several leaf nodes. The leaf nodes correspond to classification results, and the other nodes represent tests on certain attributes. According to these tests, samples are divided among the child nodes. The root node therefore contains all samples, and the route from the root node to an internal or leaf node represents a decision sequence. There are several variants of the Decision Tree; in this research, the Fine Tree, Medium Tree, and Coarse Tree were adopted.
(2) Discriminant Analysis (Welling, 2005): this type of algorithm is based on the assumption that the data of different classes follow different Gaussian distributions. The main process is to first train a classifier that estimates the parameters of each class's distribution and then use the classifier to predict new samples. The most widely used Discriminant Analysis is Linear Discriminant Analysis (LDA). In LDA, the key step is to find a projection hyperplane in a k-dimensional space such that, when the samples of the different classes are projected onto it, the between-class distances are maximized and the within-class distances are minimized, as shown in Figure 2. Another typical Discriminant Analysis is Quadratic Discriminant Analysis (QDA); the main difference from LDA is that QDA has a quadratic decision boundary. In our research, both LDA and QDA were adopted.
(3) Support Vector Machines (SVMs) (Cortes & Vapnik, 1995; Li, Miao, & Shi, 2014): this type of algorithm is based on VC-dimension theory and the structural risk minimization principle. The main task of an SVM is to find a hyperplane in the feature space that maximizes the margin between classes, and the key is to design an effective kernel function, as shown in Figure 3. Common kernel functions include the Gaussian, linear, quadratic, and cubic kernels. In this study, all four kernels were tried to find out which was best for the discrimination task.
(4) Nearest Neighbor Classifiers (Altman, 1992): this type mainly includes the K-Nearest Neighbor (KNN) algorithm and its variants. KNN is one of the simplest classification algorithms. Its main idea is to find the k nearest samples to a new data point in the feature space and then assign the point to a class according to those k neighbors, as shown in Figure 4. In this research, six Nearest Neighbor Classifiers were adopted: Fine KNN, Medium KNN, Coarse KNN, Cosine KNN, Cubic KNN, and Weighted KNN.
(5) Ensemble Classifiers (Polikar, 2012): unlike the other types, which contain only one classifier, Ensemble Classifiers use multiple classifiers to improve the final performance. The strategy is to aggregate multiple weak learners into a strong learner, as shown in Figure 5. The weak learners can be Decision Trees, KNN, or other single classifiers, and the main aggregation strategies are Bagging, Boosting, and the Random Subspace method. In our experiment, the Bagged Trees, AdaBoosted Trees, RUSBoosted Trees, Subspace KNN, and Subspace Discriminant algorithms were adopted.
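The five algorithm families above can be sketched with scikit-learn, a Python analogue of MATLAB's classification learner (this is an illustrative sketch on synthetic three-class, 11-feature data, not the actual spinel dataset or the authors' MATLAB code; the model names in the dictionary loosely mirror the classification learner's presets):

```python
# Illustrative sketch: one representative classifier from each of the five
# families, trained on synthetic 3-class data with 11 features (analogous
# to the 11 oxide dimensions used in the paper).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=600, n_features=11, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=20),           # ~"Fine Tree"
    "LDA": LinearDiscriminantAnalysis(),
    "SVM (Gaussian kernel)": SVC(kernel="rbf"),
    "Weighted KNN": KNeighborsClassifier(n_neighbors=10, weights="distance"),
    "Bagged Trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=30),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: validation accuracy = {model.score(X_va, y_va):.3f}")
```

Each family exposes the same fit/score interface, which is what makes a side-by-side comparison of 20 classifiers practical in either environment.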

Assessment criteria
To assess the performance of a classifier, many criteria have been proposed. Generally, the main assessment criteria include validation accuracy, confusion matrix, receiver operating characteristic curve (ROC), and the area under the ROC curve (AUC).
(1) Validation accuracy: When training a model, the whole sample set is divided into a training set and a validation set. The training set is used to determine the structure and parameters of the classifier, and the validation set is used to test the classifier. A good classifier therefore has a high validation accuracy.
(2) Confusion matrix: The confusion matrix is an n × n matrix used to visualize the performance of classifiers, where n is the number of classes. It shows the details of a classifier's results by comparing the real classes with the predictions. (3) ROC (Gönen, 2006): Considering that validation accuracy cannot fully reflect the real performance of a classifier when the distribution of samples is imbalanced, the ROC was put forward as a further criterion. The x-axis of the ROC is the false-positive rate (FPR), and the y-axis is the true-positive rate (TPR), or sensitivity, as shown in Figure 6. Generally speaking, the farther the ROC curve is from the reference line (red line in Figure 6), the better the performance of the classifier. (4) AUC (Swets, 1986): Although the ROC reflects the performance of a classifier intuitively, researchers often want to assess performance with a single number; hence the AUC. As its name implies, the AUC is the area under the ROC curve, so a good classifier has a high AUC value.
In this research, all four criteria were used to evaluate the performances of the classifiers. Validation accuracy was first used to initially assess all 20 classifiers mentioned in Section 3.1. After that, the confusion matrix, ROC, and AUC were used to assess the five classifiers with the highest validation accuracies.
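As a concrete illustration (again in Python with scikit-learn rather than the authors' MATLAB workflow, and on synthetic data), the four criteria can be computed for a three-class problem, with one ROC/AUC per class in the one-vs-rest style of Figures 8-12:

```python
# Sketch: validation accuracy, confusion matrix, and one-vs-rest ROC/AUC
# for a 3-class classifier on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             roc_curve, roc_auc_score)

X, y = make_classification(n_samples=600, n_features=11, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_va)
proba = clf.predict_proba(X_va)            # class-membership scores for ROC

print("validation accuracy:", accuracy_score(y_va, y_pred))
print("confusion matrix (3x3):\n", confusion_matrix(y_va, y_pred))

# One ROC curve and AUC per class: class k versus the other two classes.
for k in range(3):
    y_bin = (y_va == k).astype(int)
    fpr, tpr, _ = roc_curve(y_bin, proba[:, k])   # points of the ROC curve
    auc = roc_auc_score(y_bin, proba[:, k])       # area under that curve
    print(f"class {k} vs rest: AUC = {auc:.3f}")
```

Averaging the three per-class AUCs gives a single summary number per classifier, which is how the "average AUC" values reported below can be interpreted.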

Classification learner of MATLAB
The classification learner application provided by MATLAB is a toolbox that allows users to interactively analyze data and train classifiers with several different types of machine learning algorithms, as shown in Figure 7. Through this application, users can select different algorithms, different sample features, and different parameters to obtain an optimal classifier and its corresponding validation accuracy. After training, the application shows the confusion matrix, ROC, and AUC, as well as scatter plots of any two sample features. In this research, all 20 classifiers mentioned in Section 3.1 were adopted.

Discrimination results
In the training process, k-fold cross-validation (Anguita, Ghelardoni, Ghio, Oneto, & Ridella, 2012) was adopted. In previous research, k was usually set to 10, meaning that each validation fold holds 10% of the samples. In our research, k was set to 5 so that each validation fold was larger (20% of the samples) and the classifiers could be validated more thoroughly. For each classifier, we adjusted the parameters manually several times to obtain the optimal values. The optimal parameters of the different classifiers and their validation accuracies are shown in Table 1.
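The 5-fold cross-validation scheme can be sketched as follows (a scikit-learn analogue of the validation scheme in MATLAB's classification learner, run here on synthetic data with a bagged-trees classifier standing in for the Bag Ensemble Classifier):

```python
# Sketch: 5-fold cross-validation, so each fold's validation set holds 20%
# of the samples (versus 10% with the more common k = 10).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=11, n_informative=8,
                           n_classes=3, random_state=0)
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                        random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # one accuracy per fold
print("fold accuracies:", np.round(scores, 3))
print("mean validation accuracy:", round(scores.mean(), 3))
```

The mean of the five fold accuracies plays the role of the single "validation accuracy" figure reported per classifier in Table 1.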
From Table 1, it can be seen that Ensemble classifiers and Nearest Neighbor Classifiers perform better overall than the others. The five classifiers with the highest validation accuracies are the AdaBoost Ensemble Classifier (85.8%), Bag Ensemble Classifier (85.7%), RUSBoost Classifier (85.0%), SVM with Gaussian kernel (84.7%), and Weighted KNN (84.4%). The ROCs and AUCs of these five classifiers were then calculated, as shown in Figures 8-12.
In Figures 8-12, the dark blue curves are the ROCs, the light blue areas are the AUCs, and the red circles mark the operating points of the classifiers. Each classifier has three ROC curves. For example, in Figure 8, the first ROC reflects the prediction of OIS versus non-OIS (CMS and SCS), the second reflects CMS versus non-CMS (OIS and SCS), and the third reflects SCS versus non-SCS (OIS and CMS). Table 2 shows the AUCs of the five classifiers.
The last column of Table 2 shows that the Bag Ensemble Classifier has the highest average AUC, although its validation accuracy is slightly lower than that of the AdaBoost Ensemble Classifier. Therefore, we took the Bag Ensemble Classifier as the best classifier in this experiment, with AdaBoost second best. Figure 13 shows the confusion matrices of the two classifiers.
On the other hand, the last row of Table 2 shows that the average AUC is largest for SC versus OI & CM, followed by OI versus CM & SC and CM versus OI & SC. This means the classifiers do best at discriminating between SCS samples and non-SCS samples.

Importance of major elements
After the above assessment, we analyzed the importance of every major element. As mentioned above, the Bag Ensemble Classifier was regarded as the most suitable method, so it was used to evaluate feature importance. In this part, 11 experiments were conducted; in each one, one major element was removed from the samples, and the classifier was trained with the remaining data. For example, in the first experiment, the SiO2 data were not used, and the classifier was trained with the remaining 10-dimension data; after training, the validation accuracy was 83.7% and the average AUC was 0.95. The 11 results are shown in Figure 14. Figure 14 shows that when FeOT, Al2O3, Cr2O3, MgO, MnO, or ZnO is excluded from training, the validation accuracy can still reach 85%, and the average AUC can still reach 0.953, meaning that these six major elements contribute less to discriminating among different tectonic settings. On the other hand, when TiO2 is removed from the training samples, the validation accuracy and average AUC drop sharply, meaning that TiO2 contributes the most to discrimination.
Figure 14. Evaluations of the importance of major elements.
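The element-removal procedure amounts to a leave-one-feature-out ablation, which can be sketched as follows (synthetic data and placeholder feature names, with a bagged-trees classifier standing in for the Bag Ensemble Classifier; the real experiment used the 11 oxide dimensions):

```python
# Sketch: leave-one-feature-out ablation. Removing a feature that matters
# (as TiO2 does in the paper) lowers cross-validated accuracy the most.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=11, n_informative=8,
                           n_classes=3, random_state=0)
features = [f"oxide_{i}" for i in range(11)]   # hypothetical placeholder names

def cv_accuracy(X, y):
    clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                            random_state=0)
    return cross_val_score(clf, X, y, cv=5).mean()

baseline = cv_accuracy(X, y)                   # all 11 features
drops = {}
for i, name in enumerate(features):
    X_ablated = np.delete(X, i, axis=1)        # remove one feature, retrain
    drops[name] = baseline - cv_accuracy(X_ablated, y)

# Larger drop => the removed feature contributed more to discrimination.
for name, d in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{name}: accuracy drop = {d:+.3f}")
```

Ranking the features by accuracy drop reproduces the logic of Figure 14: elements whose removal barely changes the score contribute little to the discrimination.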

Discrimination application design
From the results of Section 4.1, it can be seen that, in addition to basalts, spinel can also be used to discriminate among different tectonic settings, and classification algorithms are effective methods for this discrimination. To make our results easy for other researchers to use and to promote the study of machine learning in geochemistry, we developed a MATLAB application named Discriminator of Spinel Tectonic Settings (DSTS). The interface of DSTS is shown in Figure 15. The underlying algorithm of DSTS is the Bag Ensemble Classifier, which was determined to be the most appropriate algorithm in Section 4.1. The main workflow is: (1) enter the chemical elements into the table manually; (2) click the "Calculate" button; the application will then output the predicted tectonic settings and show them in the last column of the table.
To improve efficiency, an "Import Data" button was designed to exchange data with the MATLAB workspace. By entering the name of a variable existing in the workspace into the text box to the right of the "Import Data" button, the application can read the variable and display it in the table. Users can then discriminate the corresponding tectonic settings by clicking the "Calculate" button.
The "Reset" button was designed to remove all the data existed in the application. On the top of the application, there is a toolbar that contains an "Open" button and a "Save" button. The "Open" button can import excel files into MATALB, and the "Save" button is used to export the prediction results to the workspace of MATLAB.
This application is now available at https://github.com/Rondapapa/DSTS/tree/dsts. The installation tutorial is in README.md. After installation, the DSTS can be found in the APP toolbox of MATLAB, as shown by the green rectangle in Figure 16.

Discussion
In the experiment, the chemical elements used were all the major elements of spinel excluding Na2O and K2O; for each sample, their total amounted to more than 97 wt%. Tables 1 and 2 show that the AdaBoost, Bag, and RUSBoost Ensemble classifiers can reach a high validation accuracy of more than 85%, meaning that the 11 major elements of a spinel sample record information about its tectonic setting. By calculating the ROCs and AUCs of the classifiers with the highest validation accuracies, the Bag Ensemble Classifier was regarded as the most suitable algorithm for this discrimination task. Analysis of the average AUCs shows that the classifiers do best at discriminating between SCS samples and non-SCS samples, followed by discriminating between OIS samples and non-OIS samples, and then between CMS samples and non-CMS samples.
In previous studies, spinel has been regarded as an effective indicator of the tectonic settings of basic-ultrabasic rocks. However, researchers have been more inclined to use Mg#, Cr#, Al2O3, and Fe2O3 contents for discrimination (Arai, 1994; Barnes & Roeder, 2001; Franz & Wirth, 2000; Ishwar-Kumar et al., 2016; Jan & Windley, 1990; Kamenetsky, Crawford, & Meffre, 2001; Maurel & Maurel, 1982; Oh, Seo, Choi, Rajesh, & Lee, 2012; Tamura & Arai, 2006). Only a few researchers consider TiO2 a useful indicator as well (Dharma Rao, Santosh, Kim, & Li, 2013; Leterrier, Maury, Thonon, Girard, & Marchal, 1982). The feature analysis in this study concludes that TiO2 has the greatest impact on discrimination, whereas FeOT, Al2O3, Cr2O3, MgO, MnO, and ZnO are of less importance. However, the reason is not yet clear, and we will focus on it in future work.

Conclusions
In this study, a machine learning-based analysis was carried out to discriminate among different tectonic settings with spinel samples. In total, 1898 spinel samples with 14-dimension compositions were collected. After data filtering, 1398 samples with 11-dimension compositions were selected for analysis. All the spinel samples correspond to three types of tectonic setting: ocean island, convergent margin, and spreading center. The 11 dimensions were SiO2, TiO2, Al2O3, Cr2O3, V2O3, FeOT, CaO, MgO, MnO, NiO, and ZnO. In the experiment, 20 classification algorithms of 5 types were adopted for comparison, and the classification learner application provided by MATLAB was used to run them. The results showed that 13 classifiers can reach a high accuracy of more than 80%, and the Bag Ensemble Classifier performed best on this task, with a validation accuracy of 86.3% and an average AUC of 0.957. The results show that using spinel to discriminate among different tectonic settings is feasible and that machine learning is an effective solution for this task. In addition, the feature importance analysis shows that TiO2 is the most important element for discrimination, whereas FeOT, Al2O3, Cr2O3, MgO, MnO, and ZnO do not have a significant impact on the performance of the classifier; this can serve as a reference for further studies.
Based on these analyses, a MATLAB plug-in application was developed to promote the use of machine learning in geochemistry and to allow other researchers to conveniently use our results. The installation package and installation tutorial can be downloaded from our Github repository.
The main shortcoming of this study is the number of spinel samples; in future work, more samples should be collected for training the classifiers. Moreover, trace elements may also carry information about tectonic settings. As data accumulate, trace elements should also be taken into consideration to improve the performance of the classifiers.

Disclosure statement
No potential conflict of interest was reported by the authors.

Data availability statement
The data referred to in this paper are not publicly available at the current time. The DSTS application developed in this research and the installation tutorial can be downloaded from our Github repository: https://github.com/Rondapapa/DSTS/tree/dsts.