Efficient autonomous material search method combining ab initio calculations, autoencoder, and multi-objective Bayesian optimization

ABSTRACT Autonomous material search systems that combine ab initio calculations and Bayesian optimization are very promising for exploring huge material spaces. An efficient autonomous search requires an appropriately defined material search space. However, simply searching the material space spanned by the prepared descriptors is not sufficient; the search becomes efficient only when specific descriptors and properties are prioritized. Here, a material search system is proposed that can autonomously search a huge material space while performing multi-objective optimization that considers similarities among elements and emphasizes specific descriptors. The system was applied to the exploration of Heusler alloys and successfully proposed several candidate materials with half-metallic properties. The proposed system is versatile and can be applied to various properties and material systems.


Introduction
In recent years, functional materials have gradually become multi-elemental, resulting in a significant increase in their numbers. In conventional material exploration methods, the search for these materials is conducted by humans through repeated material syntheses, measurements, and considerations. However, it has become extremely difficult to explore the huge material search space using only the conventional search for new materials. Therefore, autonomous material search methods have been developed by integrating data science with conventional material-development methods. These methods can be divided into two major categories. The first category is the robotic autonomous material search methods, which combine materials experiments, robotics, and machine learning [1][2][3][4][5][6][7][8][9]. The robot performs actual material synthesis and measurements, whereas machine learning determines the next candidate material and process conditions, which enables autonomous material synthesis. The other category is an in silico autonomous material search method that combines ab initio calculations and machine learning [10][11][12][13][14][15]. Using ab initio calculations on compositions and structures determined by machine learning enables efficient autonomous material search.
Heusler alloys are an important application of autonomous material search systems. They have attracted much attention in recent years because they exhibit various physical properties [16][17][18][19]. One of the most attractive of these properties is half-metallicity, in which one spin band is metallic and the other is semiconducting [20]. Half-metallic Heusler alloys have been extensively studied for applications in tunnel magnetoresistance [21][22][23][24], giant magnetoresistance [25][26][27][28], and other spintronics devices [29][30][31]. However, because their multi-elemental nature and disordered phases make the material search space extremely vast, the development of Heusler alloys is very difficult.
For example, Sawada et al. developed an autonomous material search method to search for Heusler alloys with half-metallic properties [11]. In this method, the autonomous material search was conducted to maximize the spin polarization. However, maximizing the spin polarization alone is not sufficient for practical applications; other material properties, such as the half-metallic gap and Curie temperature, should also be considered. In addition, materials scientists have traditionally developed new functional materials by considering the similarity among constituent elements and the influence of the crystal structure. The previous method [11] did not use such ideas in its materials search; thus, a more efficient autonomous material search method should be developed.
In this study, we propose a material search system that can autonomously search the material space using multi-objective optimization of multiple material properties while considering similarities among elements and emphasizing specific descriptors. This method is then used to search for Heusler alloys with half-metallic properties. Figure 1 shows an overview of the proposed method, in which ab initio calculations and machine learning are repeated in sequence to autonomously search for materials. In the ab initio calculation part, the properties of the Heusler alloy are calculated according to the crystal structure and composition determined in the machine learning part. The machine learning part uses the accumulated Heusler alloy data to derive the crystal structure and composition of the material to be used in the next ab initio calculation. The details of each part are described below.
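The alternation between the two parts can be sketched as a simple loop. This is only a minimal illustration under stated assumptions: the function name `autonomous_search` and the callables `evaluate` (standing in for an ab initio calculation) and `propose` (standing in for the machine learning part) are hypothetical placeholders, not part of the system described here.

```python
def autonomous_search(candidates, evaluate, propose, n_initial=2, n_iterations=5):
    """Alternate between property evaluation and candidate proposal.

    evaluate(material) -> properties   (hypothetical stand-in for an ab initio run)
    propose(observed, remaining) -> next material (hypothetical stand-in for the ML part)
    """
    pool = list(candidates)
    observed = {}

    # Seed the data set with a few initial evaluations.
    for material in pool[:n_initial]:
        observed[material] = evaluate(material)

    for _ in range(n_iterations):
        remaining = [m for m in pool if m not in observed]
        if not remaining:
            break  # the whole candidate pool has been evaluated
        next_material = propose(observed, remaining)
        observed[next_material] = evaluate(next_material)

    return observed


# Toy usage: materials are integers, and the "model" simply proposes the largest one left.
history = autonomous_search(
    range(6),
    evaluate=lambda m: (m, -m),
    propose=lambda observed, remaining: max(remaining),
    n_iterations=3,
)
```

In the toy usage, the loop evaluates the two seed materials first and then the three materials chosen by the placeholder proposal rule.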

Ab initio calculations
In the ab initio calculations, Green's function-based density functional theory calculations were conducted by the Korringa-Kohn-Rostoker coherent potential approximation (KKR-CPA) method using the AkaiKKR software [32]. The CPA makes it possible to calculate multi-elemental disordered phases with high accuracy, especially in alloy systems [33][34][35]. Figure 2(a,b) shows the two major crystal structures of Heusler alloys, the full-Heusler structure (L2_1) and the half-Heusler structure (C1_b), illustrated using VESTA [36]. Typically, sites A, B, and C are occupied by different elements. Here, based on the crystal structure and composition of the material, which are determined in the machine learning part described later, KKR-CPA calculations were conducted to obtain the spin polarization, P, and the half-metallic gap, G. Figure 2(c) shows a schematic of the density of states (DoS) of a half-metallic material. P is calculated using Equation 1:
$$P = \frac{n_\uparrow - n_\downarrow}{n_\uparrow + n_\downarrow} \qquad (1)$$
where n_↑ and n_↓ are the up-spin and down-spin DoS at the Fermi energy, E_F, respectively. The half-metallic gap, G, is equivalent to the band gap of the down spin, which is marked by the orange arrow in Figure 2(c). Larger G and P result in better half-metallic Heusler alloys. The lattice constants were determined so as to minimize the total energy. In the lattice constant optimization, spin-orbit interaction and relativistic effects were not considered. The DoS was then calculated using the optimized lattice constant, this time taking spin-orbit interaction and relativistic effects into account, and P and G were estimated from this DoS. More details are given in the Supplementary Materials (S1).
Figure 1. Overview of an autonomous material search system. In the ab initio calculation part, spin polarization, P, and half-metallic gap, G, are calculated based on the crystal structure and composition information determined in the machine learning part. The machine learning part derives information on the crystal structure and composition.
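As an illustration of how P and G can be read off a spin-resolved DoS, the following minimal sketch assumes the standard definition of spin polarization, P = (n_up - n_down) / (n_up + n_down), and estimates G as the width of the energy window around the Fermi energy in which the down-spin DoS is below a small tolerance. The function names, the tabulated energy grid, and the tolerance `tol` are assumptions made for this example; they are not taken from AkaiKKR or from the procedure in the Supplementary Materials.

```python
def spin_polarization(n_up, n_down):
    """Spin polarization from the up- and down-spin DoS at the Fermi energy."""
    total = n_up + n_down
    if total == 0:
        raise ValueError("total DoS at the Fermi energy is zero; P is undefined")
    return (n_up - n_down) / total


def half_metallic_gap(energies, dos_down, e_fermi=0.0, tol=1e-3):
    """Width of the down-spin gap that contains the Fermi energy.

    energies : ascending energy grid
    dos_down : down-spin DoS on that grid
    tol      : assumed threshold below which the DoS is treated as zero
    """
    # Grid point closest to the Fermi energy.
    i0 = min(range(len(energies)), key=lambda i: abs(energies[i] - e_fermi))
    if dos_down[i0] >= tol:
        return 0.0  # the down-spin channel is metallic at E_F, so there is no gap

    # Extend the window in both directions while the DoS stays below tol.
    lo = hi = i0
    while lo > 0 and dos_down[lo - 1] < tol:
        lo -= 1
    while hi < len(energies) - 1 and dos_down[hi + 1] < tol:
        hi += 1
    return energies[hi] - energies[lo]


# Toy DoS: the down-spin channel is empty between -0.5 and 0.5 around E_F = 0.
toy_energies = [-1.0, -0.5, 0.0, 0.5, 1.0]
toy_dos_down = [2.0, 0.0, 0.0, 0.0, 1.5]
toy_gap = half_metallic_gap(toy_energies, toy_dos_down)
toy_p = spin_polarization(3.0, 0.0)
```

On a real calculation the resolution of the energy grid limits the accuracy of the estimated gap, so the tolerance and grid spacing would need to be chosen with care.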

Machine learning
In the machine learning part, Bayesian optimization [37], an autoencoder [38], and Pareto optimization [39] were used to determine the crystal structure and composition of the materials for the next KKR-CPA calculation on the basis of the accumulated data on P and G.
In this study, the material space of the Heusler alloys to be explored by the autonomous material search system was defined by Equations 2 and 3. The A-site elements (A1, A2), B-site elements (B1, B2), and C-site elements (C1, C2), which are shown in Figure 2(a,b), were selected from the yellow, red, and blue regions, respectively, of the periodic table in Figure 3(a). The composition was set in increments of 0.2. The material was intentionally restricted to contain Fe, Co, or Ni to reduce the total number of candidate materials. The material space defined here contains approximately 10 million candidate materials, and it is impractical to carry out KKR-CPA calculations for all of them. Thus, the KKR-CPA calculations were performed sequentially, using machine learning to decide where to search the material space.
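The order of magnitude of this material space can be checked with a short counting sketch. It rests on several assumptions that are not confirmed by the text: each site holds at most two elements mixed in steps of 0.2, the sites offer 13, 9, and 11 elements (the site sizes given for the composition vector), there are two crystal structures, and, purely hypothetically, Fe, Co, and Ni all belong to the A site only.

```python
from math import comb


def site_compositions(n_elements, n_mixed_steps=4):
    """Distinct occupations of one site by one element or by a two-element mixture.

    With a step of 0.2, a given unordered pair of elements can be mixed in
    4 distinct ratios (0.2/0.8, 0.4/0.6, 0.6/0.4, 0.8/0.2).
    """
    return n_elements + comb(n_elements, 2) * n_mixed_steps


def count_candidates(n_a=13, n_b=9, n_c=11, n_required_on_a=3, n_structures=2):
    """Count candidates whose A site contains at least one 'required' element.

    n_required_on_a=3 encodes the hypothetical assumption that Fe, Co, and Ni
    are A-site elements only.
    """
    a_all = site_compositions(n_a)
    a_without_required = site_compositions(n_a - n_required_on_a)
    a_with_required = a_all - a_without_required
    return (a_with_required
            * site_compositions(n_b)
            * site_compositions(n_c)
            * n_structures)


n_candidates = count_candidates()
```

Under these assumptions the count is 9,542,610, which is consistent with the figure of approximately 10 million; a different assignment of Fe, Co, and Ni to the sites would change the number.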
First, to create descriptors (explanatory variables) that represent this material space, a composition vector C, a Magpie descriptor vector M, and a crystal structure vector S were prepared. The composition vector C indicates the proportions of the elements in the material. Sites A, B, and C have 13, 9, and 11 candidate elements, respectively, as shown in the periodic table in Figure 3(a), so the composition vector has 33 dimensions. M is composed of 28 values (e.g. average atomic number (M_an), average atomic weight (M_at), and melting point (M_mt)) that can be calculated from the composition vector [40]. The list of the Magpie descriptors is given in the Supplementary Materials (S2). By using the Magpie descriptors, it is possible to set up a material space that considers similarities between elements. For example, Fe and Ru, which are located near each other in the periodic table, have similar properties, while Fe and Bi, which are far apart, have dissimilar properties. It has been reported that the addition of Magpie descriptors improves the accuracy of machine learning models [41]. S is a two-dimensional one-hot vector that represents the two crystal structures (i.e. the full-Heusler structure, S_F, and the half-Heusler structure, S_H).
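A reduced version of this descriptor construction is sketched below. It is only an illustration: the five-element list stands in for the 33 elements of the actual composition vector, only two composition-weighted averages (atomic number and atomic weight) stand in for the 28 Magpie values, and normalizing the formula to atomic fractions is an assumption about how the proportions are encoded. All function and variable names are hypothetical.

```python
# Illustrative subset of elements; the actual composition vector has 33 entries.
ELEMENTS = ["Fe", "Co", "Cr", "Mn", "As"]
ATOMIC_NUMBER = {"Fe": 26, "Co": 27, "Cr": 24, "Mn": 25, "As": 33}
ATOMIC_WEIGHT = {"Fe": 55.845, "Co": 58.933, "Cr": 51.996, "Mn": 54.938, "As": 74.922}
STRUCTURES = ("full", "half")  # order corresponds to (S_F, S_H)


def composition_vector(formula):
    """Atomic fraction of each element in ELEMENTS (assumed normalization)."""
    total = sum(formula.values())
    return [formula.get(element, 0.0) / total for element in ELEMENTS]


def weighted_mean(c_vec, table):
    """Composition-weighted average of an elemental property."""
    return sum(frac * table[element] for frac, element in zip(c_vec, ELEMENTS))


def structure_one_hot(structure):
    """One-hot encoding of the crystal structure."""
    if structure not in STRUCTURES:
        raise ValueError(f"unknown structure: {structure!r}")
    return [1.0 if structure == s else 0.0 for s in STRUCTURES]


def descriptor(formula, structure):
    """Concatenate C, a two-value stand-in for M, and S into one vector."""
    c_vec = composition_vector(formula)
    m_vec = [weighted_mean(c_vec, ATOMIC_NUMBER), weighted_mean(c_vec, ATOMIC_WEIGHT)]
    return c_vec + m_vec + structure_one_hot(structure)


x = descriptor({"Co": 0.6, "Fe": 0.4, "Cr": 0.8, "Mn": 0.2, "As": 1.0}, "half")
```

Because the averaged elemental properties vary smoothly between neighbouring elements, two compositions that differ only by swapping chemically similar elements end up close together in the M part of the vector, even though their C parts are orthogonal.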
However, defining the material space as the 63-dimensional vector that combines C, M, and S does not allow an efficient autonomous search, for two reasons. First, the dimension of the material space is too large: Bayesian optimization generally works properly only in approximately 20 dimensions or fewer [42]. Second, from a materials science perspective, the crystal structure information (S) should be treated as very important, whereas in the 63-dimensional vector it accounts for only 2 of the dimensions. Thus, to achieve an efficient autonomous search, a material space in which the crystal structure information is prioritized should be defined.
Therefore, an autoencoder, which is a dimensionality reduction method, was used. Figure 3(b) shows a schematic of the dimensionality reduction of C and M by the autoencoder. The information in C and M was compressed in the middle layer of a neural network whose input and output layers are identical. Here, a 10-dimensional latent variable Z was created, which contains information on both C and M. The importance of the crystal structure information in the autonomous material search can be adjusted by changing the number of latent variables created by the autoencoder: the fewer the latent variables, the larger the share of the material space taken up by S, and thus the more the crystal structure information is prioritized. More details on the calculation conditions and results are given in the Supplementary Materials (S3).
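The compression step can be illustrated with a deliberately simplified autoencoder. The sketch below is a linear autoencoder trained by full-batch gradient descent in plain Python, compressing 6-dimensional toy data into 2 latent variables; it stands in for the compression of the 61-dimensional (C, M) input into 10 latent variables. The architecture, learning rate, number of steps, and toy data are all assumptions made for this example; the actual network and training conditions are those given in the Supplementary Materials (S3).

```python
import random


def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def reconstruction_loss(data, w_enc, w_dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = matvec(w_dec, matvec(w_enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train_linear_autoencoder(data, n_latent, lr=0.1, n_steps=1000, seed=0):
    """Fit encoder (n_latent x d) and decoder (d x n_latent) weight matrices."""
    rng = random.Random(seed)
    d = len(data[0])
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_latent)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_latent)] for _ in range(d)]
    for _ in range(n_steps):
        g_enc = [[0.0] * d for _ in range(n_latent)]
        g_dec = [[0.0] * n_latent for _ in range(d)]
        for x in data:
            z = matvec(w_enc, x)                      # latent variables
            err = [a - b for a, b in zip(matvec(w_dec, z), x)]
            # Error propagated back to the latent layer (decoder transpose times err).
            back = [sum(w_dec[i][k] * err[i] for i in range(d)) for k in range(n_latent)]
            for i in range(d):
                for k in range(n_latent):
                    g_dec[i][k] += 2.0 * err[i] * z[k]
            for k in range(n_latent):
                for j in range(d):
                    g_enc[k][j] += 2.0 * back[k] * x[j]
        scale = lr / len(data)
        for i in range(d):
            for k in range(n_latent):
                w_dec[i][k] -= scale * g_dec[i][k]
        for k in range(n_latent):
            for j in range(d):
                w_enc[k][j] -= scale * g_enc[k][j]
    return w_enc, w_dec


def make_toy_data(n_samples=20, seed=1):
    """6-dimensional points that lie on a 2-dimensional subspace."""
    rng = random.Random(seed)
    u = [1.0, 0.5, 0.0, 0.0, 0.5, 0.0]
    v = [0.0, 0.0, 1.0, 0.5, 0.0, 0.5]
    data = []
    for _ in range(n_samples):
        a, b = rng.random(), rng.random()
        data.append([a * ui + b * vi for ui, vi in zip(u, v)])
    return data


toy_data = make_toy_data()
w_enc, w_dec = train_linear_autoencoder(toy_data, n_latent=2)
z_example = matvec(w_enc, toy_data[0])  # 2 latent variables for the first sample
```

Changing `n_latent` changes how many dimensions the compressed composition information occupies next to the two structure dimensions, which is the adjustment described above.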
A 12-dimensional material space is defined by combining the latent variables Z (Z1, Z2, ..., Z10) created by the autoencoder and the one-hot vector S (S_F, S_H) representing the crystal structure. This material space is explored by the combination of the KKR-CPA method and multi-objective Bayesian optimization (Figure 3(c)). Here, Gaussian process regression models that take Z and S as inputs were constructed for P and G, respectively, where P, G, Z, and S are the spin polarization, half-metallic gap, latent variables created by the autoencoder, and one-hot vector of the crystal structure, respectively. The upper confidence bound (UCB) of P and G for each material is calculated as an acquisition function using the Gaussian process regression models [43]. The candidate material that gives the largest Pareto hypervolume is determined from the UCB values and the material data (training data) for which P and G have already been observed. This candidate is then used as the target for the next KKR-CPA calculation. In this way, it is possible to autonomously search for materials with both large P and large G. More details on the process are given in the Supplementary Materials (S4).
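The selection step can be sketched as follows. The example assumes that the Gaussian process models have already produced a predictive mean and standard deviation of P and G for each candidate, computes a UCB for each property, and selects the candidate whose UCB point, added to the observed (P, G) data, gives the largest two-dimensional hypervolume. The weight `kappa`, the reference point (0, 0), the function names, and the toy numbers are assumptions made for this illustration; the actual procedure is the one given in the Supplementary Materials (S4).

```python
def ucb(mean, std, kappa=2.0):
    """Upper confidence bound of a Gaussian prediction (kappa is an assumed weight)."""
    return mean + kappa * std


def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by a set of (P, G) points, both maximized, relative to ref."""
    dominating = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    hv = 0.0
    best_g = ref[1]
    # Sweep from the largest P downwards, adding the new strip each point contributes.
    for p_val, g_val in sorted(dominating, key=lambda p: p[0], reverse=True):
        if g_val > best_g:
            hv += (p_val - ref[0]) * (g_val - best_g)
            best_g = g_val
    return hv


def select_next(predictions, observed, kappa=2.0):
    """Pick the candidate whose UCB point yields the largest hypervolume.

    predictions : {name: ((mean_P, std_P), (mean_G, std_G))}
    observed    : list of already calculated (P, G) pairs
    """
    best_name, best_hv = None, float("-inf")
    for name, ((mean_p, std_p), (mean_g, std_g)) in predictions.items():
        point = (ucb(mean_p, std_p, kappa), ucb(mean_g, std_g, kappa))
        hv = hypervolume_2d(observed + [point])
        if hv > best_hv:
            best_name, best_hv = name, hv
    return best_name


# Toy example with two observed materials and two candidates.
observed_pg = [(1.0, 0.2), (0.5, 0.6)]
toy_predictions = {
    "a": ((0.9, 0.05), (0.3, 0.05)),  # UCB point is approximately (1.0, 0.4)
    "b": ((0.6, 0.1), (0.5, 0.1)),    # UCB point is approximately (0.8, 0.7)
}
chosen = select_next(toy_predictions, observed_pg)
```

In the toy example, candidate "b" is chosen because its optimistic estimate extends the dominated area in both P and G, whereas candidate "a" improves only G at an already covered value of P.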

Results and discussion
The autonomous material search system ran for approximately two months to search the material space of the Heusler alloys. Figure 4(a) shows the P and G values of the 1783 materials explored by the autonomous material search system. The blue circles and red squares represent the half-Heusler and full-Heusler structures, respectively. The solid black lines indicate the Pareto frontier found in the 1655th exploration. Figure 4(b) shows the DoS of Co0.6Fe0.4Cr0.8Mn0.2As1.0 with a half-Heusler structure, which is one of the materials on the Pareto frontier. The DoS indicates that the material has large P and G values. Thus, the system can autonomously search the material space and automatically suggest materials with large P and G. Table 1 lists the candidate new Heusler alloys on the Pareto frontier in Figure 4(a). The ternary Heusler alloys (i.e. Co1.0Cr1.0As1.0 and Co1.0Ti1.0As1.0) have been previously studied [44], whereas the other alloys (e.g. quaternary, quinary, and senary Heusler alloys) have not been investigated.
All of the alloys on the Pareto frontier have half-Heusler structures. It has been previously reported that the G of half-Heusler alloys (CoMnSb, NiMnSb, etc.) is larger than that of full-Heusler alloys (Co2MnSb, Co2MnGa, etc.) because the half-Heusler structures have vacancies in the C sites (Figure 2(b)) [45]. The results obtained by examining the vast material space of Heusler alloys with the autonomous material search, shown in Figure 4(a), indicate that this tendency holds not only for specific materials but also globally. If a conventional autonomous search had been performed without prioritizing the crystal structure information, more full-Heusler alloys would have been evaluated, and it would have taken more time to find the Pareto frontier materials.
The materials shown in Table 1 have the potential to exhibit half-metallic properties. However, the actual synthesis of these materials is expected to be challenging because it is extremely difficult to synthesize half-Heusler structures containing multiple elements. High-throughput experiments and autonomous robotic synthesis techniques may make the synthesis of these materials possible in the future. In addition, most Heusler alloys shown in Table 1 contain As, which is a toxic element. This also increases the difficulty of their synthesis and application. Therefore, an autonomous material search system that considers material toxicity in addition to material properties (e.g. P and G) is required, and such a system can be developed based on the proposed autonomous material search system.
Table 1. List of the Heusler alloys on the Pareto frontier. The actual synthesis of these materials is expected to be challenging because of their instability and toxicity.

Conclusions
An autonomous material search system that combines ab initio calculations with an autoencoder and multi-objective Bayesian optimization was developed. The system can autonomously explore the material space through multi-objective optimization of multiple material properties while considering the similarities among elements and emphasizing specific features (e.g. crystal structure). By using this system to explore the material space of Heusler alloys, new candidate materials with large P and G were predicted. The system is not limited to Heusler alloys and can be used to explore various material spaces to achieve various physical properties.