Fully autonomous materials screening methodology combining first-principles calculations, machine learning, and high-performance computing systems

ABSTRACT Materials screening by high-throughput first-principles calculations is a powerful tool for exploring novel materials with preferable properties. Machine learning techniques are expected to accelerate materials screening by constructing surrogate models and making fast predictions. In particular, black-box optimization methods such as Bayesian optimization, which repeat the construction of a prediction model and the selection of data points, have attracted much attention. In this study, we constructed an autonomous materials screening system that combines first-principles calculations and machine learning on high-performance computing systems. The performance of the system was evaluated by applying it to the exploration of high-k dielectrics, using band gaps from hybrid functional calculations and dielectric constants from density functional perturbation theory calculations. By utilizing appropriate black-box optimization methods, the developed system identified materials with anomalous properties, as well as materials with both wide band gaps and high dielectric constants, much faster than random exploration. The code for the developed system is published in an open repository. IMPACT STATEMENT We realized an autonomous computational materials screening system on high-performance computing systems using machine learning. The developed code will be published in an open repository.


Introduction
The drastic improvement of computational algorithms and computer performance in recent decades has enabled first-principles calculations to identify promising materials from many candidate substances on high-performance computing (HPC) systems. Examples of such materials screening include the identification and discovery of materials for various applications such as photovoltaic materials [1,2], p-type transparent conducting oxides [3-5], and nitride semiconductors [6,7]. More recently, first-principles calculations have become so efficient that several computational materials databases containing data on the order of 10^3-10^6 materials have been constructed for various purposes and published [8-14].
Meanwhile, machine learning (ML) techniques have become prevalent in the field of materials science as a method to accelerate materials exploration. One of the most common applications of ML to materials science is the construction of surrogate models that make fast predictions of material properties. Such prediction models have been successfully constructed for various substances and material properties in many previous studies [13-22]. Moreover, black-box optimization methods such as Bayesian optimization, which accelerate the discovery of data points with desirable properties by repeating the construction of ML models and the selection of promising data points based on those models and an appropriate policy, have also been studied extensively in the field of materials science. Computational materials exploration assisted by these ML methods is expected to be much more efficient than conventional brute-force computation of candidate materials, thereby allowing us to drastically expand the search space (i.e. increase the number of candidate materials) and/or apply accurate but computationally demanding methods such as hybrid functionals [45,46] and the GW approximation [47].
In principle, a combination of automatic data acquisition and black-box optimization methods based on ML techniques enables us to construct an autonomous exploration system, which independently explores preferable data points without any human intervention. Indeed, autonomous materials development combining ML and robotic techniques has been realized in the experimental field [48-51]. In the computational field, efforts have been made to reduce human intervention through the automation of programs and to efficiently select data for computation using data-driven science, even before the widespread use of ML. For example, cluster-expansion models constructed by multiple first-principles calculations and linear fitting have been utilized to identify stable atomic configurations. Software for the automation of cluster expansion and related computational procedures was developed over 20 years ago and continues to be improved [52]. Genetic algorithms and particle swarm optimization have also been used in crystal structure prediction software [53-55] running on HPC systems. Recently, super-hard carbon structures were identified [56] by developing a genetic algorithm that accounts for hardness and energy simultaneously. As an example of the application of ML surrogate models, Seko et al. [23] identified materials with low thermal conductivity based on the Z-score of a Gaussian process. Based on multi-stage ML, Schmidt et al. [22] identified candidates of ultra-hard and incompressible materials, superconductors, and materials with extreme gap deformation potentials; Xin et al. [37] identified candidates of wide-gap materials; and Ojih et al. [43] identified materials with theoretically extreme mechanical properties. Autonomous materials screenings based on ML surrogate models have also been realized in several fields: for example, Tran and Ulissi [26] identified candidates for electrocatalysis, Oftelie et al. [27] candidates of thermoelectric heterostructures, Fukazawa et al. [31] the optimal choice of elements and ratios of RFe12-type magnetic compounds, Chen et al. [33] candidates of solar cell absorbers, Montoya et al. [35] stable materials, and Ma et al. [38] candidates of ferroelectric and photovoltaic semiconductors. Dunn et al. developed the ROCKETSLED code [30], which provides a general-purpose framework for autonomous exploration by black-box optimization methods, and demonstrated that their framework can accelerate the identification of superhard materials. We note that there are also many successful examples of the combination of high-throughput calculations and ML techniques in the field of chemistry [25,34,41], although they are outside the scope of this paper.
However, despite the promising concept of autonomous computational materials screening and several successful examples, significant technical barriers remain in this approach. One of the most cumbersome barriers is the complexity of fully automating first-principles calculations on HPC systems. For example: (1) because HPC systems such as in-house cluster computers and supercomputers consist of multiple computers, calculations must be performed in parallel and are usually controlled by distributed job schedulers; (2) one needs to build workflows consisting of multiple calculations (e.g. one often needs to optimize the structure under specific computational conditions before calculating various properties by different computational methods, because these calculations require the optimized structure) and control those workflows properly; (3) the computational cost depends heavily on the number of atoms and the symmetry of the target atomic structure, resulting in greatly diverse computational times for each calculation; and (4) some calculations may be aborted by unintentional errors (e.g. one may fail to converge the electronic structure within a predetermined number of optimization steps). Therefore, high-throughput material calculations are generally performed using workflow management software [57-59].
To perform autonomous high-throughput materials screening robustly, it is necessary to construct ML prediction models on HPC systems using a workflow management system. Typically, ML models have been updated at regular intervals of time or data [26,30,35]. In this case, the ML models need to be updated at an appropriate frequency: if the frequency is too high, computational resources may be wasted, while if it is too low, the search efficiency may be reduced [39]. However, it is difficult to determine the optimal frequency of the ML procedure, because the required time for first-principles calculations varies depending on the method and crystal structure as mentioned above, and the required time for constructing ML models also varies depending on the algorithm and the size of the training set. Moreover, the tendency of the computational time of the first-principles calculations may also change depending on the black-box optimization method, as shown later.
In this work, we developed an autonomous materials screening system based on the ATOMATE library [60], an existing library for constructing large-scale computational materials databases. The developed system does not require any parameters to control the frequency of the ML procedure. Instead, one computational node is assigned to the ML procedure, and data points obtained while the ML procedures are being performed are automatically detected in our framework. The performance of the developed system is evaluated by exploring (1) materials possessing anomalous band gaps and dielectric constants by the BoundLess Objective-free eXploration (BLOX) method [34] and (2) materials possessing both wide band gaps and high dielectric constants by the Probability that properties fall within the Target Range (PTR) method [42,61-63]. This test assumes an exploration of promising candidates for high-k materials, which possess both wide band gaps and high dielectric constants for potential CPU, DRAM, and flash memory applications; such materials have previously been identified by high-throughput first-principles calculations [64]. The performance evaluations are conducted on an in-house computer cluster from the aspect of real system running time. The developed code, named machine-learning-assisted ATOMATE (ML-ATOMATE), is published in an open repository.

High-throughput first-principles calculations using the ATOMATE code
The developed ML-ATOMATE code is built on the ATOMATE code, an existing Python library for high-throughput computing. Therefore, we first give an overview of the existing ATOMATE framework on HPC systems; detailed descriptions are given in Ref. [60] and the online documentation [65].
As described above, large-scale materials screening on HPC systems requires complicated procedures. First, such calculations are usually performed on HPC systems such as supercomputers and computer clusters containing many nodes. Inconveniently, the required computational time of each calculation greatly depends on the target material and the computational algorithm and conditions. Therefore, the calculations are usually managed with a queue system such as PBS, SLURM, or SUN GRID ENGINE. Another barrier to high-throughput material calculations is that one usually needs complicated workflows for calculating target properties. For example, structure optimization is often necessary under a certain computational condition before calculating target properties. As another example, when multiple target properties are required, multiple calculations with computational methods and conditions corresponding to each target property (e.g. hybrid functionals or GW approximations for band gaps, density functional perturbation theory (DFPT) for dielectric constants) are required. Moreover, calculations may sometimes abort for several reasons, including failure to converge electronic or ionic optimization steps, computational limitations such as exceeding memory limits or wall time, and unintentional system errors or computer malfunctions. For such aborted calculations, the cause of the error must be logged and, if the error is likely recoverable, the calculation rerun with modified computational conditions and/or computational nodes. The management of these procedures must be completely automated, because it is unfeasible to manually maintain such calculations for more than several hundred materials.
The ATOMATE library provides a framework to automate high-throughput calculations of material properties, solving the aforementioned problems by utilizing the FIREWORKS [57], CUSTODIAN [66], and PYMATGEN [66] libraries. The ATOMATE code provides procedures (called firetasks in the FIREWORKS framework) to make input files, run software for first-principles calculations such as the VASP code [67,68], and store the computational results in MONGODB, and bundles those procedures into jobs (called fireworks in the FIREWORKS framework) for various material properties. Those fireworks can also be bundled into a single workflow, with each firework inheriting information from the previous firework as necessary. For example, the firework for structure optimization can pass the optimized structure to subsequent fireworks that calculate other properties. The FIREWORKS framework can submit workflows to a queue system, ensuring consistency in the order of fireworks, maintaining a fixed number of jobs in the queue system, and monitoring and logging the states of workflows using MONGODB. The logging and modification of calculations aborted by errors are also implemented using the CUSTODIAN library. One can also combine the computational results of several kinds of first-principles calculations (e.g. structure optimization, hybrid functionals, and DFPT) for the same material by the builder function, creating a data set for each material that includes those computational results (e.g. data of BaTiO3 containing the optimized structure, band gap, and dielectric constant). The procedures concerning materials research (e.g. making input files, analyzing results, and handling errors of calculations) are firmly supported by the PYMATGEN library. A schematic overview of typical high-throughput calculations with ATOMATE is shown in Figure 1.

Making the high-throughput calculations autonomous
To enable the high-throughput calculations to be performed autonomously (in other words, to estimate the desirability of the remaining candidate materials and give high priority to promising candidates), the developed system uses one child node for ML. The ML node repeats the following procedures: (1) sleep until one or more new workflows successfully finish; (2) run the builder function and aggregate computational results into materials data; (3) add the newly acquired data points to the training data set for ML; (4) construct an ML model and estimate desirability (called an acquisition function in the context of black-box optimization); and (5) set the priorities for submitting jobs to the queue system using the set_priority function implemented in the FIREWORKS library, according to the desirability estimated by the black-box optimization method based on the ML models. A schematic overview of the developed system is depicted in Figure 2, and the details of the ML techniques are described in the following section. We note that settings similar to ours have been well studied in a subfield of Bayesian optimization known as asynchronous Bayesian optimization [69,70]. While leveraging such techniques could potentially improve the efficiency of our system, we have chosen not to utilize asynchronous Bayesian optimization techniques in this study to keep the focus on introducing our system.
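The loop above can be sketched in a few lines of Python. The calculation and acquisition functions below are toy placeholders, not the real ATOMATE/FIREWORKS API or the BLOX/PTR acquisitions; the sketch only illustrates the structure of steps (1)-(5):

```python
# Toy stand-in for the ML-node loop: pick the most "desirable" remaining
# candidate, run its (mock) workflow, fold the result into the training set,
# and repeat. All names here are illustrative placeholders.
def run_workflow(x):
    # Placeholder for a finished first-principles workflow.
    return (x - 0.3) ** 2

def estimate_desirability(train_x, candidates):
    # Placeholder acquisition function: score each candidate by its distance
    # to the nearest already-computed point (a crude novelty measure).
    if not train_x:
        return {c: 0.0 for c in candidates}
    return {c: min(abs(c - x) for x in train_x) for c in candidates}

candidates = [i / 10 for i in range(10)]
train_x, train_y = [], []
while candidates:
    # (1)-(3) a workflow finished; aggregate its result into the training set,
    # (4) score the remaining candidates with the surrogate model,
    # (5) reorder the queue so the most desirable candidate runs next.
    scores = estimate_desirability(train_x, candidates)
    best = max(candidates, key=lambda c: scores[c])
    candidates.remove(best)
    train_x.append(best)
    train_y.append(run_workflow(best))
```

In the real system, step (1) is event-driven (the ML node sleeps until FIREWORKS reports finished workflows) and step (5) only reorders queue priorities rather than launching jobs directly.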
Our approach is similar to the framework of the ROCKETSLED code in providing general autonomous exploration on HPC systems using the FIREWORKS code. However, there are two main differences between the ML-ATOMATE code and the existing ROCKETSLED library. The first difference is the timing of the ML procedure. Within the ROCKETSLED framework, the ML procedure runs at the end of each workflow, and its frequency is controlled by the batch mode, which conducts ML when a certain number of new data points have been obtained. In contrast, within the ML-ATOMATE framework, the ML procedure is performed only once when new data are recognized by the parent node, even if multiple data points have been acquired since the previous ML procedure. As mentioned above, it is difficult to determine the optimal batch size, and therefore ML-ATOMATE is advantageous in that users do not need to determine the frequency parameter. On the other hand, if the ML procedure does not require many computational resources, the ML node may idle until new data are acquired, in which case the developed framework may become disadvantageous. The second advantage of our framework is that ML-ATOMATE can use the workflows and builders implemented in the ATOMATE library without customization, while ROCKETSLED requires appending an ML procedure to workflows. Conversely, when a target problem cannot be handled by the ATOMATE framework (e.g. optimization of mathematical functions or simulations of values other than material properties) and one would like to use our framework, one needs to implement an appropriate data aggregation system in place of the builder function.

Black-box optimization methods using machine learning techniques
Although Bayesian optimization is probably the most common black-box optimization method, we chose to employ the BLOX and PTR methods instead. This is because our focus in high-throughput calculations was not on finding champion data for a single physical property, but on finding materials that lie at the limits of the distribution in a property space composed of multiple physical properties, and on identifying materials with desired physical property values that meet certain criteria. The former type of exploration is useful not only for finding materials of scientific interest but also for setting target ranges of physical properties for the latter exploration. The reason for conducting the latter exploration is that even if extreme calculated values are found, the practical application of those materials may be prevented by other factors such as defect properties. Therefore, it is practically useful to identify as many target substances meeting moderate criteria as possible.
BLOX is a black-box optimization method to discover deviated data points in the objective variable space without determining any exploration criteria, thereby requiring no prior knowledge of the distribution of the target properties. In a recent study, BLOX was used together with first-principles calculations in materials screening for extreme mechanical properties [43]. On the other hand, PTR is a black-box optimization method to collect data points within a desirable region of the property space. This method has been developed and applied in fields other than computational materials screening [61-63]. Our recent work [42] demonstrated that this method can be applied to computational materials screening and is superior to Bayesian optimization for collecting substances whose properties meet moderate criteria.

BLOX
The acquisition function of BLOX is based on the kernelized Stein discrepancy [71,72] between the distribution of acquired or predicted data points and the uniform distribution. The kernelized Stein discrepancy $S(p, q)$ between two given distributions $p$ and $q$ is defined as

$$S(p, q) = \mathbb{E}_{\mathbf{y}, \mathbf{y}' \sim p}\left[u_q(\mathbf{y}, \mathbf{y}')\right], \quad (1)$$

where $\mathbf{y}$ and $\mathbf{y}'$ are vectors in the target-property space and $u_q$ is the Stein kernel built from the score function $\nabla_{\mathbf{y}} \log q(\mathbf{y})$ and a kernel function $k$. In the BLOX method, the Gaussian kernel is employed as the kernel function, namely

$$k(\mathbf{y}, \mathbf{y}') = \exp\left(-\frac{\|\mathbf{y} - \mathbf{y}'\|^2}{2\sigma^2}\right). \quad (2)$$

Here, we consider the case that $q$ is the uniform distribution (i.e. $\nabla_{\mathbf{y}} \log q(\mathbf{y}) = 0$); then only the term $\sum_{l=1}^{d} \partial^2 k / \partial y_l \partial y'_l$ survives in the Stein kernel, and $S(p, q)$ can be written as

$$S(p, q) = \mathbb{E}_{\mathbf{y}, \mathbf{y}' \sim p}\left[\left(\frac{d}{\sigma^2} - \frac{\|\mathbf{y} - \mathbf{y}'\|^2}{\sigma^4}\right) k(\mathbf{y}, \mathbf{y}')\right], \quad (3)$$

where $d$ is the number of target properties. Let us assume that one has acquired data points $V = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N\}$ and predicted the value of an unknown data point by an ML model as $\mathbf{v}_{\mathrm{new}}$. One can evaluate the Stein discrepancy between the empirical distribution of $V$ and the uniform distribution as

$$S(V) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(\frac{d}{\sigma^2} - \frac{\|\mathbf{v}_i - \mathbf{v}_j\|^2}{\sigma^4}\right) k(\mathbf{v}_i, \mathbf{v}_j).$$

The BLOX method employs the Stein novelty, namely

$$\mathrm{SN}(\mathbf{v}_{\mathrm{new}}) = S(V) - S(V \cup \{\mathbf{v}_{\mathrm{new}}\}),$$

as the acquisition function. Because data points with large Stein novelty greatly deviate from the distribution of obtained data points, BLOX prioritizes exploring data points with large Stein novelties.
Unlike Bayesian optimization, BLOX does not require the uncertainty of the prediction, and thereby any regression method can be applied. We employ the random forest regression method [73,74] implemented in the SCIKIT-LEARN library [75] to construct the ML models, and set the hyperparameter σ = 0.1 in the standardized property space.
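A minimal, self-contained sketch of the Stein-novelty acquisition under a uniform reference distribution is given below. This is stdlib Python only; `sigma` matches the value used in this study, and the sign convention (novelty as the decrease in discrepancy when a candidate is added) is our reading of the method, not code taken from the ML-ATOMATE repository:

```python
import math

def stein_discrepancy_uniform(points, sigma=0.1):
    # Empirical kernelized Stein discrepancy between the point set and a
    # uniform reference; the score term of the uniform distribution vanishes,
    # leaving only the trace term of the Gaussian Stein kernel.
    n, d = len(points), len(points[0])
    total = 0.0
    for yi in points:
        for yj in points:
            d2 = sum((a - b) ** 2 for a, b in zip(yi, yj))
            total += ((d / sigma ** 2 - d2 / sigma ** 4)
                      * math.exp(-d2 / (2.0 * sigma ** 2)))
    return total / n ** 2

def stein_novelty(v_new, acquired, sigma=0.1):
    # Large novelty = adding v_new moves the empirical distribution closer
    # to uniform, i.e. v_new deviates from the already-acquired points.
    return (stein_discrepancy_uniform(acquired, sigma)
            - stein_discrepancy_uniform(acquired + [v_new], sigma))

# Ranking candidates: the outlying candidate gets the larger novelty.
acquired = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.22)]
candidates = [(0.13, 0.23), (0.80, 0.90)]
best = max(candidates, key=lambda v: stein_novelty(v, acquired))
```

Adding a near-duplicate of the acquired cluster barely changes the discrepancy, while a distant candidate reduces it substantially, so `best` is the outlier.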

PTR
For simplicity, we begin by introducing the single-objective PTR function before moving on to the multi-objective case. Because the PTR function requires not only predicted values but also the posterior distribution with the uncertainty of the predictions, we employ the Gaussian process (GP) implemented in the PHYSBO code [39]. In the PHYSBO code, the radial basis function is employed as the covariance (kernel) function, given by

$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\eta^2}\right), \quad (4)$$

where $\mathbf{x}$ and $\mathbf{x}'$ are data points in the descriptor space and $\eta$ is a hyperparameter. Note that this function is exactly the same as Equation (2), but defined in the descriptor space. When the descriptors and objective properties of the known data points are denoted as $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ and $\{y_1, y_2, \ldots, y_N\}$, respectively, the GP model estimates the expected value $\mu(\mathbf{x})$ and standard deviation $\sigma(\mathbf{x})$ of the posterior distribution at descriptor $\mathbf{x}$ as

$$\mu(\mathbf{x}) = \mathbf{k}(\mathbf{x})^{\mathrm{T}} (K + s I)^{-1} \mathbf{y}, \quad (5)$$

$$\sigma^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}(\mathbf{x})^{\mathrm{T}} (K + s I)^{-1} \mathbf{k}(\mathbf{x}), \quad (6)$$

where $s$ is a hyperparameter, $I$ is the identity matrix, $\mathbf{y} = (y_1, y_2, \ldots, y_N)^{\mathrm{T}}$, $\mathbf{k}(\mathbf{x}) = (k(\mathbf{x}, \mathbf{x}_1), \ldots, k(\mathbf{x}, \mathbf{x}_N))^{\mathrm{T}}$, and $K$ is the kernel matrix with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$. The posterior distribution of the GP model is given as

$$p(y \mid \mathbf{x}) = \mathcal{N}\left(\mu(\mathbf{x}), \sigma^2(\mathbf{x})\right). \quad (7)$$

In this study, the hyperparameters $\eta$ and $s$ are always optimized by maximizing the type-II likelihood [76] when the GP regression is performed.

Using the posterior distribution expressed as Equation (7), the probability that the target property $y(\mathbf{x})$ at descriptor $\mathbf{x}$ falls within the target range can be expressed as

$$\mathrm{PTR}(\mathbf{x}) = \Phi\left(\frac{y_{\mathrm{upper}} - \mu(\mathbf{x})}{\sigma(\mathbf{x})}\right) - \Phi\left(\frac{y_{\mathrm{lower}} - \mu(\mathbf{x})}{\sigma(\mathbf{x})}\right), \quad (8)$$

where $y_{\mathrm{lower}}$ and $y_{\mathrm{upper}}$ are the predetermined criteria of the target range and $\Phi$ is the cumulative distribution function of the standard normal distribution. Note that if the criteria consist of only one of $y_{\mathrm{lower}}$ or $y_{\mathrm{upper}}$, namely the desirable condition can be expressed as $y_{\mathrm{lower}} \le y$ or $y \le y_{\mathrm{upper}}$, one can set $y_{\mathrm{upper}} = +\infty$ or $y_{\mathrm{lower}} = -\infty$, as performed in Ref. [62].
Next, let us consider the extension to multiple objectives, namely when there are two or more target properties $y^{(1)}, y^{(2)}, \ldots, y^{(d)}$ and we desire data points within a region of the target property space satisfying

$$y^{(i)}_{\mathrm{lower}} \le y^{(i)} \le y^{(i)}_{\mathrm{upper}} \quad (i = 1, \ldots, d) \quad (9)$$

simultaneously. Herein, we construct single-objective GP models for the $d$ properties and employ the product of the single-objective PTR functions, as proposed by Iwama et al. [63], namely

$$\mathrm{PTR}(\mathbf{x}) = \prod_{i=1}^{d} \left[\Phi\left(\frac{y^{(i)}_{\mathrm{upper}} - \mu_i(\mathbf{x})}{\sigma_i(\mathbf{x})}\right) - \Phi\left(\frac{y^{(i)}_{\mathrm{lower}} - \mu_i(\mathbf{x})}{\sigma_i(\mathbf{x})}\right)\right], \quad (10)$$

where $y^{(i)}(\mathbf{x})$, $\mu_i(\mathbf{x})$, and $\sigma_i(\mathbf{x})$ are the $i$th target property at descriptor $\mathbf{x}$ and the expected value and standard deviation estimated by the single-objective GP model for the $i$th target property, respectively.
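Once a GP posterior is available, the PTR acquisition reduces to differences of normal CDFs. A minimal sketch follows (stdlib Python; the GP fit itself, handled by PHYSBO in this work, is not reproduced, so the means and standard deviations below are assumed inputs):

```python
import math

def norm_cdf(z):
    # CDF of the standard normal distribution via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ptr_single(mu, sigma, lower=-math.inf, upper=math.inf):
    # Probability that N(mu, sigma^2) falls within [lower, upper];
    # one-sided criteria use the infinite defaults.
    return norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)

def ptr_multi(mus, sigmas, lowers, uppers):
    # Product of single-objective PTR values over all target properties.
    p = 1.0
    for mu, s, lo, up in zip(mus, sigmas, lowers, uppers):
        p *= ptr_single(mu, s, lo, up)
    return p

# Toy use with this study's criteria: band gap predicted 5.0 +/- 1.0 eV
# (want > 4 eV) and dielectric constant predicted 25 +/- 10 (want > 30).
score = ptr_multi([5.0, 25.0], [1.0, 10.0], [4.0, 30.0], [math.inf, math.inf])
```

Candidates are then ranked by `score`; note that the product form implicitly assumes the per-property posteriors are independent.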

Performance test
To assess the performance of the developed system, we constructed a workflow of first-principles calculations to evaluate band gaps and dielectric constants for the identification of promising high-k materials.
We applied the two black-box optimization methods described above: BLOX, to estimate the deviation of materials in the property space, and PTR, with the criteria that the band gap and dielectric constant are larger than 4 eV and 30, respectively. Since high dielectric constants from DFPT calculations may carry large numerical errors [77], we used the logarithms of the dielectric constants for the ML procedure. We used 182 descriptors implemented in the MATMINER code [78], as shown in Table 1. We note that all the descriptors are determined solely by the chemical formula but are still unique in the search space, because the candidate materials for the first-principles calculations have unique compositions, as described below.
To evaluate the band gaps and dielectric constants, we constructed a workflow that combines OptimizeFW, HSEBSFW, and DFPTFW, which are implemented in the ATOMATE code for structure optimization, band structure calculation using the Heyd-Scuseria-Ernzerhof (HSE06) hybrid functional [84,85], and DFPT calculation [86], respectively. An overview of the workflow is shown in Figure 3. We customized OptimizeFW and DFPTFW to use the Perdew-Burke-Ernzerhof functional tuned for solids (PBEsol) [87]. According to a recent benchmark of binary oxides [89], the PBEsol functional reproduces the experimental lattice constants of many materials more accurately than the PBE functional [88], which is the default of OptimizeFW and DFPTFW. Since the computational evaluation of dielectric constants greatly depends on the convergence of the structure [90], we tightened the convergence parameter of OptimizeFW, namely EDIFFG = -0.005. Band gaps are calculated by HSEBSFW, with the k-mesh generated by the PYMATGEN code based on the space group determined by the SPGLIB code [91]. We note that the PBE and PBEsol functionals severely underestimate the band gaps of many materials [92,93], while the HSE06 functional is computationally demanding but estimates band gaps more accurately. Dielectric constants are calculated as the spherical average of the sum of the electronic and ionic contributions to the dielectric tensor. All the calculations are performed employing the projector augmented-wave (PAW) method [94] as implemented in the VASP code [67,68], version 6.2.0. The PAW pseudopotential set used is PBE_5.4.
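To make the data flow of this chain concrete, here is a toy, sequential stand-in for the OptimizeFW → HSEBSFW → DFPTFW workflow. The function names and placeholder values are illustrative only; the real ATOMATE fireworks run VASP and form a FIREWORKS DAG rather than a plain function chain:

```python
# Each "firework" receives the spec produced by its parent, mimicking how
# the PBEsol-optimized structure is passed downstream to the property steps.
def optimize_fw(spec):
    spec = dict(spec)
    spec["structure"] = "relaxed:" + spec["structure"]  # structure optimization
    return spec

def hse_bs_fw(spec):
    spec = dict(spec)
    spec["band_gap_eV"] = 4.2  # placeholder HSE06 band gap
    return spec

def dfpt_fw(spec):
    spec = dict(spec)
    spec["dielectric_constant"] = 31.0  # placeholder DFPT result
    return spec

def run_chain(fireworks, spec):
    for fw in fireworks:  # sequential here; real workflows may branch
        spec = fw(spec)
    return spec

record = run_chain([optimize_fw, hse_bs_fw, dfpt_fw], {"structure": "BaTiO3"})
```

The final `record` plays the role of the builder output: one document per material combining the optimized structure, band gap, and dielectric constant.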
For our search space of candidate materials for first-principles calculations, we retrieved a total of 4099 oxides and chalcogenides from the Materials Project database [8]. These candidates satisfy the following conditions: (1) they contain at least one of O, S, and Se; (2) they do not contain the elements F-Ne, Cl-Ar, Br-Kr, I-Xe, Po-Rn, Pr-Lu, and Ac-Lr; (3) their space group is not P1; (4) they are non-magnetic; (5) their primitive cells contain 40 or fewer atoms; and (6) they are stable against competing phases. Condition (6) ensures that the candidate materials have unique compositions.
The first-principles calculations and ML were performed on LINUX 36-core nodes (two Intel Xeon Gold 6240 processors) with 192 GB RAM per node. We assigned 14 nodes to the first-principles calculations and 1 node to ML. As a baseline, we also performed the same high-throughput first-principles calculations on 15 nodes with random priority. We used the qlaunch command implemented in FIREWORKS to submit jobs while keeping the number of jobs in the queue system at 20.
Regarding the initial data for ML, we initially assigned the same random priority as the baseline random calculations and started the ML procedures only after at least two data points had been obtained. To mitigate the effect of the bias of random numbers, we performed three trials with different random seeds.

Results and discussion
The resultant distribution of material properties acquired by BLOX is shown in Figure 4. As can be seen, using BLOX, the convex hulls of the data points spread out (i.e. deviated data points are identified and the distributions extend) faster than in the random search.

Table 1. Descriptors used in this study. All the descriptors are obtained from the compositional descriptors of MATMINER, and the names of the classes implemented in MATMINER are given. The details of these descriptors are described in the MATMINER documentation.
In particular, BLOX tends to identify materials possessing narrow band gaps and extremely high dielectric constants at the early stages. While it is well known that there is an inverse relationship between the band gap and the electronic contribution to the dielectric constant [95], the developed system is not designed to incorporate such physics and chemistry. Therefore, this result implies that BLOX helps reveal the limits of the distribution of material properties without any prior knowledge of the target materials, properties, and computational methods.
We also note that data points with such extreme computational results often fail to accurately reproduce the experimental properties. For example, because of the inverse relationship between the band gap and the electronic dielectric constant, the computed electronic dielectric constant would be very sensitive to the numerical error of the band gap when the band gap is extremely narrow. The DFPT calculations are performed with the PBEsol functional, which tends to severely underestimate band gaps and may thereby cause the aforementioned computational errors in the electronic dielectric constant. However, even if such extreme computational results are unreliable, the BLOX search is still useful for materials screening, because the discovery of anomalous data points at an early stage of the search can guide modifications of the search space and/or the computational workflow (e.g. one can modify the workflow to improve band gap calculations by using other functionals, stricter convergence criteria, and finer k-meshes, and/or remove data points from the ML data when the band gap is very narrow).
The area of the convex hull and the standard deviations of the band gap and dielectric constant are shown in Figure 5. After several hours, BLOX significantly increases the area of the convex hull compared to the random cases. The increase in the standard deviation of the dielectric constant is considerably more significant than that of the band gap. Therefore, the increase in the area of the convex hull is mainly ascribed to the dielectric constant, as also indicated in Figure 4. However, the standard deviation of the band gap is also clearly increased by the BLOX search, indicating that BLOX does not only search for materials possessing narrow band gaps.
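The convex-hull area in Figure 5(a) is a scalar measure of the spread of the acquired 2-D property points. The paper does not specify its implementation, but a standard stdlib-only computation (Andrew's monotone chain followed by the shoelace formula) would look like this:

```python
def convex_hull(points):
    # Andrew's monotone chain: returns hull vertices in counter-clockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    # Shoelace formula over the hull vertices.
    h = convex_hull(points)
    return 0.5 * abs(sum(h[i][0] * h[(i + 1) % len(h)][1]
                         - h[(i + 1) % len(h)][0] * h[i][1]
                         for i in range(len(h))))
```

For example, five points forming a unit square with one interior point yield an area of 1; tracking this value against running time reproduces the kind of curve shown in Figure 5(a).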
The distribution of material properties acquired by PTR is shown in Figure 6. As expected, materials acquired by the PTR search tend to possess wide band gaps and high dielectric constants compared to the random search. The numbers of target materials obtained by the PTR and random searches are shown in Figure 7. One can see that the PTR search identifies the target materials approximately five times faster.
Finally, we discuss the relationship between the number of obtained data points and the system running time. The relationship between the number of all calculated materials and the system running time is shown in Figure 8. Although more nodes were used for the random search than for the BLOX search, the random search is not faster. Moreover, the speed of PTR is significantly slower than the others. This difference can be ascribed to the bias of the materials acquired by each black-box optimization method. For example, the constituent atoms of materials possessing high dielectric constants tend to have large principal quantum numbers [77], which tends to increase the number of electrons, thereby prolonging the required time for the first-principles calculations. Indeed, we found a positive correlation (r = 0.55) between the principal quantum number and the number of valence electrons explicitly treated in the PAW dataset for each atom used in this study. This implies that performance measured by the number of data points and by real time do not necessarily match in a real exploration, although most benchmarks of black-box optimization methods in the field of materials informatics are measured only by the number of acquired data points.

Conclusion
In this study, we implemented the ML-ATOMATE code based on the existing ATOMATE library and developed a system for autonomous materials screening by first-principles calculations and machine learning techniques running on HPC systems. The developed system integrates a single machine learning node that determines the priorities for calculating candidate materials. To demonstrate its effectiveness, we prepared 4099 oxides and chalcogenides as the search space, constructed a workflow to calculate the band gap and the dielectric constant, and applied two black-box optimization methods, namely BLOX and PTR. The performance was measured by real system running time. The developed system successfully accelerates the exploration of materials possessing anomalous and desirable properties utilizing BLOX and PTR, respectively.

Figure 1.
Figure 1. Overview of typical high-throughput first-principles calculations on an HPC system using the framework provided by the ATOMATE code.

Figure 2.
Figure 2. (a) Overview of the developed autonomous materials screening system. (b) Workflow of the ML node.

Figure 3.
Figure 3. Workflow of the first-principles calculations to evaluate band gaps and dielectric constants, constructed within the ATOMATE framework.

Figure 4.
Figure 4. Distribution of band gaps and dielectric constants of calculated materials after the system runs for 24, 72, 120, and 168 hours. The cyan and orange points show the data points obtained by BLOX and by random priority as the baseline, respectively. The solid lines show convex hulls of the obtained data points.

Figure 5.
Figure 5. (a) Area of the convex hull and standard deviations of (b) band gaps and (c) dielectric constants of the acquired data points against the system running time. The solid lines and shaded regions show the average and standard deviation of three trials.

Figure 6.
Figure 6. Distribution of band gaps and dielectric constants of calculated materials after the system runs for 24, 72, 120, and 168 hours. The purple and orange points show the data points obtained by PTR and by random priority as the baseline, respectively. The grey shaded regions show the predetermined target regions of the property space, i.e. band gap > 4 eV and dielectric constant > 30.

Figure 7.
Figure 7. Number of target materials against the system running time. The solid lines and shaded regions show the average and standard deviation of three trials.

Figure 8.
Figure 8. Number of all calculated materials against the system running time. The solid lines and shaded regions show the average and standard deviation of three trials.