Exploration of outliers in strength–ductility relationship of dual-phase steels

ABSTRACT To overcome the trade-off relationship between tensile strength and elongation of dual-phase steels, three exploratory techniques were utilized: Bayesian optimization (BO), BoundLess Objective-free eXploration (BLOX), and one-class support vector machine (OCSVM). The BO is an optimization method using Gaussian process regression, and the BLOX and OCSVM are designed for finding out-of-trend materials. Initially, a large number of synthetic phase distributions of ferrite and martensite were prepared by the Gaussian random field method. Feature importance analysis was then performed to extract microstructural descriptors used as explanatory variables in the explorations. The results revealed that the volume fraction of martensite, the higher-order principal component of the spatial correlation function, and the first principal component of the persistent homology are important in predicting the product of strength and elongation (TS × EL). In the three exploration methods, crystal plasticity finite element analyses with a ductile damage model were repeatedly performed to predict strength and elongation. The BO and OCSVM successfully found microstructures with TS × EL much higher than those of random search in a limited number of iterations. The microstructures discovered in the explorations indicated that two approaches to material design are effective in improving tensile properties of dual-phase steels: uniformly dispersing the fine hard phase and constructing a lamellar structure of the hard and soft phases. GRAPHICAL ABSTRACT


Introduction
Steel materials are used in various structures in our modern society, such as ships, aircraft, and other transportation equipment, buildings, and bridges, and are deeply related to our daily lives. These structures are becoming larger, lighter, and more complex to meet the social demand, and the required properties are becoming diverse. In the automotive industry, weight reduction and collision safety are strongly required from the viewpoint of global environmental protection and occupant safety. In order to achieve both weight reduction and collision safety, it is necessary to develop steel materials with high tensile strength (TS) and elongation at failure (EL). However, increasing TS generally tends to decrease EL, which is well known as a 'strength-ductility trade-off. ' To overcome this trade-off relationship, research and development of advanced high strength steels (AHSS) are actively conducted. Currently, a product of TS and EL (TS × EL) of 30 GPa·% or higher is recognized as an accepted benchmark for next-generation automotive steels [1]. To achieve this goal, transformation-inducedplasticity (TRIP) steels with retained austenite [2] and quenching and partitioning (Q&P) steels [3] are being developed as third-generation AHSSs. Dual-phase (DP) steels composed of soft ferrite and hard martensite phases are also one of the candidates for nextgeneration automotive steels. The DP steels have a history of roughly 50 years [4], but they still leave unsolved questions and have the potential to improve TS × EL at a low manufacturing cost.
The tensile properties of the DP steels are affected not only by the respective properties of the ferrite and martensite phases but also by the spatial arrangement of the two phases. Experimental studies have shown that the chain-like networked dual-phase structure improves the tensile strength compared to an isolated structure [5]. The strength is also enhanced by increasing the martensite volume fraction and grain refinement, but total elongation is less affected by grain refinement [6,7]. To achieve the goal of TS × EL ≥ 30 GPa·%, it is necessary to maximize total elongation as well as tensile strength. The EL is a sum of uniform elongation and local elongation. In general, uniform elongation plays a vital role in formability when processing sheet metal, while total elongation is a decisive factor in other processes such as bending and countersinking processes [8]. It has been reported that uniform elongation has a certain relationship with yield strength from a large amount of tensile experimental data on steel materials [9]. In contrast, local elongation does not correlate well with other tensile properties (yield stress, tensile strength, uniform elongation) and is one of the most difficult properties to predict. One of the reasons is that the local deformation is a process of void formation, growth, and coalescence in the later stage of the tensile test and is strongly affected by the microstructure. In the tensile tests of the DP steels, insitu observation [10,11], digital image correlation analysis [12][13][14], and X-ray CT [15,16] revealed that the distribution of the two phases with different hardnesses causes local heterogeneous stress and strain, resulting in cracking of martensite grains, decohesion at ferrite-martensite interface, and fracture within the ferrite grains. These fracture mechanisms change depending on the martensite volume fraction and grain size [4]. Therefore, in order to predict the elongation of DP steels, simple equations are not sufficient, and numerical analysis that takes into account strain localization due to microstructure and multiple fracture mechanisms is necessary. Recently, by incorporating the two-phase microstructure and the ductile fracture models based on these experimental results into finite element method (FEM), prediction of local elongation is becoming possible [17], but the detailed relationship with the morphology of microstructure and the product of TS and EL has not been clarified.
Against this background, our group aims to optimize the microstructures of DP steels by forward and inverse analysis of the structure-property linkage of DP steel. The strategy is shown in Figure 1. The strategy consists of developing a forward analysis method using a numerical model based on experimental observations, and proposing an inverse analysis method to optimize the microstructure by repeatedly performing the forward analysis. Our previous work has developed the forward analysis method to predict the void formation in DP steels by reproducing the polycrystalline structure and incorporating the ductile damage model into the FEM [18]. Although the computational cost is higher than conventional FEM, it is possible to perform several tens to several hundreds of calculations in a realistic amount of time with today's computational resources. Therefore, it is expected to find microstructures with higher TS × EL by performing calculations multiple times with randomly changing the microstructure using an efficient exploration method. Bayesian optimization (BO) is currently one of the most commonly used tools in optimization problems and is suitable for finding the optimal point for such computationally expensive functions. The BO has been used in a wide range of fields, including optimization of hyperparameters in machine learning models [19], and has also been used in the materials field for optimization of welding conditions [20,21]. From another point of view, since TS and EL of DP steels follow the so-called banana curve (TS × EL is constant), the problem corresponds to the search for microstructures with properties that deviate from the trend. Terayama et al. proposed boundless objective-free exploration (BLOX) as a method to search for out-of-trend materials, and successfully found light-absorbing molecules with high molar absorption coefficients [22]. Therefore, BLOX may help find outliers that deviate from the banana curve of DP steels. One-class support vector machine (OCSVM) is another method of outlier detection, which uses a support vector machine (SVM) [23], and has been applied to anomaly detection of electrocardiography [24], and structural health monitoring [25]. In materials fields, support vector machine has also been shown to be suitable for optimizing the conditions of additive manufacturing (AM) [26]. The number of trials is more limited for AM experiments due to the high cost of each trial, but by using SVM, they have succeeded in extracting the appropriate manufacturing conditions in only 11 trials.
To explore efficiently with a limited number of trials, it is generally necessary to select appropriate explanatory variables. As mentioned above, it is known that the volume fraction of the martensite phase and grain morphologies affect TS and EL in DP steels. Alaie et al. reported on the effect of martensite grain width on elongation by combining in-situ observation and finite element analysis (FEM) [11]. In addition, Abid et al. performed finite element analysis by changing the morphology of the martensite phase and pointed out that the larger aspect ratio of the martensite grain provides the higher uniform elongation (UEL) [27]. Therefore, it is necessary to consider the geometric shape parameters of the martensite phase as candidate explanatory variables. Recently, spatial correlation function [28] and persistent homology [29], which are more statistical descriptors for phase distributions, have also been proposed as effective descriptors to determine material properties. It has been reported that the spatial correlation function is useful in predicting the yield strength of dual-phase metallic composite [30]. The persistent homology has been reported to be helpful in characterizing amorphous materials and phase-field analysis results [31]. However, few studies have compared the spatial correlation function and the persistent homology in exploration or optimization problems. To extract appropriate explanatory variables, it would be necessary to statistically examine the importance of various descriptors for predicting TS × EL.
In this work, two existing methods and one new method were applied to explore the DP steels with high TS × EL: BO, BLOX, and OCSVM-based methods. The strategy is to create a large number of microstructures with a wide range of morphology in advance, and then search within the microstructural database by crystal plasticity finite element method (CPFEM) with ductile damage model. Since the effective explanatory variables in the exploration were not clear, various geometric parameters, topological quantities, and statistic values related to the phase distribution were prepared as candidates for explanatory variables, and descriptors with a larger impact on TS × EL were extracted by conducting importance analysis. Using the extracted descriptors, microstructures with higher TS × EL were explored by three methods. The differences in the exploration methods were also examined.

Database of synthetic phase distributions
The material investigated in this study is DP steel, which consists of ferrite and martensite phases. To create finite element models, it was assumed that there was no third phase, there was no chemical heterogeneity in each phase, and each phase was composed of multiple crystal grains. Under these assumptions, a two-dimensional (2D) digital image of two-color patterns was called a 'phase map' and was used to define the spatial distribution of each phase. In order to enable the exploration of a wide range of phase morphologies, it is necessary to create a large number of phase maps. The open-source software DREAM.3D [32], which is often used for the reconstruction of polycrystalline models, can reproduce a variety of microstructures by applying multiple filters, but the number of models generated is limited in terms of computational cost because it is based on Voronoi tessellation. Therefore, the Gaussian random field (GRF) method [33,34] with the advantage of faster computation was selected for the generation of the phase maps, rather than the point-based method such as Voronoi tessellation or the physics-based method such as the phase-field method. In addition, it was considered necessary to include anisotropic shapes and extreme geometries such as uniform and lamellar distributions in the phase maps to further expand the search range. For this reason, anisotropy and periodicity were introduced into the GRF method. The details are described hereafter.
The martensite volume fraction was determined randomly between 0.25 and 0.9. This range was determined with reference to the DP steels used in the experimental calibration of crystal plasticity parameters [18]. Considering the computational cost and the microstructure of the DP steels, the size of the phase map was set to 128 × 128 pixels. In this study, 1 pixel was equivalent to 1 µm. To change the periodicity of the phase maps, the size of the GRF was defined as 2 7 − m × 2 7 -n pixels, where n and m were integer values obtained by randomly sampling from an exponential distribution with a mean of 1.5 and rounding the values toward zero. The upper limit of n and m was set to 7. These n and m correspond to the number of subimages for tiling, which will be explained later. Under these conditions, the size of the GRF was determined to be a rectangle with each side length of 1, 2, 4, 8, 16, 32, 64, or 128. Figure 2 shows an example when a size of the GRF is 64 × 128 pixels. After determining the GRF size, uncorrelated Gauss random noise, U (z), was obtained by randomly sampling values from a Gaussian distribution with a mean of 0 and a variance of 1 (standard normal distribution) and placing them in the 2D space (Figure 2(a)). A weight function, ω(h) was then defined by a 2D probability distribution as shown in Figure 2(b). Although the isotropic 2D distribution was used as ω(h) in the original papers [33,34], it was defined by a twovariable Gaussian distribution with horizontal and vertical variances and a rotation angle in order to generate an anisotropic phase distribution. The two variances were randomly selected from a lognormal distribution with a mean of 20 and a variance of 10 2 . The rotation angle was determined from a uniform random distribution from 0 ° to 90 °. This procedure is expected to provide an anisotropic phase distribution where each phase is elongated in the major axis of the weight function. The rotation angle corresponds to the elongated direction of the phase map. The Gaussian random field, G(z), was calculated by convolution of the uncorrelated Gauss random noise, U (z), and the weight function, ω(h), assuming horizontal and vertical periodicity at the edges of the 2D space ( Figure 2(c)). The G(z) was binarized by setting a threshold to satisfy a predefined volume fraction of martensite ( Figure 2(d)). By tiling the binary image generated by the GRF method, a phase map with a size of 128 × 128 pixels was finally obtained (Figure 2(d)). The size of the phase map was fixed at 128 × 128 pixels regardless of the size of the GRF. By repeating the above procedure, a total of 100,000 phase maps were prepared and stored in a database.

Microstructural descriptors
Since the phase maps generated in the previous section are digital data of 128 × 128 dimensions, appropriate low-dimensional explanatory variables need to be extracted from the phase map for the exploration. These explanatory variables obtained from the phase maps are referred to as microstructural descriptors hereafter. The microstructural descriptors used in this study are summarized in Table 1.

Geometric descriptors
One of the most commonly used descriptors for quantifying the microstructure of DP steels is the volume fraction of martensite (VF M ). It has been widely reported that tensile strength strongly depends on VF M in DP steels. Also, the fracture mechanism of DP steels varies with VF M to martensite cracking and martensite-ferrite decohesion [4], which may affect the elongation to failure. In addition to VF M , the stress-strain curve changes significantly depending on the morphology of the martensite grains. This has been confirmed both experimentally and by numerical calculations. One of the reasons is that the stress concentration induced by martensite grains on the adjacent ferrite phase is dependent on the shape of the martensite grains. To consider the effect of stress concentration, the local curvature of martensite grains was calculated as shown in Figure 3(a), and the minimum value in the phase map was recorded as a microstructural descriptor. The anisotropy of the martensite shape also affects the stress-strain curve [17]. To account for the anisotropy, all martensite grains in a phase map were fitted into ellipses, and the mean value of the angle between the loading direction and the major axis was calculated (Figure 3(b)). Also, the square root of the area of the martensite grains was calculated to consider the fineness of the martensite grains. In addition, perimeter and circularity were extracted as typical parameters representing the geometry of martensite grains. These average values were recorded as descriptors.

topological descriptors
Recent studies have reported that the connectivity of martensite grains has a considerable effect on tensile strength. A chain-like networked dual-phase structure improves tensile strength compared to an isolated structure with the same VF M , without significant loss of elongation [5]. Topological parameters were considered useful for quantifying such connectivity. As typical topology parameters, 0th Betti number (β 0 ), 1st Betti number (β 1 ), and Euler characteristic (χ) of the martensite phase were computed using Matlab programs. The β 0 represents the number of connected martensite grains, and the β 1 represents the number of holes in the martensite phase. The Euler characteristic is the alternating sum of the Betti numbers, i.e. χ = β 0 − β 1 for the 2D models.

Covariogram
With the recent development of applied mathematics, more statistical analysis methods have been proposed to quantify the microstructural morphology. In this paper, three types of statistical analysis methods were examined: covariogram, spatial correlations, persistent homology. The covariogram was proposed by Kreb et al. [35] to quantify the banded structures of ferrite and martensite in DP steels. The principle of the covariogram is based on the covariance function proposed by Matheron [36] in the 1960s. The covariance function, C (X, r), is the probability that both extremities of vector r belong to phase X. Fixing the direction of r to be perpendicular to the loading direction, the evolution of C (X, r) as a function of the magnitude of r, is defined as covariogram. An example of a covariogram of martensite is shown in Figure 3(c). The covariogram oscillates with several periods when the phase distribution has a banded structure parallel to the loading direction. The mean distance between bands is estimated by the position of the first maximum peak (HM), and the band intensity is measured by the difference between the first maximum value and the first minimum value (BI). Random structures show lower BI but greater than zero. These BI and HM were used as the microstructural descriptors.

Spatial correlation
A more general method to quantify the spatial correlation of multiple phases in two or three-dimensional space is the n-point correlation function [37]. This is also known as n-point statistics. Recently, Kalidindi and co-workers have developed a framework for quantifying microstructures based on the n-point correlation functions [38]. It has been demonstrated that the original image can be reconstructed from the 2-point correlation function if the probability of being in a certain phase at all points is 0 or 1. In other words, it is possible to uniquely determine higher-order correlation functions (3- The definition of f r MM is exactly the same as the covariance function, C (X, r), described above, but it differs from the covariogram in that the f r MM quantifies the phase distribution regardless of the loading direction. According to the literature [39], the f r MM was calculated by the fast Fourier transform (FFT). Since the f r MM has the same dimensional information as the original image, principal component analysis (PCA) was conducted to reduce the dimension of the f r MM using the 100,000 phase maps created in the previous section, and the top six principal components (SC1, . . . , SC6) were used as the microstructural descriptors.

Persistent homology
The last descriptor is persistent homology. This is a recently proposed concept for quantifying topological features. While dilating or eroding the phase of interest, the dilated values when the connected components and holes of the phase appear (birth, b) or disappear (death, d) are computed. The plot of (b, d) is referred to as the persistent diagram (PD). Here, martensite was chosen as the phase of interest. Similar to the Betti numbers, the plot of the birth and death of the connected components is referred to as 0th PD, and the plot of the holes is referred to as 1st PD. An example of the 0th PD of the martensite is shown in Figure 3(e). The PD was computed by HomCloud [40]. There are two main problems in using the PD for statistical analysis. One is that data points near the diagonal in the PD have short lifetimes and are sensitive to the small perturbations of the digital images [41]. The other problem is that since the number of data points in the PD is limited due to the size of the image, the principal components obtained by PCA are excessively dependent on the positions of the data points. To overcome these two problems, the persistence weighted Gaussian (PWG) kernel [42] was applied to the 0th PD of the martensitic phase. The PWG kernel is a process that reduces the weights of data points close to the diagonal and applies Gaussian smoothing. There are three hyperparameters to be optimized in the PWG kernel: the constants C and p in the weighting function, and the standard deviation, σ, of the Gaussian smoothing. These hyperparameters were determined to be C = 0.01159, p = 2.618, σ = 2.248 by Bayesian optimization based on the results of the randomly conducted CPFEA described later. The objective function of the Bayesian optimization was defined as root mean square error of out-of-bag observations in comparing TS × EL values predicted by CPFEA and random forest regression using PD principal components after applying PWG kernel. The PD processed by the PWG kernel with these hyperparameters is shown in Figure 3(f). It can be seen that the data on the diagonal disappeared and the data points became blurred. This improves the stability of principal component transformation against the slight fluctuations of the image due to discretization. The top six principal components of the PD diagram processed by the PWG kernel were used as microstructural descriptors. The minimum, maximum, mean, and variance of all microstructure descriptions in the phase map database prepared in the previous section are listed in Table 1.

Crystal plasticity finite element method (CPFEM)
For a phase map of interest, a regular mesh was created with 128 × 128 8-node hexahedral elements (C3D8), and the phases were assigned to correspond perfectly to the phase map. The shape and orientation of the polycrystalline grains in each phase were determined by the previously developed method based on the anisotropic Voronoi tessellation and the random sequential addition algorithm [43]. Since the spatial distribution of the two phases is of interest in this work, the distributions of the grain size and the crystallographic orientation were fixed. The grain size distribution of the two phases followed a normal distribution with a mean value of 2 µm. The crystallographic orientations of the ferrite and martensite grains were sampled from orientation distribution functions obtained experimentally from the DP steels without a specific texture [18]. The tensile strength and elongation to failure were calculated by the crystal plasticity finite element method (CPFEM) with a damage model. A crystal plasticity model with 24 slip systems of the 101 f gh11 � 1i and 112 f gh11 � 1i was used to calculate the stress-strain curve. The shear rate on each slip system was phenomenologically modeled by a power law: where _ γ 0 is the reference shear rate ( _ γ 0 ¼ 10 À 3 s), n is the strain rate exponent (n ¼ 20), τ α is the resolved shear stress of slip system α, and τ α c is the critical resolved shear stress. To represent work hardening, the τ α c was defined as a function of slip-slip interaction: where τ α 0 is the initial CRSS, τ α 1 is the saturated CRSS, b α 1 is the initial hardening rate, and Γ s is the accumulated shear strain on all the slip systems and h αβ is the interaction coefficient of the slip system β over α and assumed to be equal to 1. The parameters of the CP model for the two slip system families were assumed to be the same. The material constants used in the CPFEM are listed in Table 2. These parameters were taken from a previous work [18], where the crystal plasticity parameters were calibrated from the stress-strain curve of the DP steel with a tensile strength of approximately 1000 MPa. The material used in the experiment had a relatively random phase distribution. To express ductile fracture behavior, a single scalar damage variable, D, ranging from 0 for the undamaged state to 1 for the fully damaged state was introduced into the CPFEM. Using the damage variable, the stress tensor is given by [44]: where σ is effective stress tensor, C is stiffness tensor, and ε e is elastic strain tensor. The damage variable was simply defined as a linear function of the equivalent plastic strain [45]: where � ε p is an equivalent plastic strain, � ε p 0 is the damage initiation strain, � ε p f is the strain at complete failure. The upper limit of 0.95 was defined for the convergence issues. The onset and critical damage variables for the two phases are shown in Table 2. The damage initiation strain (� ε p 0 ) was determined by fitting the equivalent plastic strain at damage initiation in previous experimental works [18]. Fracture strain (� ε p f ) in tensile testing of steels typically ranges from 0.4 to 1.3 in the literature [46,47]. Here, values of 1.2 for the ferrite phase and 0.5 for the martensite phase were used. Periodic boundary conditions were imposed on all pairs of faces of the model. The stress-strain curve was calculated by applying a horizontal displacement to the side surface of the model.

Selection of microstructural descriptors
In general, the number of evaluations required to find the global optimum increases exponentially with the dimension of explanatory variables. A total of 23 microstructural descriptors were derived in Section 2.2, but it is difficult to explore using all of them with current limited computer resources. In Bayesian optimization (BO), which is currently one of the most promising exploration methods, the number of explanatory variables is practically limited to 10-20 even for relatively simple calculations [48]. In order to extract descriptors important for predicting tensile strength and elongation, CPFEM analysis was performed on 100 randomly selected phase maps, and the results were subjected to random forest (RF) regression to calculate the feature importance of each descriptor. The RF is an ensemble machine learning method that builds many regression trees as a weak learner. The regression trees were constructed by a method based on chi-square analysis [49]. The feature importances of descriptors were quantified by the permutation-based mean squared reduction method [50]. It corresponds to the average value of the change in the mean squared error of the predictions when permuting out-of-bag data (data not used in training) of each decision tree. The descriptors with higher importance were used in the following exploration.

Bayesian optimization
Bayesian optimization (BO) is a method of efficiently searching for the explanatory variables that minimize (or maximize) the output of the objective function. The microstructural descriptors are the explanatory variables, the CPFEM is the objective function, and the output is TS × EL in this work. The BO was performed by the following procedure: The steps (ii)-(iv) were repeated until reaching a certain number of iterations. An example of determining the next search point by BO is shown in Figure 4. The exploration by BO is performed in the microstructural descriptors space. The white circles in the figure represent the results of the CPFEM simulation, the gray points represent the unexplored points, and the colored distributions represent the acquisition function calculated from the GPR model. The efficiency of the optimization generally depends on the choice of the kernel function of the GPR and acquisition function. The ARD Matérn 5/2 kernel [51], which is widely used in the literature of Bayesian optimization, was used as the kernel function. For the acquisition function, the expected-improvement plus (EI+) acquisition function implemented in Matlab Statistics and Machine Learning Toolbox was used to avoid falling into the local maximum of the objective function. The EI+ acquisition function modifies the kernel function when the same point is explored repeatedly [52]. The explorationexploitation ratio was set to the default value of 0.5. The number of iterations was set to 100. The first ten microstructures selected in a random search (RS) in Section 2.4.4 were used in the initialization step (i). In order to compare the three exploration methods avoiding the effect of randomness in the initialization step, identical microstructures were used in the initialization steps for the remaining two explorations described below.

Boundless objective-free exploration (BLOX)
The second exploration method is a recently proposed method called Boundless objective-free exploration (BLOX) [22]. This is a method of searching for outof-trend materials without setting an objective function. The exploration using BLOX was performed by the following procedure: The steps (ii)-(v) were repeated up to a certain number of iterations. Since machine learning with random forest (RF) is much faster than CPFEM, most of the computation time was spent on step (v). The CPFEM provides a small number of relatively accurate predictions of TS and EL, whereas the RF provides its trends from a large number of TS and EL data. An example of determining the search point by BLOX is shown in Figure 5. The Stein novelty quantifies how much the point deviates from the existing trend in the TS-EL space. The BO searches for the next search point in the space of explanatory variables, but the BLOX searches for the next point in the space of objective variables (TS and EL). The first ten phase maps selected in initialization (step (i)) are the same as the BO.

One-class support vector machine (OCSVM) based exploration
The BO has the advantage of searching toward higher objective functions within the explanatory variable space. On the other hand, the BLOX has the advantage of searching in the space of objective variables and has the disadvantage that it may search for a lower TS × EL because there is no objective function. In this paper, an OCSVM-based exploration method is proposed to exploit the advantages of both methods simultaneously. In the OCSVM classification, it is assumed that all data belong to the +1 class and only the origin belongs to the −1 class [23]. Then, SVM finds the boundary that separates the two classes, and the data belonging to the −1 class side is regarded as an outlier. The signed  distance between the boundary and the data point is called the 'classification score (CS)' [54]. An example of calculating the classification score is shown in Figure 6(a). A negative value represents an anomaly. To find a higher TS × EL value, a product of normalized TS × EL and classification score was defined as 'curiosity index (CI)' for the exploration: where the TS and EL of the unsearched points were obtained by SVM regression, mean TS � EL ð Þ is the mean value of TS × EL of all phase maps, and CS is the classification score obtained by applying OCSVM to the (TS, EL) space. The exploration procedure is summarized as follows: An example of the distribution of the CI calculated in step (iv) is shown in Figure 6(b). It can be seen that CI is more prominent in the upper right of the figure. The red point in the figure has the highest CI among the unexplored points and is the next search point to be explored. In BLOX, the point with the lower TS × EL can be chosen, but in the OCSVM-based exploration, the point with the higher TS × EL is determined thanks to the normalized TS × EL in Equation (6). The exploration may change depending on the phase map selected in the initialization of step (i). To avoid such an effect of the initialization, the ten phase maps used in the initialization were the same as the other two methods.

Random search
To identify the microstructural descriptors that have significant effects on the tensile properties, 100 phase maps were randomly selected from the database of phase maps, and the stress-strain curves were calculated by CPFEM. The scatter plot of the tensile strength (TS) and total elongation to failure (EL) is illustrated with the curves of TS × EL = 10, 20, 30, 40 GPa·% in Figure 7(a). Most of the search points showed values close to TS × EL = 10 GPa·%, which is the same as the material used in the calibration of crystal plasticity parameters. In this random search, no microstructure exceeded 20 GPa·% in 100 iterations. The relationships of TS, EL, and TS × EL to the martensite volume fraction (VF M ) are plotted in Figure 7(b-d). With the increase of VF M , TS increased, and EL decreased. This trend is consistent with the trend observed in experiments [55].
The coefficients of determination R 2 were 0.9185 and 0.7582, respectively. The values of R 2 did not change much when regressing with a quadratic  Tables 3 and 4, respectively. Figure 8 shows crystal plasticity finite element models, distributions of damage variables, and stress-strain curves of these points. The elongation was maximum when the martensite volume fraction was low (VF M = 0.332, Figure 8(a)). In this microstructure A, the fracture occurred mainly in the ferrite phase. As shown by the arrow in the figure, severe localization of damage was observed primarily in the ferrite grains sandwiched between two martensite grains along the loading direction. Such fracture behavior has been reported in in-situ observations during tensile testing of DP steels with similar VF M [11,56]. The ductile damage in the ferrite phase initiated from the early stage, and the slope of the stress-strain curve was lower than the elastic modulus. This microstructure also showed a maximum value for TS × EL (TS × EL = 18.4 GPa·%). In microstructure B with intermediate martensite volume fraction (VF M = 0.573, Figure 8(b)), the damage was observed in both ferrite and martensite phases. The martensite grains connected in the loading direction were subjected to higher stress than ferrite due to their harder properties, resulting in higher tensile strength than the microstructure A. The fracture occurred at the necked area of martensite, as indicated by the arrow. This is consistent with in-situ observations by phase-contrast  X-ray microtomography, which confirmed the nucleation and growth of microvoids located in necked martensitic grains [57]. The elongation was lower than microstructure A. The value of TS × EL of microstructure B (TS × EL = 17.9 GPa·%) was almost the same as that of microstructure A. The tensile strength was maximized in microstructure C, which has a high volume fraction of martensite (VF M =  0.847, Figure 8(c)). In this microstructure, the damage occurred almost exclusively in the martensitic phase. Lower elongation was observed due to the lower fracture strain ε f of the martensitic phase. The value of TS × EL of microstructure C was 11.7 GPa·%. As seen in the three examples, the failure locations in the numerical models are consistent with the experimental reports, which supports the validity of the calculations.

Microstructural descriptors important for predicting tensile properties
Using the results of 100 CPFEM calculations in random search (RS), the importance of each descriptor to TS, EL, and ES × EL was calculated by RF regression, respectively ( Figure 9). The coefficient of determination (R 2 ) is also described in the figures. For tensile strength, the importances of the three descriptors VF M , Betti1, and SC1 were significantly high. For elongation, PH1 was relatively more important than the others, but no significantly important descriptors were found. The R 2 of EL was lower than that of TS, indicating that the prediction of EL was more difficult than TS. For TS × EL, the feature importances of five descriptors, VF M , Betti0, SC1, SC6, and PH1, were higher than the others, exceeding 0.4. The R 2 value was between those of TS and EL. From these results, it is unclear how many descriptors should be used for the explorations. To consider the number of descriptors required for the prediction of each tensile property, the values of R 2 obtained from RF regression using the top N most important descriptors were investigated for different values of N (Figure 9(d)). Interestingly, TS could be predicted with good accuracy using only one descriptor, but for EL, R 2 did not saturate until approximately ten descriptors. For TS × EL, approximately five descriptors were sufficient to reach a saturated value of R 2 . Therefore, the five most important descriptors for TS × EL (VF M , Betti0, SC1, SC6, and PH1) were focused on. In order to improve the efficiency of the exploration in the next section, it is necessary to remove the highly correlated explanatory variables. Figure 10 shows the correlation between the five descriptors. As already reported [58], when there is a large variation in the volume fraction in the database, SC1 (the first principal component of the spatial correlation function) has a one-to-one relationship with the volume fraction.
Since VF M has a clear physical meaning, VF M was left as an explanatory variable, and SC1 was not used in the exploration. There is also a strong correlation between Betti0 (the 0th Betti number) and PH1 (the first principal component of the 0th persistence diagram). This is probably because both descriptors count the holes in the martensite phase. Since Betti0 is a discrete variable and PH1 is a continuous variable, PH1 containing more information was selected as the explanatory variable. In summary, three descriptors (VF M , SC6, PH1) were finally chosen as explanatory variables in the explorations in the next section. The distribution of the three descriptors of the phase map database is shown in Figure 11(a). To understand the physical meaning of SC6, PH1, a scatter plot of SC6 and PH1 in the range of VF M 0.4-0.6 is plotted with several typical examples of the microstructures in Figure 11(b). The random microstructure is located near the center point indicated by the cross mark. It can be seen that a larger SC6 means an elongated microstructure in the loading direction, and a smaller SC6 means an elongated microstructure perpendicular to the loading direction. Therefore, SC6 represents the intensity of the lamellar structure and its orientation. The PH1 expresses the fineness of the two-phase pattern because microstructures with a fine distribution of martensite phase are located at a higher PH1.

Exploration by BO, BLOX, and OCSVM
The results at 20 iterations for each exploration method are shown in Figure 12. In BO and OCSVM, the search points exceeded the curve of TS × EL = 20 GPa·% by the first 20 iterations indicating that the search efficiency is much higher than RS. The BLOX did not show a clear difference from RS at this early stage. Comparing the ten initialization points and the subsequent 11-20 iteration points, it can be seen that all three methods tried to avoid the initialization points. This is because the acquisition function of BO takes a large value at unsearched points, and BLOX and OCSVM aim to find points farther from the searched points in the TS-EL space. One of the differences between BLOX and OCSVM is that the Stein novelty in BLOX does not include TS × EL, while the curiosity index in OCSVM does. For this reason, BLOX searched in all directions from the points explored by RS. Another difference is that BLOX uses RF and OCSVM uses SVM as the machine learning to derive the TS-EL trend of the searched points. Since RF consists of a large number of decision trees, it excels at complex interpolation prediction but has difficulty in extrapolation prediction. On the other hand, the SVM used in OCSVM is a binary classifier, which makes predictions with relatively simple formulas and prevents overfitting. It is considered that OCSVM worked more effectively than BLOX due to such a difference in the machine learning methods.
The results of 100 iterations for each search method are shown in Figure 13. The search points of BO were most widely distributed in the TS-EL space among the three methods. In BO, the highest EL value (EL = 34.7%) was observed through all exploration methods. This microstructure is referred to as microstructure D. The maximum TS × EL value in BO was 29.72 GPa·% at search point E. This is much higher than the maximum value in RS (18.4 GPa·%). The maximum value of TS × EL in BLOX was not much different from that of RS. In RS, the points were distributed evenly along the curve of TS × EL = 10 GPa·% (Figure 7(a)). While in BLOX, the search points were  clustered in several regions. The microstructures with large EL and TS × EL are shown in F and G in the figure, respectively. The search points of OCSVM were biased to the high EL region, and few high TS microstructures were found. This is because BO suppresses repeated searches for similar microstructures by modifying the kernel function of the acquisition function, but OCSVM-based exploration has no algorithm to suppress such repeated searches. The maximum values of EL and TS × EL were both obtained in microstructure H. The EL value was 30.6%, which was lower than in BO, and the TS × EL value was 30.6 GPa·%. Among the three methods, only OCSVM found the microstructure with TS × EL > 30 GPa. The microstructure with the second-highest TS × EL in the OCSVM had a relatively good balance of TS and EL, which is referred to as microstructure I.
To compare the efficiency of the four explorations, including random search, the histories of the calculated TS × EL and its maximum value are plotted in Figure 14. The history of maximum TS × EL suggests that BO and OCSVM explorations have higher search performance than RS, and BLOX has similar performance to RS. The regression methods used in each exploration were GPR for BO, RF for BLOX, and SVM for OCSVM. These differences in regression methods may have caused differences in search performance. Especially, OCSVM exploration found the microstructure with the highest TS × EL in the early iterations. The TS × EL trace (Figure 14(a)) shows that OCSVM was always searching for high TS × EL. Therefore, OCSVM is the most suitable for discovering microstructures with higher tensile properties in a small number of iterations. On the other hand, if a large number of numerical calculations are allowed, BO is possibly more suitable than OCSVM. The BO continued to increase the maximum value of TS × EL by exploring a wide range of microstructures, while the OCSVM mainly explored the high EL region and the maximum value of TS × EL was not updated after 20 iterations. In the OCSVM search, the distribution of CI did not change significantly after 20 iterations, which may be related to the saturation of the search. To solve this problem in OCSVM, it would be adequate to add a penalty term to the CI that reduces the CI value when repeatedly searching for similar microstructures, similar to the exploration-exploitation ratio in BO. However, it would introduce a new problem of tuning the hyperparameter in the penalty term. Another approach to achieve such a rapid search would be to directly use the TS × EL values predicted by the SVM, rather than CI, to determine the next search.

Microstructures with high TS×EL
To consider the strengthening mechanism of the microstructures with high TS × EL, the calculated results of microstructures D-I found in the three explorations are shown in Figures 15-17, respectively. The microstructural descriptors and tensile properties for each microstructure are shown in Tables 3 and 4, respectively. To examine the effect of volume fraction   of martensite (VF M ), the TS and EL values calculated from the regression equation based on VF M (Equations (8)- (9)) are also listed as TS VF and EL VF in Table 4. Among them, microstructures D (Figure 15(a)) and H (Figure 17(a)) showed particularly high EL (EL > 30%). These EL values were much higher than EL VF , indicating that the total elongation was improved not only by the effect of the volume fraction of the martensite phase but also by its spatial distribution. The microstructures D and H consisted of a ferrite matrix with martensite grains dispersed at regular spacing. In these microstructures, a large number of deformation bands oblique to the loading direction occurred almost simultaneously and were distributed throughout the microstructures. This is in contrast to the microstructure found in RS (Figure 8), where one or two main deformation bands develop obliquely to the loading direction, leading to eventual failure. To examine the mechanism of the high elongation in more detail, the evolution of the equivalent plastic strain in the microstructure H was plotted (Figure 18). At applied strain of 10%, a relatively large plastic strain was generated in the ferrite phase sandwiched by martensite grains similar to the other microstructures, but due to the regular distribution of the two phases, small deformation bands appeared synchronously in the microstructure, and the plastic strains showed a zigzag pattern. The plastic strain developed keeping this geometrical pattern, and strain localization was suppressed until rupture. Thus, it was shown that the microstructure with regularly aligned hard grains improves elongation through a synchronized void growth mechanism. To experimentally prove the ductility improvement due to this mechanism, it is necessary to find the thermo-mechanical treatment conditions that uniformly disperse the martensite grains.
Microstructures E (Figure 15(b)) and I (Figure 17(b)) showed a well-balanced improvement in TS and EL. These microstructures had a lamellar structure elongated in the loading direction. In contrast to the lamellar structure found in RS (microstructure C, Figure 8(c)), the lamellar spacing was relatively uniform and fine. For all of the lamellar microstructures C, E and I, the EL values calculated by CPFEM were higher than the EL VF . The ratio of EL to EL VF was approximately 1.6 for microstructure C, but exceeded 2 for microstructures E and I. These results suggest that uniform, fine lamellae are more effective in improving the total elongation. In the microstructures E and I, damage occurred in both the ferrite and martensite phases and the fracture eventually occurred in the grain with the most favorable crystallographic orientation for plastic deformation. Since the ferrite-martensite interface is parallel to the loading direction, the lamellar structure has a smaller stress concentration than the other structures, which may have resulted in the unique tensile properties. The equivalent plastic strain distribution of the microstructure I is plotted in Figure 19. The microstructure with large lamella thickness (Figure 8(c)) showed brittle fracture    due to strain localization in a specific location, whereas the microstructure I with multiple layers of relatively thin lamellae showed high ductility due to the distribution of plastic strain throughout the microstructure. This approach of using a lamellar structure of soft and hard phases has already been realized in multilayered steels [59]. It has been experimentally confirmed that the fracture behavior of the hard layer clearly transitions from brittle cleavage to ductile shear when the layer thickness is reduced [60], which is consistent with the calculation results of this work. The multilayer steel sheets composed of a hard martensitic stainless steel and a soft austenitic stainless steel exhibited a good strength-ductility balance, with a tensile strength of 1.4 GPa and uniform elongation of 20% [61]. Microstructures F and G found in BLOX had a unique network-type distribution of martensite and ferrite phases (Figure 16), but the values of TS and EL were not much different from those of RS (Figure 8(a,b)). The reason for the lack of improvement in tensile properties is because intense stress concentration occurred at the wavy ferrite-martensite interfaces, leading to fracture.
In this work, two-dimensional microstructure models with a simple damage law were used. In a future study, three-dimensional (3D) models with the stress triaxiality dependence of the fracture strain will be used for optimizing the 3D microstructure. Focusing on the other tensile properties, yield stress and uniform elongation (or work hardening rate) play an essential role in sheet metal formability. Since there is also a trade-off relationship between yield stress and uniform elongation [62], the approaches to search for outliers using BLOX and OCSVM proposed in this study can be applied to such problems.

Conclusions
In this study, CPFEM with ductile damage model was introduced into the framework of various exploration methods: random search (RS), Bayesian optimization (BO), BoundLess Objective-free eXploration (BLOX). Inspired by the advantages of BO and BLOX, a new exploration method was proposed by defining a curiosity index from the classification scores of OCSVM. The following conclusions can be drawn: (1) The results of RS showed that tensile strength (TS) correlates with the volume fraction of martensitic phase (VF M ) and total elongation (EL) correlates with the inverse of VF M . The product of strength and total elongation (TS × EL) was relatively insensitive to VF M and showed large scatter. The maximum TS × EL value of the microstructures analysed in RS was 18.4 GPa.%. (2) The feature importance analysis revealed that the VF M , the 6th principal component of the spatial correlation function (SC6), and the first principal component of the persistent homology (PH1) are important in predicting the product of strength and total elongation (TS × EL) of dual-phase steels. The SC6 represents the intensity of the lamellar structure, and the PH1 represents the fineness of the two-phase pattern. (3) Using the above three important microstructural descriptors, explorations of microstructures were conducted by BO, BLOX, and OCSVM. The BO and OCSVM found microstructures exhibiting TS × EL much higher than those found in RS. The maximum TS × EL values in BO and OCSVM were approximately 30 GPa.%. The search performance of BLOX was comparable to that of RS. (4) The CPFEM results of the microstructures with high TS × EL values found in BO and OCSVM were investigated in detail. These microstructures were homogeneously dispersed fine martensitic particles or a lamellar structure of martensitic and ferritic phases. It was indicated that strain partitioning and reduction of stress concentration led to improved tensile properties in these microstructures. These findings are consistent with known findings in materials engineering. It was shown that the proposed methodology of inverse problem analysis can be used to derive the optimal microstructures for improving the tensile properties of DP steels.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by Council for Science, Technology and Innovation (CSTI), Cross ministerial Strategic Innovation Promotion Program (SIP), "Materials Integration" for Revolutionary Design System of Structural Materials (Funding Agency: Japan Science and Technology agency).