Machine learning of fake micrographs for automated analysis of crystal growth process

ABSTRACT Materials informatics is being applied to crystal engineering, which is a core technology in electronics. Micrographs in particular provide important insights; however, they have not benefited significantly from materials informatics because of the effort required to acquire large amounts of data. Herein, we propose a fast and automated analysis technique for micrographs showing the crystallization process of semiconductor thin films. We automatically generated fake micrographs and used them to train 10 different machine learning models to recognize crystal domains. Experimentally obtained micrographs were analyzed using the developed model, which correctly determined the domain size and nuclei density. The activation energies required for growth and nucleation were determined from the lateral growth velocity and nucleation frequency, the variations of which were smaller than those measured by humans. Therefore, the proposed analysis framework not only reduces the time required to derive the crystal growth properties but also enables high accuracy without human subjectivity.


Introduction
Crystal growth is a fundamental technology for various electronics based on semiconductor thin films. To address the question of how to grow high-quality thin films, numerous studies have been conducted to understand and control various crystal growth techniques. With the development of the information society in recent years, there is a strong need for technology to synthesize semiconductor thin films on insulators so that electronic functions can be added to various items. Solid-phase crystallization (SPC) is the oldest and most representative synthesis method, in which crystallization is induced by annealing an amorphous thin film [1][2][3][4]. Recently, SPC has been in the spotlight because it provides an extremely high carrier mobility in Ge-based materials [5][6][7][8], which are leading candidates for replacing Si. To understand and discuss the SPC of a system, it is important to observe the phase transition from amorphous to crystalline, determine the lateral growth velocity and nucleation frequency of the crystalline domains, and determine the activation energies and frequency factors [1][2][3][4]. These physical properties have been obtained by repeating the annealing of the samples and conducting ex-situ observations and manual analyses (i.e. a calculation of the crystal domain size d and nuclei density ρ) (Figure 1) [8][9][10]. However, this process is time-consuming and labor-intensive, and is inevitably subject to systematic errors among the measurers.
Materials informatics is an interdisciplinary field of machine learning (ML) and materials engineering and is a new approach based on experiments, theory, and computation [11][12][13][14]. In recent years, a significant amount of research has been conducted on the application of materials informatics to crystal engineering in the search for more efficient materials and processes [15][16][17]. For example, Bayesian optimization can solve the problem of a combinatorial explosion of candidate structural optimizations in thin-film material searches and crystal growth [18][19][20][21][22][23]. Furthermore, clustering based on the similarity of the material properties can narrow down the number of candidates because materials belonging to the same cluster are expected to react in a similar manner [24][25][26]. ML models trained on large materials databases are beginning to reproduce experimental and theoretical values that used to be determined by the naked eye based on expert knowledge [27][28][29][30][31]. The computation time of first-principles calculations has been dramatically reduced by the introduction of potential approximations [32][33][34][35] and artificial neural networks [36][37][38][39]. In materials research using ML, the inverse problem approach is often taken [40,41]. By contrast, although fast analysis and the removal of subjectivity are in high demand, the application of ML to micrograph recognition is still limited [42][43][44]. The main reason for this is that the collection of micrographs, which serve as training data, requires an enormous amount of effort.
In this study, an automated analysis technique was developed for SPC properties using fake micrographs, automatically generated within a few minutes, as ML training data together with an in-situ annealing observation system (Figure 1). Using the recognition of the SPC process of high-carrier-mobility Ge as an example, we demonstrated that ML can recognize crystal domains in in-situ micrographs through the learning of fake micrographs. The proposed technique not only reduces the time and human effort required to derive the SPC properties but also enhances the accuracy by eliminating human subjectivity.

Sample preparation and observation
We prepared arsenic-doped amorphous Ge layers (thickness of 200 nm, deposition rate of 1.7 nm min⁻¹, and arsenic concentration of 1.2 × 10²⁰ cm⁻³) on SiO₂ glass substrates using a vacuum evaporation system (base pressure of 1 × 10⁻⁷ Pa) [7]. Impurity doping in amorphous Ge has been applied to enlarge the domain size and improve the visibility of the domains in micrographs [6,7]. For ex-situ observations, the samples were loaded into a tube furnace (Koyo Thermo Systems Co., Ltd. KTF035N1, Japan) in N₂ ambient. After a certain annealing time, the samples were removed and evaluated using a differential interference optical microscope (Leica DM 2500 M, Germany), which clarifies the contrast between the crystalline and amorphous materials. For the in-situ observation, the samples were loaded into a thin furnace (Linkam 10042D, UK) in N₂ ambient. The furnace is equipped with a digital microscope (Keyence VH-5500, Japan), which has a long focal length, allowing for an observation of the samples through the furnace window. For both observations, the annealing temperature, T anneal, was set to 350°C, 375°C, and 400°C to draw the Arrhenius plots, from which the activation energies in the SPC can be derived.

ML model implementation
We implemented ML models for domain recognition using PyTorch [45]. The ML models were based on models pre-trained on ImageNet to facilitate the training convergence [46,47]. We used a transfer learning method [48]: the final layer of the pre-trained model was replaced by a fully connected layer with two outputs to predict the values of d and ρ. These values were scaled to the size of the micrograph after applying a sigmoid function. The learning rate was 1 × 10⁻⁴, and the other specific details were the same as in Ref. [49]. The models were trained in mini-batches with a batch size of 64, the largest power of 2 that did not cause a memory overflow [50]. An NVIDIA Tesla P100 GPU was used. The maximum number of epochs for training was set to 50. The models were evaluated using 5-fold cross-validation.
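The output stage described above can be illustrated with a minimal, framework-free sketch: a fully connected layer with two outputs whose raw values pass through a sigmoid and are then scaled to a physical range. The class name `TwoOutputHead` and the upper bounds `d_max` and `rho_max` are hypothetical; the exact scaling used in the study is not specified here, so this is an assumption rather than the original implementation.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TwoOutputHead:
    """Fully connected layer with two outputs, followed by sigmoid scaling.

    d_max and rho_max are hypothetical upper bounds (assumptions) that map
    the sigmoid outputs onto the ranges of d and rho.
    """

    def __init__(self, weights: Sequence[Sequence[float]],
                 biases: Sequence[float], d_max: float, rho_max: float) -> None:
        if len(weights) != 2 or len(biases) != 2:
            raise ValueError("the head needs exactly two output units")
        self.weights = weights  # one row of weights per output unit
        self.biases = biases
        self.d_max = d_max
        self.rho_max = rho_max

    def __call__(self, features: Sequence[float]) -> Tuple[float, float]:
        # Linear layer: one dot product plus bias per output unit.
        raw = [
            sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        # Sigmoid bounds each output; scaling converts it to physical units.
        return sigmoid(raw[0]) * self.d_max, sigmoid(raw[1]) * self.rho_max
```

In a PyTorch model, the same idea would correspond to replacing the last layer of the pre-trained network with a two-unit linear layer and applying the sigmoid and scaling to its output.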

Generation of fake micrographs
Amorphous Ge layers were prepared on glass substrates and then annealed to induce SPC in a conventional tube furnace for ex-situ observation and in a heating chamber with a window for in-situ observation. Figure 2(a-c,d-f) show micrographs of the sample surface obtained through ex-situ and in-situ observations, respectively. These micrographs show the time evolution of SPC at T anneal = 350°C. Depending on the annealing time, t, the micrograph transitions are as follows: (i) At the beginning of annealing (t = 0 h), the image is monochromatic because the entire sample surface is in an amorphous state. (ii) After incubation, circular crystalline domains appear at random (Figure 2(b,e)). Here, the contrast originates from the difference in light reflectance between amorphous and crystalline Ge. (iii) After sufficient annealing (t = 40 h), the entire sample crystallizes, which makes the micrographs monochromatic again (Figure 2(c,f)). The resolution of the micrographs is lower for in-situ observations than for ex-situ observations, which is a general trend owing to the specifications of the system. For ex-situ observations, differential interference microscopy can be used, which enhances the contrast between the amorphous and crystalline phases. By contrast, for in-situ observations, we used a digital microscope with a long focal length to observe the inside of the heating chamber through a window. In-situ microscopy can therefore automatically capture numerous images at arbitrary time intervals; however, human analysis of these images tends to result in large systematic errors in d and ρ because of their low resolution. We therefore conducted an automatic analysis through image recognition using ML as follows. Classical rule-based algorithms (e.g. grayscale and edge detection algorithms) lack robustness and are inappropriate for noisy images [51,52]. Therefore, we considered the use of a more robust ML model (i.e. a convolutional neural network) with fewer hyperparameters [53][54][55].
In general, the performance of ML is highly dependent on the amount of training data [55][56][57]. However, preparing a large amount of training data and annotating the true values individually is the most time-consuming part of the ML workflow. To circumvent this process, we created fake micrographs from only two experimental micrographs. Figure 2(g,h) show examples of fake micrographs. Fake micrographs were generated by superimposing circles with the same radius cut from the crystal image (t = 40 h) shown in Figure 2(f) on the amorphous image (t = 0 h) in Figure 2(d). This method can maintain the image quality and noise level of actual in-situ images. The circles were allowed to overlap each other, which well reflected the experimental micrographs showing the collision of domains during the SPC process. We automatically generated fake micrographs by varying the true values d true and ρ true and randomly determining the domain positions. In addition, as an augmentation, we also used fake micrographs that were rotated and flipped vertically and horizontally for training [58]; these geometric transformations do not change d true or ρ true . Figure 3 shows a flowchart of the learning and prediction. We attempted to recognize the domains of in-situ micrographs by training ML models on the generated fake micrographs (Figure 3(a,b)). The final goal of this study was to determine the activation energies required for growth and nucleation, which are important parameters for understanding the SPC. The samples were annealed at three different temperatures (T anneal = 350°C, 375°C, and 400°C) because the activation energies are derived from the temperature dependence of the lateral growth velocity and nucleation frequency obtained from the time evolution of d and ρ. It should be noted that d and ρ could not be defined from the micrographs of the fully crystallized sample (Figure 2(c,f)).
Therefore, we defined the saturation values of d true and ρ true in the fake micrographs from the in-situ micrographs just before crystallization was completed. The spacing of the generation conditions was fixed at 1 μm for d true and 0.4 × 10⁹ cm⁻³ for ρ true . The lower limit was not set to zero because d true and ρ true cannot be defined if no domains exist. Considering the above, the generating condition range of the fake micrographs, d true and ρ true , and the range of the number of fake micrographs, N fake , were determined (Figure 3(c)).
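The generation procedure can be sketched as follows. This is a minimal illustration, not the original implementation: images are represented as nested lists of pixel intensities, the function names are hypothetical, each circle is assumed to be cut from the crystal image at the same pixel position at which it is pasted, and the number of domains is passed directly because the conversion from ρ true to a domain count depends on the field of view.

```python
import math
import random
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def make_fake_micrograph(amorphous: Image, crystal: Image, diameter_px: float,
                         n_domains: int, rng: random.Random) -> Image:
    """Paste n_domains equal-sized circles from `crystal` onto a copy of `amorphous`."""
    height, width = len(amorphous), len(amorphous[0])
    fake = [row[:] for row in amorphous]  # start from the amorphous image
    radius = diameter_px / 2.0
    for _ in range(n_domains):
        # Random centre; earlier circles are not checked, so overlaps are allowed.
        cx, cy = rng.uniform(0, width), rng.uniform(0, height)
        y_lo = max(0, math.floor(cy - radius))
        y_hi = min(height, math.ceil(cy + radius) + 1)
        x_lo = max(0, math.floor(cx - radius))
        x_hi = min(width, math.ceil(cx + radius) + 1)
        for y in range(y_lo, y_hi):
            for x in range(x_lo, x_hi):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    # Assumption: copy the crystal pixel at the same position.
                    fake[y][x] = crystal[y][x]
    return fake


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90(image: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]
```

Because flips and rotations only rearrange pixels, the crystalline area is unchanged, which is why the labels d true and ρ true can be reused for the augmented images.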

Model training and validation
The proposed analysis framework consisted of two stages: (i) training of the ML models on fake micrographs and (ii) prediction of in-situ micrographs using the ML models trained on the fake micrographs (Figure 3(d)). Since the advent of AlexNet [59], neural networks have become deeper and their recognition accuracy has improved. However, because an increase in the number of parameters leads to an increase in the inference time, many researchers have proposed various architectures to achieve a high accuracy while reducing the number of parameters. We implemented ML models based on the leading convolutional neural networks VGG16 [60], ResNet [61], ResNext [62], MobileNet [63], and EfficientNet [64]. The proper number of training images is based on a trade-off between the accuracy and training time. Therefore, the architecture and N fake must be determined for the proposed analytical framework. We examined these in terms of the recognition accuracy and processing time.
For the loss function, we used the sum of the mean squared errors for the normalized d and ρ:

$$\mathrm{Loss} = \frac{1}{m}\sum_{i=1}^{m}\left[\left(d_{\mathrm{pred},i}-d_{\mathrm{true},i}\right)^{2}+\left(\rho_{\mathrm{pred},i}-\rho_{\mathrm{true},i}\right)^{2}\right],$$

where d pred and ρ pred are the predicted values of d and ρ, respectively, as outputs of the fake micrograph input to the ML models, d true and ρ true are the corresponding true values (all normalized), and m is the batch size. Figure 4(a) shows the training and validation curves when N fake and the number of layers of the base ResNet are modulated. N fake = 3,200, 6,400, and 12,800 correspond to the generation of 5, 10, and 20 fake micrographs per pair of (d true , ρ true ), respectively. As the number of epochs increases, both the training and validation losses decrease. An increase in the number of layers and in N fake also results in a lower loss. These results indicate that ML proceeds correctly without overfitting.
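The loss described above can be written as a short sketch. The min-max normalization in `normalize` and all function names are assumptions made for illustration; only the structure (mean squared error for d plus mean squared error for ρ over a batch of size m) follows the text. As a consistency check on the dataset sizes, 3,200/5 = 6,400/10 = 12,800/20 = 640, so all three values of N fake correspond to the same 640 pairs of (d true , ρ true ).

```python
from typing import Sequence


def normalize(value: float, lower: float, upper: float) -> float:
    """Min-max normalization to [0, 1]; the bounds are assumed, not from the paper."""
    return (value - lower) / (upper - lower)


def summed_mse_loss(d_pred: Sequence[float], rho_pred: Sequence[float],
                    d_true: Sequence[float], rho_true: Sequence[float]) -> float:
    """Sum of the mean squared errors of normalized d and rho over one batch."""
    m = len(d_true)  # batch size
    if m == 0 or not (len(d_pred) == len(rho_pred) == len(rho_true) == m):
        raise ValueError("all inputs must be non-empty and of equal length")
    mse_d = sum((p - t) ** 2 for p, t in zip(d_pred, d_true)) / m
    mse_rho = sum((p - t) ** 2 for p, t in zip(rho_pred, rho_true)) / m
    return mse_d + mse_rho
```

Normalizing both targets before summing keeps either quantity from dominating the loss merely because of its units.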
The total processing time consists of the time required for the generation of fake micrographs, the training of the models, and the prediction of the micrographs. Considering that the generation time per image is sufficiently short (13.4 ms), a larger N fake is better, although it is limited by the data storage. The maximum prediction time was 30 ms per image for the ML models used in this study, as shown in the inset of Figure 4(b). The prediction time is also negligible because only approximately 100 micrographs are sufficient to derive the SPC properties [9,10]. Therefore, the total processing time is primarily determined by the training time. Figure 4(b) shows the training time for various ML models to achieve the best loss (the minimum value of loss in validation). The best loss was lowest for ResNet for the following reasons. In general, the prediction accuracy improves as the number of parameters increases and then begins to decrease because the influence of the model complexity becomes greater than the learning efficiency [65]. The newer models, focusing on the efficient processing of big data, have too few parameters for this task, resulting in a low prediction accuracy. Conversely, VGG16 also resulted in quite a low accuracy (best loss of 8.2 × 10⁻² for N fake = 12,800) because the number of parameters was too large. Although the training time generally increased with the number of parameters, VGG16 and ResNet exhibited a short training time for such a large number of parameters. This is because these models do not have a depth-wise convolution (DC) layer, which has been widely used in recent convolutional neural networks to reduce the number of parameters [66]. Thus, ResNet has an appropriate model complexity, which allows for a high learning efficiency. Because ResNet-152 requires a long training time, we chose ResNet-50, which has a relatively low best loss and a short training time. The analysis was then performed using this condition (i.e. ResNet-50 and N fake = 12,800).
Figure 5(a,b) show that the predicted values of d and ρ for fake micrographs, d pred and ρ pred , agree well with d true and ρ true . The coefficient of determination (R²) and root mean squared error (RMSE) values indicate that both d pred and ρ pred were obtained with high accuracy; however, the prediction results partially contained large error regions. Therefore, we analyzed the error factors in detail. Figure 5(c,d) show the degree of error in d pred and ρ pred relative to d true and ρ true , respectively. According to the results, the error trend is explained by dividing it into two regions: d true < 5 μm and d true > 5 μm. For d true < 5 μm, the domain size is below the resolution of the microscope and is often misidentified as noise. Therefore, d pred is larger than d true . Conversely, ρ pred is 3.5 × 10⁹ cm⁻³ regardless of ρ true . This is because the ML model learns to take the average value over the predictable range to keep the loss as small as possible. Therefore, these errors are reasonable and depend on the resolution of the microscope. When d true > 5 μm, the error trend is highly dependent on ρ true . As ρ true increases, the error in d pred changes from negative to positive, whereas the error in ρ pred changes from positive to negative. This behavior reflects the fact that as ρ true increases, the domains of the fake micrographs partially overlap. The domain overlapping behavior also occurs in the SPC process because of the collisions of growing domains, and should make d pred large and ρ pred small. These errors are not specific to ML but are also inevitable in human measurements.
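For reference, the two accuracy metrics used above can be computed as in the following sketch, which uses the standard definitions of RMSE and the coefficient of determination. The function names are illustrative and the code is not taken from the original analysis.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error between predicted and true values."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def r_squared(pred: Sequence[float], true: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in true)         # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

A predictor that always returns the mean of the true values gives R² = 0, which is the behavior described above for ρ pred when d true < 5 μm.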

Implementation of the ML model for crystal growth
To demonstrate the proposed concept (Figure 1), we applied the developed ML model to 236 in-situ micrographs for the SPC of Ge. For comparison, 21 ex-situ micrographs were taken and measured by humans to determine the physical properties using the conventional method. Figure 6(a-c) show examples of in-situ micrographs for T anneal = 350°C, 375°C, and 400°C. Although the resolution is low, the micrographs show that the surface morphology differs depending on the annealing conditions. Figure 6(d,e) show the d and ρ values measured by humans (d human and ρ human ) and predicted by ML (d ML and ρ ML ), where d ML and ρ ML are the average values obtained from the random cropping (224 × 224 px, 50 times) of a micrograph (1280 × 800 px). For both ML and human data, d and ρ increase with an increase in the annealing time, which indicates the progress of SPC. A lower T anneal provides a larger d and lower ρ, which is the general behavior in SPC [4,5,10]. The results from humans and ML are in good agreement, where the differences are |d ML − d human | ≤ 5 μm and |ρ ML − ρ human | ≤ 1 × 10⁹ cm⁻³, corresponding to the relative errors of 13.8% and 6.7%, respectively.
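The crop-and-average step can be sketched as follows. The callable `predict` is a hypothetical stand-in for the trained model (it takes a cropped patch and returns a pair of d and ρ), and uniform sampling of crop positions is an assumption; the defaults of 224 px and 50 crops follow the values given above.

```python
import random
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def random_crop_average(image: Image,
                        predict: Callable[[Image], Tuple[float, float]],
                        crop_px: int = 224, n_crops: int = 50,
                        rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Average the (d, rho) predictions over randomly positioned square crops."""
    height, width = len(image), len(image[0])
    if height < crop_px or width < crop_px:
        raise ValueError("image is smaller than the crop size")
    if n_crops < 1:
        raise ValueError("at least one crop is required")
    rng = rng or random.Random()
    total_d = total_rho = 0.0
    for _ in range(n_crops):
        # randint is inclusive at both ends, so the crop always fits the image.
        top = rng.randint(0, height - crop_px)
        left = rng.randint(0, width - crop_px)
        patch = [row[left:left + crop_px] for row in image[top:top + crop_px]]
        d, rho = predict(patch)  # hypothetical model call
        total_d += d
        total_rho += rho
    return total_d / n_crops, total_rho / n_crops
```

Averaging over many crops matches the network input size while still sampling most of the 1280 × 800 px field of view.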
To automatically derive the lateral growth rate and nucleation rate, we fitted d ML (t) and ρ ML (t) with a sigmoidal function of the annealing time t, in which A, B, k, and t 0 are the fitted constants. For d ML (t) and ρ ML (t), the slope at t = t 0 , corresponding to the largest slope, was determined as the lateral growth rate and nucleation rate, respectively. Figure 6(f) shows Arrhenius plots of these values obtained from ML together with those obtained by humans. We fitted these data with a straight line and calculated the activation energies required for lateral growth and nucleation from the slopes according to the Arrhenius equation [1-  ]. The activation energies determined through ML and by humans agreed to within ±7%. The fitting errors of the straight line were clearly smaller for ML than for humans, suggesting that ML predicted the micrographs with higher accuracy and avoided human subjectivity. Moreover, ML was able to complete the entire process, including fake micrograph generation, training, and analyses, within 1 h, which is approximately 1/100th of the time required by the conventional human approach. Therefore, the analysis framework developed in this study achieved a higher accuracy within a shorter time than a human evaluation while eliminating the subjectivity.
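The two steps above can be illustrated with a short sketch. The exact sigmoidal function used in the study is not reproduced here, so a logistic form f(t) = B + A / (1 + exp(−k(t − t 0 ))) is assumed, for which the largest slope, reached at t = t 0 , equals A·k/4. The Arrhenius step assumes rate = r 0 ·exp(−E a /(k B T)), so a least-squares line through ln(rate) against 1/(k B T) has slope −E a . All function names, and the choice of eV as the energy unit, are illustrative.

```python
import math
from typing import Sequence, Tuple

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def logistic(t: float, A: float, B: float, k: float, t0: float) -> float:
    """Assumed sigmoid: B + A / (1 + exp(-k (t - t0))), written with tanh for stability."""
    return B + A * 0.5 * (1.0 + math.tanh(0.5 * k * (t - t0)))


def max_slope(A: float, k: float) -> float:
    """Slope of the assumed logistic at t = t0, where it is steepest."""
    return A * k / 4.0


def arrhenius_fit(temps_celsius: Sequence[float],
                  rates: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(rate) = ln(r0) - Ea / (kB T).

    Returns (Ea in eV, pre-exponential factor r0 in the units of `rates`).
    """
    if len(temps_celsius) != len(rates) or len(rates) < 2:
        raise ValueError("need at least two (temperature, rate) pairs")
    xs = [1.0 / (K_B_EV * (t + 273.15)) for t in temps_celsius]  # 1/(kB T) in 1/eV
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)
```

With three annealing temperatures, the straight line is determined by only three points, so the scatter of each rate about the line directly affects the uncertainty of the fitted activation energy.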

Conclusions
We proposed a fast and automated image analysis technique for the crystal growth process, taking the SPC of high-carrier-mobility Ge thin films as an example. To eliminate the effort required to acquire a large amount of training data, we automatically generated 12,800 fake micrographs corresponding to the crystallization process in a few minutes. The ML models trained on the fake micrographs correctly recognized crystal domains and determined d and ρ despite the low resolution of the images. The investigation of 10 ML models showed that ResNet-50 exhibited a relatively short training time (43 min) and a small RMSE (d: 2.1 μm, ρ: 0.93 × 10⁹ cm⁻³). Experimentally, the SPC of Ge was performed at different annealing temperatures using an in-situ annealing observation system, which automatically captured 236 micrographs during the SPC. By analyzing the in-situ micrographs using the proposed ML algorithm, d and ρ at different annealing temperatures were determined in less than 1 min.
Although some errors were introduced depending on the ranges of d and ρ, the linear regions necessary for deriving the lateral growth velocity and nucleation frequency were obtained. The Arrhenius plots of these parameters determine the activation energies required for growth and nucleation, which are important physical parameters for understanding an SPC system. The physical property values derived from ML had less variability than those derived manually. Therefore, the present study not only reduced the time required to derive the SPC properties to approximately 1/100th that of the conventional method but also improved the accuracy by eliminating human subjectivity. Various image recognition methods for the crystal growth process are expected to be developed based on the proposed concept, which will advance crystal engineering in the age of artificial intelligence.