A combined method of crater detection and recognition based on deep learning

Craters are among the main obstacles that a Mars probe must avoid during landing. To further improve the accuracy of crater detection, this paper proposes a combined detection method based on deep learning. First, a structured random forest is trained offline to detect crater edge information. Second, based on the detected edge information, candidate crater areas are determined with morphological methods. Finally, an AlexNet network trained by transfer learning is used to identify craters among the candidate areas. Compared with other methods, the proposed method achieves relatively good results.


Introduction
The craters distributed on the surface of Mars are the main obstacles that a spacecraft must avoid when landing. To prevent the probe from falling into a crater, effective methods are needed to detect and identify crater areas during the landing process. Craters with diameters of 1–512 km can be marked manually in images taken by orbiters (Wang et al., 2015). For craters with diameters below 1 km, the lander must detect and evade them autonomously during landing. Optical images generally have higher resolution than digital elevation model data. Crater detection methods based on optical images fall into three categories: unsupervised detection methods, supervised detection methods and combined detection methods.
The unsupervised detection methods determine crater edges from circular or elliptical features in the image and detect craters using theories from image processing and target detection. The main methods include the Hough transform and its improved variants (Emami et al., 2019), genetic algorithms (Hong et al., 2012), the radial consistency method (Earl et al., 2005) and template matching (Cadogan, 2020; Lee et al., 2020). For example, Pedrosa et al. (2017) proposed a crater detection method based on morphological image processing. This method consists of three steps: first, candidate regions are determined morphologically and possible crater areas are found by removing noise; second, template matching based on the fast Fourier transform is used to associate candidate regions with templates; finally, probability analysis is carried out to determine the crater areas. The advantage of unsupervised methods is that they do not need a large number of labelled samples to learn an accurate classifier, so they can be used when the computing power of the autonomous navigation system is limited. However, they are not robust on complex terrain. The supervised detection methods use labelled training data and machine learning to construct classifiers, such as neural networks (Li & Hsu, 2020), support vector machines (Kang et al., 2019) and the AdaBoost method (Xin et al., 2017). These methods mainly focus on the classification part of crater detection and do not address the confirmation of candidate crater areas. Urbach and Stepinski (2009) proposed the idea of searching for candidate crater regions.
The candidate regions of a crater are the regions of an image that may contain craters; a crater can be regarded as the combination of a crescent-shaped highlight region and a shadow region. In the work of Urbach and Stepinski, a small feature set was used to describe the highlight and shadow areas of candidate craters. This method is not suitable when the detection area contains non-crater landforms of similar shape. Ding et al. (2011) proposed an integrated framework for crater detection based on boosting and transfer learning. Within this framework, they used the candidate-region idea of Urbach and Stepinski to extract image texture features from the candidate crater areas, and then applied several supervised learning algorithms to select a group of features that confirm whether a candidate region is a crater or not.
The combined detection methods use several techniques together to detect craters, including both supervised and unsupervised detection. For example, in the combined method proposed by Yan et al. (2019), a KLT detector was used to extract candidate crater regions; image blocks served as the input to the supervised detection stage, and the detection accuracy was strongly affected by the parameter λ of the KLT detector. In Li and Hsu's (2020) work, template matching was combined with a neural network for crater detection. The problem with this method is that recognition of craters in rugged areas was poor, because the neural-network-based recognizer cannot remove a large number of false alarms. Further research on combined crater detection includes Sawabe et al. (2006).
To further improve detection accuracy, this paper adopts a combined detection method. Within the framework of structured learning, a structured random forest is used to detect crater edges, and the candidate crater areas are then determined by morphological methods. For the identification of the candidate areas, a recognition method based on the deep learning AlexNet model is studied to confirm the crater areas.

Edge detection of craters
Craters usually appear as circles or ellipses in images. For crater detection, the edge structure features can therefore be exploited to determine candidate regions, which makes edge detection an important part of crater detection. Traditional edge detection methods are mostly based on colour-gradient information (Canny, 1986; Freeman & Adelson, 1991; Ziou & Tabbone, 1998), but many edges do not correspond to colour gradients, for example texture edges (Martin et al., 2004) and illusory contours (Ren et al., 2005). In this paper, a structured learning method is used to detect crater edges, and the edge detection problem is transformed into predicting the local segmentation mask and the corresponding edge map of a given input image patch. The overall scheme is shown in Figure 1.
As Figure 1 shows, the image containing craters is first divided into image patches; the patches are then classified by the trained structured random forest, and finally the edge information of the craters is output.

Structured random forest
The structured random forest is an extension of the random decision forest (Dollár & Zitnick, 2013; Geurts et al., 2006; Speybroeck, 2012). A standard random decision forest can only predict discrete labels, while the structured random forest of Dollár and Zitnick (2013) can predict structured outputs such as the segmentation masks of 2D image patches. First, we review the random decision forest. A decision tree f_t(x) classifies a sample x ∈ X by recursively branching left or right. Each node j of the tree is associated with a binary split function with parameter θ_j:

h(x, θ_j) ∈ {0, 1}    (1)

If h(x, θ_j) = 0, sample x is sent to the left branch, otherwise to the right. This process continues until a leaf node is reached; the output of the tree for sample x is the prediction stored at that leaf, which may be a label y ∈ Y. A typical choice is θ = (k, τ) with h(x, θ) = [x(k) < τ], i.e. a threshold τ applied to a single feature dimension k; such splits have been widely applied in practice (Criminisi et al., 2011). A decision forest is an ensemble of decision trees f_t(x). For a single sample x, the outputs of all trees are combined by an ensemble model; the usual choices are voting and averaging, although more complex ensemble models exist (Criminisi et al., 2011). The leaf nodes of a decision tree can store arbitrary information, and the predictions of the multiple trees are combined in some way to give the final output.
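As a minimal illustration of this routing process (the tree layout, feature index k and threshold τ below are hypothetical, not taken from the paper), a stump split h(x, θ) = [x(k) < τ] and the recursive descent of a sample to a leaf can be sketched as:

```python
# Sketch of decision-tree routing with stump splits h(x, theta) = [x(k) < tau].
# The tree structure and thresholds here are illustrative only.

def h(x, theta):
    """Binary split function: 0 -> left branch, 1 -> right branch."""
    k, tau = theta
    return 0 if x[k] < tau else 1

def route(x, node):
    """Recursively route sample x until a leaf (a stored prediction) is reached."""
    if "leaf" in node:
        return node["leaf"]
    child = "left" if h(x, node["theta"]) == 0 else "right"
    return route(x, node[child])

# Tiny hand-built tree: split on feature 0 at 0.5, then on feature 1 at 0.5.
tree = {
    "theta": (0, 0.5),
    "left": {"leaf": "crater"},
    "right": {"theta": (1, 0.5),
              "left": {"leaf": "non-crater"},
              "right": {"leaf": "crater"}},
}

print(route([0.2, 0.9], tree))  # x[0] < 0.5, so the sample goes left: crater
```

A forest would hold several such trees and combine their leaf predictions by voting or averaging.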
In structured random forests, x ∈ X represents an image patch of the image to be detected, and y ∈ Y represents the segmentation mask corresponding to that patch.

Decision tree training
Each tree is trained independently. Given a node j and the corresponding training set S_j ⊂ X × Y, the purpose of training is to determine the parameter θ_j of the split function h(x, θ_j). To this end, an information gain criterion is defined:

I_j = I(S_j, S_j^L, S_j^R)    (2)

The parameter θ_j is chosen to maximize the information gain I_j; the left data S_j^L then recursively train the left child and the right data S_j^R recursively train the right child. Training stops when the maximum depth is reached, the information gain falls below a threshold, or the training set size reaches the set threshold.
For multi-class training (Y ⊂ Z), the information gain can be expressed as

I_j = H(S_j) − Σ_{k∈{L,R}} (|S_j^k| / |S_j|) H(S_j^k)    (3)

where H(S) = −Σ_y p_y log(p_y) denotes the Shannon entropy and p_y denotes the proportion of samples in S labelled y.
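Formula (3) can be checked numerically. The sketch below (with made-up label sets) computes the Shannon entropy and the resulting information gain of a split:

```python
import math

def shannon_entropy(labels):
    """H(S) = -sum_y p_y log(p_y) over the label proportions in S."""
    n = len(labels)
    ent = 0.0
    for y in set(labels):
        p = labels.count(y) / n
        ent -= p * math.log(p)
    return ent

def information_gain(S, S_left, S_right):
    """I_j = H(S_j) - sum_{k in {L,R}} |S_j^k|/|S_j| * H(S_j^k), as in formula (3)."""
    n = len(S)
    return (shannon_entropy(S)
            - len(S_left) / n * shannon_entropy(S_left)
            - len(S_right) / n * shannon_entropy(S_right))

# A perfect split of a balanced two-class set recovers the full entropy log(2).
S = [0, 0, 1, 1]
gain = information_gain(S, [0, 0], [1, 1])
print(round(gain, 6))  # log(2) ≈ 0.693147
```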
In the structured random forest, in order to calculate the information gain, all labels Y are mapped to an intermediate space Z in which distances are easy to measure; the difference between labels y ∈ Y is thus transformed into a Euclidean distance in Z. Specifically, y ∈ Y is mapped to a discrete label set c ∈ C, where C = {1, ..., k}, so that the information gain can be calculated approximately. For crater edge detection, the label y ∈ Y is a 16 × 16 segmentation mask, and z = Π(y) is defined as a long binary vector indicating whether each pair of pixels in y belongs to the same segment. Since a 16 × 16 mask contains 256 pixels, Z has C(256, 2) = 32640 dimensions. To reduce the dimension, m dimensions are randomly sampled in practice, which both speeds up computation and increases the diversity of the trees. The m-dimensional data are then divided into k classes by the k-means clustering method and mapped to the discrete label set C. On this basis, the Shannon entropy, and hence the information gain of formula (3), can be calculated. In this paper, m = 256 and k = 2. Because edge detection is generic, the decision trees are trained by transfer learning on the BSDS500 dataset (Martin et al., 2001).
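The mapping from a 16 × 16 mask to a pairwise binary vector, followed by random subsampling to m dimensions, can be sketched as follows (a random mask stands in for a real training label):

```python
import itertools
import random

def pairwise_vector(mask_flat):
    """z = Pi(y): 1 if a pixel pair lies in the same segment, else 0."""
    return [1 if a == b else 0
            for a, b in itertools.combinations(mask_flat, 2)]

# A 16x16 mask has 256 pixels, hence C(256, 2) pairwise dimensions.
mask = [random.randint(0, 1) for _ in range(16 * 16)]
z = pairwise_vector(mask)
print(len(z))  # 32640

# Random subsampling to m dimensions (m = 256 as in the paper).
random.seed(0)
idx = random.sample(range(len(z)), 256)
z_sub = [z[i] for i in idx]
print(len(z_sub))  # 256
```

The subsampled vectors would then be clustered by k-means (k = 2) to obtain the discrete labels c ∈ C used in formula (3).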

Edge detection of craters
For an input image, the first step is to pad the image so that 32 × 32 patches can be extracted continuously. For each image patch, the multi-channel information is expanded into a feature vector x ∈ R^(32×32×K), where K is the number of channels. In this paper, 14 channels are extracted: 3 colour channels, 3 gradient magnitude channels and 8 oriented gradient channels. The colour channels are computed in the CIE-LUV colour space; the gradient magnitude channels are computed after Gaussian blurring with σ = 0, 1.5 and 5. For the gradient magnitude channels with σ = 0 and σ = 1.5, the gradients are quantized into four orientations each, giving the eight oriented gradient channels. Two kinds of features are used: pixel features x(i, j, k) and pixel-pair difference features x(i1, j1, k) − x(i2, j2, k). Down-sampling the pixel features by a factor of 2 gives (32 × 32 × 14) / 4 = 3584 candidate features x(i, j, k). A triangle blur of 8-pixel radius is applied to each channel, the sampling resolution is reduced to 5 × 5, and the pixel-pair differences are computed, yielding C(25, 2) = 300 features per channel and 300 × 14 = 4200 features in total. Therefore, each image patch has 3584 + 4200 = 7784 candidate features. A subset of these features is sent to the trained structured forest, which predicts a 16 × 16 segmentation mask; the corresponding binary edge map of the patch can then be obtained from the mask (Dollár & Zitnick, 2013). Since the leaf nodes of the forest can store arbitrary information, the binary edge map is stored at the leaf. The predictions of the multiple trees are merged with the ensemble model of Dollár and Zitnick (2013), and the final edge map is obtained by averaging the binary edge maps.
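The candidate feature count above can be verified with a short calculation:

```python
from math import comb

channels = 14                              # 3 colour + 3 gradient magnitude + 8 oriented gradient
pixel_feats = (32 * 32 * channels) // 4    # down-sampling the pixel features by factor 2
pair_feats = comb(5 * 5, 2) * channels     # C(25, 2) pixel-pair differences per channel

print(pixel_feats)                  # 3584
print(pair_feats)                   # 4200
print(pixel_feats + pair_feats)     # 7784 candidate features per patch
```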

Determination of candidate areas of crater
After the crater edge map is obtained by the method described in Section 2, candidate crater areas can be obtained by morphological methods. As an example, a Google Mars surface image is randomly selected, as shown in Figure 2(a). Applying the edge detection method of Section 2 to Figure 2(a) gives the result shown in Figure 2(b).
The edge detection result in Figure 2(b) was dilated with a structuring element matrix B; the dilation result is shown in Figure 3(a). Adaptive optimal-threshold contrast adjustment was then used to merge the edges into connected regions, shown in Figure 3(b). The connected regions in Figure 3(b) were searched with 8-connectivity, each connected region was labelled, and small regions below a certain threshold were removed. The bounding box of each remaining labelled region was then computed. Since the edge of a crater forms its periphery, the region inside the bounding box can be taken as a candidate crater region. Marking the bounding boxes at the same positions in Figure 2(a) gives the result shown in Figure 4.
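This pipeline (dilation, 8-connected labelling, small-region removal, bounding boxes) can be sketched with SciPy on a synthetic binary edge map. The 3 × 3 structuring element and the area threshold below are illustrative stand-ins, since the paper's matrix B is not reproduced here:

```python
import numpy as np
from scipy import ndimage

def candidate_boxes(edge_map, min_area=4):
    """Dilate edges, label 8-connected regions, drop small ones, return bounding boxes."""
    # Illustrative 3x3 structuring element standing in for the paper's matrix B.
    dilated = ndimage.binary_dilation(edge_map, structure=np.ones((3, 3)))
    # Label connected regions with 8-connectivity.
    labels, n = ndimage.label(dilated, structure=np.ones((3, 3)))
    boxes = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        area = int(np.count_nonzero(labels[sl] == i))
        if area >= min_area:  # remove small connected areas
            boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes

# Synthetic edge map: a small ring of edge pixels, as a crater rim might leave.
edges = np.zeros((12, 12), dtype=bool)
edges[3, 4:8] = True
edges[8, 4:8] = True
edges[4:8, 3] = True
edges[4:8, 8] = True
print(candidate_boxes(edges))  # one box enclosing the dilated ring: [(2, 2, 10, 10)]
```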
For the candidate crater areas marked in Figure 4, it is necessary to further identify and eliminate the non-crater areas. The related contents are described in Section 4.

Crater recognition based on deep learning
For the candidate crater areas determined in Section 3, a recognition method based on deep learning with the AlexNet model (Krizhevsky, 2012) is studied. The AlexNet model uses rectified linear units (ReLU) as the activation function, which converge faster than the sigmoid and tanh functions. In addition, the AlexNet model has relatively few connections and parameters and is comparatively easy to train. It has eight layers in total: five convolution layers and three fully connected layers, as shown in Figure 5.
Before identifying the candidate crater areas, each image is preprocessed and resized to 227 × 227 × 3. The craters are then identified with the AlexNet model, following the steps below.
Step 1. The candidate-region image (227 × 227 × 3) is convolved with 11 × 11 × 3 convolution kernels, which slide across the input image in the X and Y directions with a stride of 4 pixels, each operation producing a new pixel. After sliding, a 55 × 55 pixel layer is formed. The first layer has 96 convolution kernels organized in two groups of 48, so two groups of 55 × 55 × 48 pixel layers are generated. These pixel layers are passed through rectified linear units, giving two groups of 55 × 55 × 48 activation layers, which are then pooled with a 3 × 3 window and a stride of 2 pixels, producing a 27 × 27 × 96 pixel layer. This layer is normalized over a 5 × 5 scale, again giving a 27 × 27 × 96 pixel layer.
Step 2. The pixel layer is convolved with 5 × 5 × 48 convolution kernels sliding in the X and Y directions. To facilitate the calculation, the image is padded by 2 pixels on each side; the stride is 1 pixel and each operation produces a new pixel. After sliding, a 27 × 27 pixel layer is formed. The second layer has 256 convolution kernels in two groups of 128, so two groups of 27 × 27 × 128 pixel layers are generated. Passing these through rectified linear units gives two groups of 27 × 27 × 128 activation layers, which are pooled with a 3 × 3 window and a stride of 2 pixels to obtain two groups of 13 × 13 × 128 pixel layers; these are normalized over a 5 × 5 scale, again giving two groups of 13 × 13 × 128 pixel layers.
Step 3. The two groups of 13 × 13 × 128 pixel layers from step 2 are convolved with 3 × 3 × 256 convolution kernels sliding in the X and Y directions. The image is padded by 1 pixel, the stride is 1 pixel, and each operation produces a new pixel; after sliding, a 13 × 13 pixel layer is formed. The third layer has 384 convolution kernels in two groups of 192, so two groups of 13 × 13 × 192 pixel layers are generated, which pass through rectified linear units to give two groups of 13 × 13 × 192 activation layers.
Step 4. The two groups of 13 × 13 × 192 pixel layers from step 3 are convolved with 3 × 3 × 192 convolution kernels sliding in the X and Y directions. The image is padded by 1 pixel, the stride is 1 pixel, and each operation produces a new pixel; after sliding, a 13 × 13 pixel layer is formed. The fourth layer has 384 convolution kernels in two groups of 192, so two groups of 13 × 13 × 192 pixel layers are generated, which pass through rectified linear units to give two groups of 13 × 13 × 192 activation layers.
Step 5. The two groups of 13 × 13 × 192 pixel layers from step 4 are convolved with 3 × 3 × 192 convolution kernels sliding in the X and Y directions. The image is padded by 1 pixel, the stride is 1 pixel, and each operation produces a new pixel; after sliding, a 13 × 13 pixel layer is formed. The fifth layer has 256 convolution kernels in two groups of 128, so two groups of 13 × 13 × 128 pixel layers are generated, which pass through rectified linear units to give two groups of 13 × 13 × 128 activation layers. These are pooled with a 3 × 3 window and a stride of 2 pixels, giving two groups of 6 × 6 × 128 pixel layers, i.e. a 6 × 6 × 256 pixel layer in total.
Step 6. The 6 × 6 × 256 pixel layer from step 5 is convolved with filters of size 6 × 6 × 256, and the results are output through neurons. The sixth layer has 4096 filters and 4096 neurons; the 4096 convolution results output by these neurons are passed through rectified linear units and then through dropout, giving 4096 results.
Step 7. The seventh layer has 4096 neurons fully connected to the 4096 outputs of the sixth layer. The 4096 outputs are passed through rectified linear units and then through dropout, giving 4096 results.
Step 8. The eighth layer has 1000 neurons fully connected to the 4096 outputs of the seventh layer, and outputs the crater recognition results.
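The feature-map sizes quoted in steps 1–5 all follow from the standard convolution/pooling size formula out = (in + 2·pad − kernel) / stride + 1; a short sketch verifies them:

```python
def out_size(in_size, kernel, stride, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size + 2 * pad - kernel) // stride + 1

s1 = out_size(227, 11, 4)             # conv1, 11x11 kernel, stride 4: 55
p1 = out_size(s1, 3, 2)               # pool1, 3x3 window, stride 2: 27
s2 = out_size(p1, 5, 1, pad=2)        # conv2, 5x5 kernel, 2-pixel padding: 27
p2 = out_size(s2, 3, 2)               # pool2: 13
s3 = out_size(p2, 3, 1, pad=1)        # conv3-conv5, 3x3 kernel, 1-pixel padding: 13
p5 = out_size(s3, 3, 2)               # pool5: 6

print(s1, p1, s2, p2, s3, p5)  # 55 27 27 13 13 6
```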
In this paper, the AlexNet model is trained on the MATLAB R2018a platform. The main computer configuration is an Intel Core i7-4790 processor and an NVIDIA GeForce GT 705 graphics card. The training samples are taken from Google Mars surface images, from which a sample library was built: images containing craters of different scales serve as positive samples, and surface images without craters serve as negative samples. One thousand positive and negative samples were selected for training, with a learning rate of 0.01. Image enhancement strategies such as mirroring, flipping and random cropping were used to expand the number of images in the validation set. The training process is shown in Figure 6.
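The augmentation strategies mentioned above (mirroring, flipping, random cropping) can be sketched with NumPy; the 200-pixel crop size below is a hypothetical example, not a value from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=200):
    """Return mirrored, flipped and randomly cropped variants of an image."""
    mirrored = img[:, ::-1]       # horizontal mirror
    flipped = img[::-1, :]        # vertical flip
    h, w = img.shape[:2]
    y = rng.integers(0, h - crop + 1)   # random top-left corner of the crop
    x = rng.integers(0, w - crop + 1)
    cropped = img[y:y + crop, x:x + crop]
    return mirrored, flipped, cropped

img = rng.random((227, 227, 3))
m, f, c = augment(img)
print(m.shape, f.shape, c.shape)  # (227, 227, 3) (227, 227, 3) (200, 200, 3)
```

In practice the cropped variant would be resized back to 227 × 227 before being fed to the network.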
As can be seen from Figure 6, after 20 training iterations the validation recognition accuracy reaches 95.83%, which is a high recognition accuracy.

Performance analysis of edge detection
The edge detection algorithm proposed in this paper is compared with the method of Zhang et al. (2010). A Mars surface image is randomly selected from the sample database, as shown in Figure 7. Zhang's method (2010) and the proposed method were both applied to Figure 7; the results are shown in Figure 8.
It can be seen from Figure 8 that Zhang's method (2010) is less effective than the method proposed in this paper, producing more discontinuities and poorer coherence. To evaluate the detection results further, the objective evaluation index of Lin (2003) was used: the ratio of the number of 8-connected components C to the number of edge points A. The smaller the ratio, the better the detection effect. The comparison is shown in Table 1: the C/A value of the proposed method is 0.0061, smaller than the 0.0584 of Zhang's method (2010), which indicates that the proposed edge detection method performs better.
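The C/A index, as used here, can be sketched on a binary edge map with SciPy (the synthetic maps below are illustrative):

```python
import numpy as np
from scipy import ndimage

def c_over_a(edge_map):
    """Ratio of 8-connected components C to edge points A; smaller is better."""
    _, c = ndimage.label(edge_map, structure=np.ones((3, 3)))  # 8-connectivity
    a = int(np.count_nonzero(edge_map))
    return c / a

# A single continuous edge scores better (lower) than a fragmented one.
continuous = np.zeros((10, 10), dtype=bool)
continuous[5, 1:9] = True                 # 8 edge points, 1 component
fragmented = np.zeros((10, 10), dtype=bool)
fragmented[5, [1, 3, 5, 7]] = True        # 4 edge points, 4 components

print(c_over_a(continuous), c_over_a(fragmented))  # 0.125 1.0
```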

Comparison with other methods
The method in this paper is compared with Yan's method (2019), which is also a combined detection method. A test image is randomly selected from the sample library, as shown in Figure 9.
Both Yan's method and the method of this paper were used to detect and identify the craters in the test sample of Figure 9; the results are shown in Figure 10.
It can be seen from Figure 10 that both methods detected and identified three craters. To evaluate the detection accuracy further, the commonly detected crater areas are numbered 1, 2 and 3, as shown in Figure 11.
The IoU indicator was then used to evaluate the crater areas detected and identified by the two methods; the results are shown in Figure 12.
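The IoU indicator compares a detected bounding box with a reference box; a minimal sketch (with hypothetical box coordinates) is:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (clamped to zero if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.142857
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
```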
It can be seen from Figure 12 that for the detected crater areas 1, 2 and 3, the IoU values of the proposed method are higher than those of Yan's method, as is the average IoU, which shows that the proposed method performs relatively well. In addition, 20 samples were randomly selected for detection and identification, and the correct detection rate proposed by Ali-Dib et al. (2020) was used as the evaluation index to compare the two methods. The comparison results are shown in Table 2.

Conclusion
In this paper, crater detection is divided into three parts: crater edge extraction, candidate area determination and crater recognition. For edge extraction, a method based on the structured random forest is used. Building on the edge detection, a candidate region extraction method based on morphology is studied. For the recognition of the candidate regions, a recognition method based on the deep learning AlexNet model is studied. Experimental results show that the proposed crater edge detection method outperforms other edge detection methods, and that compared with other crater detection methods, the proposed method achieves relatively high detection accuracy and correct detection rate.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The work described in this paper was supported by the National Natural Science Foundation of China [Grants Number 61703270].