Accurate automatic classification system for 3D CT images of some vertebrate remains from Egypt

Vertebrate fossils and remains have recently become significant in various fields of study for determining ecological biodiversity. However, given the great abundance of fossils/remains and their classes, identifying and detecting these classes is difficult. Hence, in this paper, an accurate machine learning classification technique is presented to differentiate automatically among several types of 3D vertebrate remain images. A computed tomography (CT) scanner is used to construct a dataset of 3D images of vertebrate remains found in Egypt. Adaptive enhancement and segmentation processes are applied to the dataset, and selected geometric features are then extracted. The extracted features are classified using suitable machine learning classifiers (SVM, KNN and decision trees). The automatic detection of the remains class from the extracted features is evaluated using confusion matrices for the training and testing data points and the receiver operating characteristic (ROC) curve. The results confirm an accurate technique with high performance.


Introduction
Lately, paleontology has become a significant tool for obtaining qualitative and quantitative information about fossils and remains, enabling an understanding of the evolution of vertebrates [1,2]. These fossils include the remains of animals, bacteria, plants, fungi and single-cell organisms that are preserved in rock materials [3]. Paleontology plays a powerful role in documenting extant and past biodiversity, describing past environmental changes, studying the sensitivity of ecosystems and detecting zoonoses [4]. In paleontology, microscopic observation is considered among the accurate methods for the analysis and detection of fossil types [5]. These analyses help to obtain geometric morphometrics and biological information about the tested fossils, but they may face great difficulty in determining fossil types [5,6]. This may be attributed to differences in the abilities of examiners, especially novice ones, to distinguish the geometric morphometric features of fossil skeletons [7,8]. In addition, the techniques used to perform the biological analysis are expensive and may not be available in all countries [6]. Consequently, it is important to introduce digital image processing and machine learning techniques to perform accurate and automatic classification of fossil types and to serve the digitized-library purpose.
Digital image processing is a technique used to implement statistical or mathematical operations on an image in order to retrieve and analyse meaningful information from it or to obtain an enhanced image [9,10]. Digital image processing techniques have many applications in different fields, such as remote sensing, atmospheric study, the medical field, feature extraction and others [10]. Image processing includes two essential operations: image enhancement and image segmentation [11,12]. Image enhancement aims to remove noise (denoising), enhance contrast and improve image quality relative to the original [11]. Filters such as the Wiener filter, the median filter and the average filter are applied to remove unwanted noise from the captured image [13]. In addition, in this stage, enhancement algorithms such as histogram equalization and linear contrast stretching may be used to improve the contrast of the captured image [14]. The second operation, implemented on the enhanced image, is image segmentation. Generally, image segmentation can be defined as the process of partitioning the enhanced image into many segments, in which each segment belongs to a single object and its pixels are connected [15,16]. The main goal of image segmentation techniques is to find the optimal threshold (T_opt) value of the image and then convert it to a binary image (1 for white and 0 for black) according to this value [10,15]. Many segmentation techniques have been proposed to perform this task, including Otsu's method [16], the k-means clustering method, the region-growing method and the fuzzy c-means clustering algorithm [10].
Machine learning (ML) is a formidable technique that utilizes computational approaches to obtain a usable model from empirical data points [17,18]. ML is considered a subfield of artificial intelligence (AI) [19]. ML techniques can automatically improve and learn from acquired experience without being explicitly programmed [19,20]. ML has applications in many fields, such as classifying and recognizing the different classes in an enormous dataset, understanding the phenomenon that generated the data points under investigation, using the generated models to predict future data point values and revealing anomalous behaviour exhibited by the data points under investigation [17][18][19][20]. Generally, machine learning techniques are divided into two main types: supervised and unsupervised [18]. Supervised learning algorithms try to model the relationships and dependencies between the target prediction output and the input features, such that the output values for new data can be predicted from the relationships learned from previous datasets [20,21]. They are used for predictive models and labelled data. Regression and classification are the main types of supervised learning problems, and the most commonly used algorithms are nearest neighbour, decision trees, linear regression, support vector machines (SVMs) and neural networks (NNs).
Unsupervised learning is mainly used in pattern detection and descriptive modelling [20]. These algorithms model the structure of the input data without predefined output categories or labels; they process the input data to mine rules, detect patterns, and summarize and group the data points. This helps the user to derive meaningful insights and describe the data [21,22]. The main types of unsupervised learning algorithms are k-means clustering and association rules [22]. One of the important applications of digital image processing and machine learning occurs in paleontology [22][23][24][25][26][27][28].
The present study processed the 3D CT images of three vertebrate remains collected from Egypt: the skull of a crocodile from the ancient deposits of Lake Qarun in the Fayum Depression, the skull of a dolphin that died 3 years ago on the beach near Damietta city, and a fragment of a hippopotamus maxilla (jaw) from the ancient delta deposits in Kafr El Sheikh governorate.

Problem formulation
The majority of previous studies utilized convolutional neural networks (CNNs) to perform classification tasks on two-dimensional (2D) images. The usage of 2D images gives restricted information about the structure of the acquired fossils/remains, which may lead to an inaccurate investigation of the fossils and remains [30]. In addition, because a CNN extracts its features directly and automatically through its convolution layers from the input image, the extracted features may be vague for researchers, which complicates interpretation when explanations are needed. It is to be noted that few studies in the literature have examined this type of fossils/remains. Therefore, the main goal of the current study is to present an accurate classification system to differentiate automatically among some types of 3D CT images of vertebrate remains/fossils. For this study, a dataset of 3D images of the remains under study is constructed using a CT scanner, which provides more detail about the structure of the remains. Digital image processing techniques are used to extract defined features from the remains, and these features are then passed to several machine learning classifiers. To evaluate the efficiency of the automatic classification system, the F1 score and the receiver operating characteristic (ROC) curve are determined.

Proposed method
A method is suggested to detect accurately some vertebrate remains types using a classification system. A schematic diagram of the proposed method, which is performed through the following steps, is shown in Figure 1.

Dataset establishment (Data acquisition)
In this work, a GE Optima 128-slice computed tomography (CT) scanner is used to generate a 3D image dataset for three classes of remains: a crocodile head, a dolphin head and a part of a hippopotamus maxilla (see Figure 2). During a CT scan, each vertebrate remain is mounted on a table inside the tunnel of the CT machine and a series of X-ray images is taken from different angles. This procedure is repeated for the different vertebrate remains. Each 3D image in the dataset is acquired by recording 1D projections at angles from 1° to 360°; the 3D images are then reconstructed from these projections using the filtered back-projection technique [31]. To our knowledge, this is an early established dataset for these types of remains. The dataset includes 2052 3D images for three main classes: crocodile head (820 images), dolphin head (653 images) and part of a hippopotamus maxilla (579 images). These images have a resolution of 512×512×3 with a 24-bit colour depth. The dataset is divided into 1912 images for training and 140 images for the evaluation of the results. Figures 3-5 show examples from the recorded 3D image dataset for the crocodile head, the dolphin head and the hippopotamus maxilla part, respectively.

Preprocessing process
In this step, the captured RGB image is converted into a greyscale image G(x, y) using the NTSC formula [10]:

G(x, y) = 0.299R + 0.587G + 0.114B (1)

where R, G and B represent the red, green and blue colour channels of the image, respectively. Then, the histogram equalization technique is applied to the image obtained from Equation (1) to improve its contrast [10]:

H(i) = round((Z − 1) (1/f) Σ_{j=0}^{i} f_j), i = 0, 1, …, m (2)

where m is the intensity range of the G(x, y) image, f_i is the frequency of intensity i, f is the sum of all the frequencies and Z is the total number of grey levels of the output image. To remove the unwanted noise (denoising), the Wiener filter W_u(u_x, u_y) is applied [10,13]:

W_u(u_x, u_y) = S*(u_x, u_y) / (|S(u_x, u_y)|² + β_n/β_s) (3)

where S(u_x, u_y) and S*(u_x, u_y) are the degradation function and its complex conjugate in the frequency domain (u_x, u_y), respectively, and β_n and β_s are the spectral strengths of the unwanted noise and the signal, respectively.
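As an illustrative sketch of this preprocessing chain (the study's own implementation was in Matlab; here the greyscale conversion and histogram equalization are written in plain NumPy, and the denoising step uses SciPy's adaptive `wiener` filter, a spatial-domain counterpart of Equation (3)):

```python
import numpy as np
from scipy.signal import wiener

def to_grey(rgb):
    """NTSC luminance conversion, Equation (1)."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def hist_equalize(grey, levels=256):
    """Histogram equalization: map each intensity through the scaled CDF."""
    img = grey.astype(np.uint8)
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size              # cumulative frequency / total
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[img]

# Example: a low-contrast ramp (intensities 100-139) gains the full
# dynamic range after equalization.
grey = np.tile(np.arange(100, 140, dtype=np.uint8), (64, 1))
eq = hist_equalize(grey)
den = wiener(eq.astype(float), mysize=3)          # adaptive Wiener denoising
```

The local window size (`mysize=3`) is a hypothetical choice for illustration; in practice it is tuned to the noise level of the CT slices.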

Segmentation process
To extract the needed properties from the dataset images, a binary image is required. To perform this task, Otsu's thresholding method is used as a suitable algorithm to find the global threshold (T_opt) value. This algorithm is based on minimizing the weighted sum of the variances of the two classes [16]:

σ_w²(T) = w_1(T)σ_1²(T) + w_2(T)σ_2²(T) (4)

where w_1 and w_2 are the probabilities of the two classes separated by a threshold T, and σ_1² and σ_2² are the variances of the two classes.
Here, the threshold that minimizes the intra-class variance of the two classes is considered the optimal threshold (T_opt). According to the value of T_opt, the binary image is obtained using the following relation [16]:

G(x, y) = 1 (white) for each pixel ≥ T_opt; G(x, y) = 0 (black) for each pixel < T_opt (5)
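Otsu's search for T_opt can be sketched in a few lines of NumPy. For efficiency the sketch maximizes the between-class variance, which is equivalent to minimizing the within-class variance of Equation (4) but can be evaluated from the histogram alone:

```python
import numpy as np

def otsu_threshold(grey, levels=256):
    """Return T_opt for Equation (4) by exhaustive histogram search."""
    hist = np.bincount(grey.ravel(), minlength=levels).astype(float)
    prob = hist / hist.sum()
    best_t, best_sigma_b = 0, -1.0
    for t in range(1, levels):
        w1, w2 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w1 == 0 or w2 == 0:
            continue
        mu1 = (np.arange(t) * prob[:t]).sum() / w1
        mu2 = (np.arange(t, levels) * prob[t:]).sum() / w2
        sigma_b = w1 * w2 * (mu1 - mu2) ** 2      # between-class variance
        if sigma_b > best_sigma_b:
            best_t, best_sigma_b = t, sigma_b
    return best_t

# Binarize with Equation (5): white where pixel >= T_opt, black otherwise.
grey = np.concatenate([np.full(500, 40), np.full(500, 200)]).astype(np.uint8)
grey = grey.reshape(20, 50)
t_opt = otsu_threshold(grey)
binary = (grey >= t_opt).astype(np.uint8)
```

On the synthetic bimodal image above, the threshold falls between the two intensity modes, splitting the pixels cleanly into the two classes.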

Feature extraction and classification
The feature extraction process is considered key to any successful classification, because the extracted features are the main parameters used to differentiate between the classes in the dataset images. Many types of features can be extracted from images according to the conditions of the study and the nature of the tested objects, including colour features, statistical features, geometric features and texture features [32]. In the present study, the geometric features are recommended as the most suitable for the acquired vertebrate remains, since they allow this data type to be analysed in more detail than the other feature types. The classification of vertebrate fossils is based primarily on the geometric shape of their bones, so the geometric characteristics are considered the dominant features for classifying these fossils.
In this work, many geometric features are extracted from the binary images obtained in Equation (5) to classify the classes in our dataset.
These features include: (1) the area of the skeleton, defined as the number of pixels in the skeleton; (2) the convex area, i.e. the area of the convex hull enclosing the skeleton; (3) the solidity, defined as the ratio between the area of the skeleton and its convex area; (4) the major axis of the skeleton, the longest line that can be drawn through the skeleton; (5) the minor axis of the skeleton, the longest line that can be drawn through the skeleton whilst remaining perpendicular to the major axis; (6) the eccentricity, defined as the ratio between the length of the minor (short) axis and the length of the major (long) axis of an object; (7) the Euler number, defined as the total number of objects in an image minus the total number of holes, which detects all holes and vacuoles inside the vertebrate fossils and thus helps to raise the classification efficiency; and (8) the perimeter of the skeleton, the number of pixels on the boundary of the skeleton [33]. Figure 6 shows a step-wise visual output of the feature extraction. After extracting these features from the images in our dataset, the features are sent to the machine learning classifiers for training and testing to complete the classification process.
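A minimal sketch of part of this feature set (area, perimeter and Euler number) in NumPy/SciPy is shown below; in practice all eight features are returned directly by region-property routines such as Matlab's or scikit-image's `regionprops`, and the definitions here follow the ones given in the text rather than any particular library:

```python
import numpy as np
from scipy import ndimage

def geometric_features(mask):
    """Compute a subset of the geometric features for a binary mask.

    area: number of foreground pixels; perimeter: foreground pixels with
    at least one 4-connected background neighbour; Euler number: number
    of objects minus number of holes.
    """
    mask = mask.astype(bool)
    area = int(mask.sum())

    # Boundary pixels: foreground pixels with a background 4-neighbour.
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int((mask & ~interior).sum())

    # Objects are connected foreground components; holes are background
    # components that do not touch the image border.
    n_objects = ndimage.label(mask)[1]
    bg_labels, n_bg = ndimage.label(~mask)
    border = set(np.unique(np.concatenate(
        [bg_labels[0], bg_labels[-1], bg_labels[:, 0], bg_labels[:, -1]])))
    n_holes = len(set(range(1, n_bg + 1)) - border)
    euler = n_objects - n_holes
    return {"area": area, "perimeter": perimeter, "euler": euler}

# Example: a 7x7 square containing a 3x3 hole (one object, one hole).
m = np.zeros((9, 9), dtype=np.uint8)
m[1:8, 1:8] = 1
m[3:6, 3:6] = 0
feats = geometric_features(m)
```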

Results and discussion
According to the proposed method, the images in our dataset are converted to greyscale using Equation (1). Then, to increase the dynamic range of the dataset images, the histogram equalization technique is applied as given in Equation (2), and the Wiener filter of Equation (3) is applied to remove the unwanted noise. After detecting the optimal threshold (T_opt) values for the images using Equation (4), the relation in Equation (5) is used to obtain the binary images for our dataset. The geometric features of the remains in the obtained binary images are extracted using the designed Matlab program, and the extracted features are then ready for the training and classification processes. Each ML algorithm is run on all extracted features simultaneously rather than separately on each feature; the reported classification results are therefore obtained from all the features at once, not as a mean of partial results.

Classification process using the support vector machine (SVM) classifier
The support vector machine (SVM) is considered among the most robust supervised learning methods [17,34]. This classifier has a high capacity for learning and for reducing error sources, enabling it to differentiate effectively between classes of data points, especially for imbalanced data [18]. The basic principle of this classifier is to map the recorded data points from the recording space into a higher-dimensional feature space using a non-linear mapping function and then to generate boundaries (hyperplanes) with a maximum margin of separation between the classes of data points [20]. The most common kernels used in SVM classifiers are the linear, quadratic, cubic, fine Gaussian, medium Gaussian and coarse Gaussian kernels.
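To make the maximum-margin principle concrete, the following minimal sketch (not the study's Matlab implementation) trains a linear SVM by subgradient descent on the hinge loss on two synthetic blobs; the kernel variants named above differ only in replacing the inner product with a non-linear kernel:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimal linear SVM via subgradient descent on the hinge loss.

    Objective: lam*||w||^2 + mean(max(0, 1 - y*(X@w + b))), y in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        miss = margins < 1                       # points violating the margin
        grad_w = 2 * lam * w - (y[miss][:, None] * X[miss]).sum(axis=0) / n
        grad_b = -y[miss].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two linearly separable blobs standing in for two remains classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
train_acc = (np.sign(X @ w + b) == y).mean()
```

The hyperparameters (`lam`, `lr`, `epochs`) are illustrative choices, not values from the study.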
For the current dataset, SVM classifiers with quadratic, cubic and fine Gaussian kernels are used to detect the remains class in our dataset. These classifiers are optimized with 5-fold cross-validation, and the data points (features) are trained via these classifiers. To determine the accuracy of the training process, the confusion matrix is used. Figures 7(I), 8(I) and 9(I) show the confusion matrices for the training dataset using the quadratic SVM, the cubic SVM and the fine Gaussian SVM classifiers, respectively. From these confusion matrices, the training accuracy is calculated using the following equation [35]:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (6)

where TP, TN, FN and FP are the true positive, true negative, false negative and false positive values, respectively. The obtained results are given in Table 1. Also, the receiver operating characteristic (ROC) curve is used to assess the performance of the training process at all classification thresholds. Figures 7(II), 8(II) and 9(II) show the ROC curves for the training dataset using the quadratic SVM, the cubic SVM and the fine Gaussian SVM classifiers, respectively. The ROC curves are plotted as the relation between the true positive rate (TPR), known as the sensitivity, and the false positive rate (FPR), equal to 1 − specificity, at all classification thresholds [35]. These curves illustrate that the points determined by TPR and FPR for the three SVM classifiers lie at (0, 1) and that the areas under the curves equal 1. These results represent the ideal identification situation, in which the classifiers have a specificity of 100% and a sensitivity of 100%, indicating that they are optimal models for identifying the present remains classes.
After exporting and saving these classifier models, 140 images are used to test them. The confusion matrices for the test dataset using the quadratic SVM, the cubic SVM and the fine Gaussian SVM classifiers are shown in Figures 7(III), 8(III) and 9(III), respectively. From these figures, the testing accuracy of each classifier is calculated using Equation (6), and the results are listed in Table 1. Furthermore, to assess the performance of the testing process for each model, the F1 score is calculated from the obtained confusion matrix using the following relation, and the results are given in Table 1:

F1 score = 2TP / (2TP + FP + FN)

F1 score values range from 1 (the perfect case) to 0 (the worst one) [35].
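Both evaluation measures follow directly from a confusion matrix. The sketch below uses a hypothetical 3-class matrix for a 140-image test set (not the study's actual results), treating the class of interest as positive and all other classes as negative:

```python
import numpy as np

def accuracy_and_f1(cm, cls):
    """Per-class accuracy (Equation 6) and F1 score from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j;
    class `cls` is treated as positive, all other classes as negative.
    """
    tp = cm[cls, cls]
    fn = cm[cls].sum() - tp            # true cls, predicted elsewhere
    fp = cm[:, cls].sum() - tp         # other classes predicted as cls
    tn = cm.sum() - tp - fn - fp
    acc = (tp + tn) / cm.sum()
    f1 = 2 * tp / (2 * tp + fp + fn)
    return acc, f1

# Hypothetical matrix (rows: true crocodile / dolphin / hippopotamus).
cm = np.array([[48, 1, 1],
               [2, 45, 3],
               [0, 2, 38]])
acc0, f1_0 = accuracy_and_f1(cm, 0)    # metrics for the crocodile class
```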

Classification process using the K-nearest neighbour (KNN) classifier
The K-nearest neighbour (KNN) classifier is a nonparametric, lazy learning algorithm [34]. The fundamental idea behind this classifier is to use distance functions and similarity measures to find the k nearest data points in a dataset and to apply majority voting over the classes of these points [36,37]. The most common distance functions used in KNN classifiers include the normalized Euclidean metric, chi-square, the Minkowski distance, the Manhattan distance and the cosine distance [37]. According to the rule of thumb, the k values are selected to be odd numbers [34]. For the current dataset, the extracted features of the remains are trained using the fine KNN classifier. Here, the normalized Euclidean metric is used as the distance function with 5-fold cross-validation. Figure 10(I, II) shows the confusion matrix and ROC curve for the training dataset using the fine KNN classifier. Using Equation (6), the training accuracy of this classifier is calculated from Figure 10(I). The point determined by TPR and FPR on the obtained ROC curve in Figure 10(II) is (0, 1), which signals that this classifier is an optimal model for the identification. After exporting and saving this classifier model, the same 140-image test set is used to test the exported model. Figure 10(III) shows the confusion matrix for the test dataset using the fine KNN classifier. From Figure 10(III), the testing accuracy and the F1 score are calculated for this classifier system, and the results are listed in Table 1.
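The distance-plus-majority-vote rule can be sketched compactly in NumPy; the toy feature vectors below are hypothetical stand-ins for the extracted geometric features:

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=5):
    """Classify each row of X_new by majority vote among its k nearest
    training points under the Euclidean distance."""
    preds = []
    for x in np.atleast_2d(X_new):
        d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(d)[:k]]      # labels of k closest points
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])     # majority vote
    return np.array(preds)

# Toy feature vectors (e.g. [area, eccentricity]) for two classes.
X = np.array([[1.0, 0.10], [1.2, 0.20], [0.9, 0.15],
              [5.0, 0.80], [5.2, 0.90], [4.8, 0.85]])
y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X, y, [[1.1, 0.12], [5.1, 0.82]], k=3)
```

An odd k (here 3) avoids tied votes in the two-class case, matching the rule of thumb cited above.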

Classification process using the decision tree (DT) classifier
The decision tree (DT) is a supervised learning algorithm that uses a tree structure to perform the classification process [37]. In a DT, the tree is constructed from the data points of the dataset and then divided into smaller subsets (root nodes and leaf nodes) until no further division can be performed [38]. The main goal of the DT classifier is to generate a training model for predicting the classes in a dataset. To perform this task, we begin at the root of the tree and compare the root attribute values with the record's attribute [37,38]. Based on this comparison, we follow the branch corresponding to that value and jump to the following node [38]. The most common DTs used in classification are the fine tree, the medium tree and the coarse tree.
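The node-splitting step at the heart of tree construction can be sketched as a search for the single feature/threshold pair that minimizes the weighted Gini impurity; a full tree (fine, medium or coarse) applies this split recursively, with the variants differing in how many splits they allow. The data below are hypothetical:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(X, y):
    """Find the (feature, threshold) pair minimizing weighted Gini impurity."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):                  # candidate feature
        for t in np.unique(X[:, j]):             # candidate threshold
            left, right = y[X[:, j] < t], y[X[:, j] >= t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# The second feature cleanly separates the two classes.
X = np.array([[3.0, 0.1], [2.0, 0.2], [2.5, 0.9], [3.5, 0.8]])
y = np.array([0, 0, 1, 1])
feat, thr, score = best_split(X, y)
```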
In the same manner, fine DT and medium DT classifiers are used to classify the fossil classes in our dataset with 5-fold cross-validation. The extracted features are trained via these classifiers. Figures 11(I, II) and 12(I, II) show the confusion matrices for the training dataset and their ROC curves using the fine DT and the medium DT classifiers, respectively. The obtained ROC curves clarify that the point determined by TPR and FPR for the fine DT classifier (Figure 11(II)) is at (0.01, 0.98) and that for the medium DT classifier (Figure 12(II)) is at (0.03, 0.95). This means that these classifiers have high accuracy but are not optimal for the present data. The training accuracies of these classifiers are calculated using Equation (6) and the results are listed in Table 1. After exporting these classifier models, the same 140-image test set is used to test them. Figures 11(III) and 12(III) show the confusion matrices for the test dataset using the fine DT and medium DT classifiers. From these figures, the testing accuracy and the F1 score are calculated for each classifier system and the results are reported in Table 1. To provide a broader view of the performance on our dataset, a feed-forward back-propagation neural network is also used to train and classify the considered classes. The resulting classification accuracy, training time and F1 score for the ANN model are listed in Table 1.
For the abovementioned classifiers, we evaluated permutations of the extracted features to select the best features for the training process. The results in Table 1 therefore do not cover all evaluated feature combinations; they correspond to the best selected features for training and testing.
In general, Table 1 clarifies that the cubic SVM and fine Gaussian SVM classifier systems based on the extracted geometric features have the highest testing accuracy in classifying and detecting the remains class compared to the other classifier systems. The F1 score values indicate that the SVM classifier system with the fine Gaussian kernel (F1 score = 93.66%) has the highest performance for classifying the remains class. However, the 100% training accuracy of the fine Gaussian SVM indicates that this model is overfitting: its high testing result may be attributed to overfitting during training rather than genuine outperformance, so it is excluded from our consideration. Hence, the cubic SVM classifier is considered the dominant classifier in our study.

Conclusions
An accurate classification system has been developed to differentiate automatically among some types of 3D vertebrate remains images, namely a crocodile head, a dolphin head and a part of a hippopotamus maxilla, all discovered in Egypt. The remains images form a 3D dataset constructed using a computed tomography (CT) scanner, which provides more detail about the structure of the remains. Preprocessing techniques, namely the histogram equalization technique and the Wiener filter, are used to remove the noise and improve the contrast of the images. Otsu's thresholding method is utilized to obtain the binary images for the constructed remains dataset, and the suitable selected geometric features are extracted. Classification ML techniques based on SVM, KNN, NN and DT models are utilized to classify these remains classes automatically. The efficiency of these classification systems is evaluated via the confusion matrix, the F1 score and the ROC curve. The obtained results show that the SVM classifier system with a cubic kernel has the highest performance for classifying the fossil classes and may be used as an efficient classifier tool for remains classification. Hence, the proposed method can serve as an efficient diagnostic tool for geologists and paleontologists.