Transfer Learning Models for Land Cover and Land Use Classification in Remote Sensing Images

ABSTRACT Land Cover and Land Use (LCLU) classification is an important, challenging problem in remote sensing (RS) images. RS image classification is a recent technology used to extract hidden information from remotely sensed images of the observed earth environment. This classification is essential for sustainable development in agricultural decision-making and urban planning using deep learning (DL) methods. DL has received growing attention for its accuracy and performance on large datasets. This paper aims to apply one of the DL methods, called transfer learning (TL). TL is a recent research problem in machine learning and DL approaches for image classification. DL consumes much time for training when starting from scratch. This problem can be overcome with the TL modeling technique, which uses pre-trained models to build deep TL models efficiently. We applied the TL model using bottleneck feature extraction from the pre-trained models InceptionV3, Resnet50V2, and VGG19 to LCLU classification on the UC Merced dataset. In these experiments, the TL models achieved overall accuracies of 92.46%, 94.38%, and 99.64% for Resnet50V2, InceptionV3, and VGG19, respectively.


Introduction
Land cover is variable and dynamic on the earth's surface (Abdu 2019), whereas land use is the intervention of human activities on the earth. Land cover is the earth's surface covered by physical features like forest, river, vegetation, or others. In contrast, land use is the ability of humans to use natural resources for various purposes (Cao, Dragićević, and Li 2019). Thus, land cover and land use (LCLU) describe the earth's features and human interaction. Classification is needed in land cover mapping (Stivaktakis, Tsagkatakis, and Tsakalides 2019; Tong et al. 2020) and land use resource management (Cao, Dragićević, and Songnian 2019; Castelluccio et al. 2015; Hung, Wu, and Tseng 2020). LCLU classification is an important, challenging task (Huang et al. 2021), and it contributes to agricultural decision-making and urban forecasting in the earth observation environment for sustainable development. We address this classification problem using transfer learning (TL) models on remote sensing (RS) images.
RS images are earth observation geospatial data and records of environmental information. RS data face the "big data" challenges and pose new challenges for DL, raising exceptional problems and new scientific questions (X. X. Zhu et al. 2017). RS imagery data classification is a significant problem in various domains (X. Liu et al. 2019; Deng et al. 2019; Abdu 2019; Yao et al. 2019; Cheng et al. 2020; W. Zhang, Tang, and Zhao 2019; Shabbir et al. 2021; D. Zhang, Liu, and Shi 2020; Alam et al. 2021). Thus, we considered classification one of the major research problems in RS imagery data. Researchers have recently explored the application of DL to confront these challenges.
Deep learning (DL) has received much attention for the LCLU classification problem in RS images (Vali, Comai, and Matteucci 2020). DL approaches can extract the earth's features from remotely sensed imagery data to manage the earth's environment through properly designed deep classification systems. DL algorithms attract attention for their automatic learning ability from large datasets (Côté-Allard et al. 2019; Li et al. 2019; Deng et al. 2019; Das et al. 2019; Xie et al. 2016; Bahri et al. 2020; X. Pan et al. 2018; W. Zhao et al. 2015; X. X. Zhu et al. 2017; Yao et al. 2019; Rashid et al. 2020; Weinstein et al. 2019; Cheng et al. 2020).
In recent studies, the DL methods, especially CNNs, are widely used in RS image classification for their outstanding performance and accuracy (Marmanis et al. 2016; Das et al. 2019; W. Zhang, Tang, and Zhao 2019; Vali, Comai, and Matteucci 2020; Rashid et al. 2020). However, DL algorithms can incur long training times and high complexity, leading to over-fitting (Das et al. 2019; B. Zhao, Huang, and Zhong 2017; Rostami et al. 2019; Hung, Hui-ching, and Tseng 2020; Zou and Zhong 2018) when the models are trained from scratch. TL, an innovative DL modeling technique in machine learning, can overcome this problem because TL is an optimization technique used to save processing time while achieving superior performance or accuracy (Kumar, Naman, and Verma 2021).
TL models can be applied in various RS domains. For instance, TL has been applied for forest variable estimation (Astola et al. 2021); for object (airplane) detection (Chen, Zhang, and Ouyang 2018; M. Zhu et al. 2019); for poverty mapping (Xie et al. 2016); for labeling Synthetic Aperture Radar (SAR) data (Rostami et al. 2019); for change analysis (Qian et al. 2020); and for marsh vegetation classification (M. Liu et al. 2021).
Related work has investigated CNN-based TL models in this domain. Several researchers (Mahdianpari et al. 2018; B. Zhao, Huang, and Zhong 2017; S. J. Pan and Yang 2010; Marmanis et al. 2016; Zou and Zhong 2018; X. Zhang, Guo, and Zhang 2020; Lima and Marfurt 2020; Shabbir et al. 2021) have investigated CNN-based models using pre-trained architectures for RS image classification. The LCLU classification problem was investigated using TL by (Naushad and Kaur 2021; D. Zhang, Liu, and Shi 2020). However, TL in RS has not been widely explored yet (Astola et al. 2021), especially in LCLU classification. Thus, we applied deep neural network-based TL (Li et al. 2019) to LCLU classification using RS images.
Our motivation was to apply the deep TL model with pre-trained models to LCLU classification in RS images and improve its performance efficiently. We listed the related papers with their recommendations in our previous work (Alem and Kumar 2020). Therefore, we were also motivated to investigate the pre-trained networks recommended by (Marmanis et al. 2016; Stivaktakis, Tsagkatakis, and Tsakalides 2019).
Our objective in this paper was to apply the deep TL models and improve their performance efficiently for LCLU classification in RS images. To achieve this objective, we followed these procedures: preprocessing the UCM imagery data, extracting the image features using the bottleneck feature extraction technique, modeling the TL with four sequential layers (flatten, dense, two activations (ReLU and softmax), and dropout layers), and evaluating with a confusion matrix.
Our contributions in this paper were:
• applying the deep TL method on RS imagery data for the LCLU classification problem;
• performing LCLU classification in RS using three TL architectures (ResNet50V2, VGG19, and InceptionV3) with a bottleneck feature extraction technique on the UCM dataset;
• evaluating and improving the performance of the TL models efficiently.

Methods
This study proposed the deep TL method, a deep CNN technique, to reduce training time. Building the model for better performance uses various parameters, such as pre-trained models, learning rate, early stopping, dropout, optimizer, loss, and activation functions.
Pre-trained models have become common in recent RS image classification problems (Risojevic and Stojnic 2021; Rashid et al. 2020; D. Zhang, Liu, and Shi 2020; Marmanis et al. 2016; Stivaktakis, Tsagkatakis, and Tsakalides 2019). The pre-trained CNN-based (Kumar, Naman, and Verma 2021) TL models used in this paper included ResNet50V2 (He et al. 2016; Shabbir et al. 2021), VGG19 (Mateen et al. 2019; Simonyan and Zisserman 2015; Xiao et al. 2020), and InceptionV3 (Szegedy et al. 2015a, 2015b). These deep CNN pre-trained architectures are used to design a new TL model from an existing problem. The learning rate (LR) was used to facilitate the TL model's learning from the UCM dataset. It can take various values, such as 0.01, 0.001, and 0.0001. However, if a larger LR is used, fluctuation in training and learning can occur (Naushad and Kaur 2021). Therefore, a smaller LR value is advisable when building DL models, so we used an LR of 0.0001 in this paper to optimize our model.
Reducing over-fitting in DL methods is dynamic. Dropout and early stopping are the major parameters used to reduce over-fitting during training. The commonly recommended dropout values, expressed in decimal form, are 0.2, 0.3, 0.4, and 0.5. We used 0.5 (i.e., 50%) to reduce training over-fitting, since a higher dropout can perform better than lower values (Stivaktakis, Tsagkatakis, and Tsakalides 2019). Early stopping is a deep CNN regularization technique used to stop training after a number of epochs when the model's performance no longer improves (Naushad and Kaur 2021).
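As an illustration of the dropout mechanism described above, the following minimal NumPy sketch (not the Keras layer itself; the function name and shapes are illustrative) zeroes roughly half of the activations during training and rescales the survivors so the expected activation is unchanged:

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # keep each unit with probability 1 - rate
    return x * mask / keep_prob             # rescale so the expected activation is unchanged

activations = np.ones((4, 8))
dropped = dropout(activations, rate=0.5)   # surviving units become 1 / 0.5 = 2.0
```

At inference time (`training=False`) the activations pass through unchanged, which is why the surviving units are rescaled during training.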
In DL modeling techniques, classification loss functions are widely used. This classification loss can be binary cross-entropy or multi-class cross-entropy. We preferred the multi-class cross-entropy loss function since our RS image classification task is multi-class.
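The multi-class cross-entropy loss can be sketched in a few lines of NumPy; the function and variable names here are illustrative, not taken from the paper's code:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Multi-class cross-entropy: -sum over classes of y_true * log(y_pred),
    averaged over samples. y_true is one-hot, y_pred holds class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot labels for 3 samples over 3 classes
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
perfect = categorical_cross_entropy(y_true, y_true.astype(float))    # confident, correct -> loss near 0
uncertain = categorical_cross_entropy(y_true, np.full((3, 3), 1/3))  # uniform guesses -> loss = ln(3)
```

The loss approaches 0 when the predicted probability of the correct class approaches 1, which is the behavior the training curves in this paper rely on.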
Activation functions can be used after each convolutional layer to raise the neural network's capability (Yao et al. 2019). The ReLU (Nair and Hinton 2010) and softmax (Mahdianpari et al. 2018) activation functions were applied in this paper because they are more advantageous than other conventional nonlinear functions, such as tanh and sigmoid: they propagate errors more easily, keep multiple layers of neurons activated, and require simpler mathematical operations.
ReLU outputs x if x > 0 and 0 otherwise, as observed in equation (1) and Figure 1a. This implies that neurons with negative values are not activated, only those with positive values. The slope (derivative) of ReLU is constant, i.e., either 1 for x > 0 or 0 for x < 0 (equation (2) and Figure 1b).
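Equations (1) and (2) can be written directly as a short NumPy sketch:

```python
import numpy as np

def relu(x):
    """Equation (1): f(x) = max(0, x) - passes positive values, zeroes out the rest."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Equation (2): derivative of ReLU - 1 for x > 0, 0 otherwise."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu(x)        # negative inputs are not activated
g = relu_grad(x)   # constant slope: 0 on the negative side, 1 on the positive side
```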
Softmax (softargmax) is used to predict the class with the highest probability in multi-class classification problems for the input labels. We used this function since our RS imagery data form a multi-class classification problem. The softmax output is between 0 and 1, and the sum of the class probabilities is 1.0. If some elements of the input vector are < 0 or > 1, they are mapped into (0, 1) by the softmax function. Its equation, f(z)_i = e^{z_i} / Σ_{j=1}^{N} e^{z_j} over N classes, is given in equation (3).
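Equation (3) corresponds to the following minimal sketch (the shift by max(z) is a standard numerical-stability detail, not part of the equation in the text):

```python
import numpy as np

def softmax(z):
    """Equation (3): f(z)_i = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw network outputs, not yet probabilities
probs = softmax(logits)             # each entry in (0, 1); entries sum to 1.0
```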
In summary of the method, the networks and weights were trained in the pre-trained InceptionV3, Resnet50V2, and VGG19 models. We used the bottleneck feature extraction method to extract image features from these pre-trained models. The fully connected (FC) network of each pre-trained model was removed along with its weights, and a new model was built. The bottleneck features, which become the inputs to the FC layers, are trained on the UCM images (Figure 2). The bottleneck feature extraction produced features of shape (1264, 6, 6, 2048) for the training bottleneck predictions and of shape (420, 6, 6, 2048) for the validation and testing bottleneck predictions for each pre-trained model.
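A minimal Keras sketch of this bottleneck pipeline is given below. It is illustrative only: `weights=None` and a 64x64 input are used just to keep the sketch lightweight (the paper's experiments use ImageNet weights and UCM-sized images), and the 256-unit dense layer is an assumed size, not taken from the paper.

```python
import numpy as np
import tensorflow as tf

# Headless VGG19 as the frozen feature extractor (include_top=False drops the FC layers).
# weights=None and the small 64x64 input are only to keep this sketch light; the
# experiments use weights="imagenet" and the UCM image size.
base = tf.keras.applications.VGG19(weights=None, include_top=False, input_shape=(64, 64, 3))
base.trainable = False

images = np.random.rand(2, 64, 64, 3).astype("float32")
bottleneck = base.predict(images, verbose=0)  # computed once, then fed to the new FC head

# New classifier head trained on the bottleneck features
# (flatten -> dense/ReLU -> dropout -> softmax, as described in the Methods section)
head = tf.keras.Sequential([
    tf.keras.Input(shape=bottleneck.shape[1:]),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),   # assumed hidden size
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(21, activation="softmax"), # 21 UCM classes
])
head.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
             loss="categorical_crossentropy", metrics=["accuracy"])
```

Training then calls `head.fit(...)` on the saved bottleneck features, which is why TL trains in seconds rather than days: only the small head is updated.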

Experiments and Performance Evaluations
The experiments were performed on a laptop with an Intel Core i3-4000M CPU at 2.40 GHz and 4 GB of RAM, and in Colaboratory with its GPU, using the Keras and TensorFlow software packages.

Experimental Setting
As we discussed in section 2, various hyperparameters have an important influence on classification problems. So, we used some of the important parameters in our experiments, listed in Table 2. In addition to the dropout (0.5) layer, we used the early stopping technique to reduce over-fitting.

Performance Evaluation Measurements
The model's performance is evaluated using a confusion matrix (CM) and overall accuracy (OA). The CM analyzes errors and confusion between the columns, holding the occurrences of each predicted class, and the rows, holding the occurrences of each actual class (W. Zhang, Tang, and Zhao 2019). The CM is therefore also called an error matrix, since it exposes classification errors. The errors can be type I errors and type II errors (Table 3). A type I error is an outcome where the model incorrectly predicts the positive class when the actual value is negative. In contrast, a type II error is an outcome where the model incorrectly predicts the negative class when the actual value is positive. We combined the training and validation data after validating the model during the process and then evaluated the model's performance on the 20% testing data. The CM measures whether the TL model classifies correctly or incorrectly.
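A confusion matrix with the row/column convention described above (rows: actual class, columns: predicted class) can be built as follows; the three-class labels are made up for illustration and are not the UCM classes:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of actual class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
# Diagonal entries are correct classifications; off-diagonal entries are errors.
# For example, cm[0, 1] counts a type I error for class 1 (falsely predicted positive)
# which is simultaneously a type II error for class 0 (missed positive).
```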
We used the classification metrics accuracy, recall, precision, and F1 to calculate the model's performance.
Accuracy is the proportion of predictions that the model classified correctly. Recall identifies how many of the actual relevant classes were retrieved from the dataset:

Recall = (# correct actual positives) / (total # of actual positives)

The F1 score is the harmonic mean of precision and recall. Its score becomes 1 when both precision and recall are perfect, and 0 when either precision or recall becomes 0. The F1 score measures the preciseness and robustness of the classification model.
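These metrics can all be computed per class from a confusion matrix; the example matrix below is invented for illustration and is not taken from the paper's results:

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F1 per class from a confusion matrix
    (rows: actual class, columns: predicted class)."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # correct predictions / all predictions of the class
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # correct predictions / all actual samples of the class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)  # harmonic mean
    return precision, recall, f1

cm = np.array([[18, 1, 1],   # illustrative 3-class matrix, 20 test samples per class
               [0, 20, 0],
               [2, 0, 18]])
precision, recall, f1 = per_class_metrics(cm)
overall_accuracy = cm.trace() / cm.sum()  # correct predictions / all predictions
```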

TL Models Evaluation Using Measurements
There are N (N = 21) classes, integer-labeled 0 to N-1. The generated records were transformed into a confusion matrix (Figure 3). The three TL models generated class label records for the 21 classes, ranging from 0 to 20, with 20 testing samples per class. For instance, for the Inception_v3 model in Figure 3a, in the first class (labeled 0), 18 of the 20 testing samples are correctly classified, while two samples from the actual classes 3 and 18 are misclassified. Based on the confusion matrix records (Figure 3a), the performance of TL with the Inception_v3 model has been calculated and recorded in Table 4. Similarly, the performance of TL with the Resnet50v2 and VGG19 models has been measured in Tables 5 and 6 based on the confusion matrices (Figures 3b and 3c), respectively.
The accuracies of the three TL models on training and validation data are shown in Figure 4, and the overall accuracies are recorded in Table 7. The F1-score reflects the preciseness and robustness of the classification model, since it computes the harmonic mean of precision and recall. The categorical cross-entropy loss function was used while compiling the model. For the correct class, the value of the loss function becomes closer to 0 (Figure 5).

Discussions on Results, Methods, and TL Performances
The experimental results in section 3 show that pre-trained models are applicable to LCLU classification in RS images. Accuracy was also our aim, and we obtained better results in each model and in most individual classes. We evaluated each class's accuracy using precision, recall, and F1-score measurements for each model. Precision performed well, i.e., 88%, 88%, and 90% for the three models (Tables 4, 5 and 6). That means the relevant classes were retrieved and predicted correctly. When the F1-score is perfect (1), i.e., 100% accurate for certain classes, precision and recall are also perfect for those classes.
As seen in Tables 4 and 8 for the Inception_v3 model, the classes agricultural, airplane, chaparral, freeway, overpass, and parking achieved the best precision. In contrast, medium residential performed the poorest, with a 64% precision result. The airplane, chaparral, harbor, parking lot, and runway classes achieved the best recall, while golf course and dense residential performed the worst, with 45% and 50% recall, respectively. Moreover, the airplane, chaparral, and parking lot classes are the most accurate in F1-score, while golf course (56%) and dense residential (59%) are the poorest. For similar situations, we grouped classes by their best or worst/poorest value under each Resnet50v2 and VGG19 model measurement in Table 8. The poorest precision results occur in the medium residential and dense residential classes, and the poorest recall result in the golf course class, across all three models. These poorest accuracies could be caused by inter-class image similarity and resolution differences.
By applying the hyperparameters listed in Table 2, the TL model was built using the Adam optimizer with an LR of 0.0001 and compiled with the categorical cross-entropy loss function. The loss function predicts an integer value for each of the N classes, assigned from 0 to N-1, where N = 21 classes in the UCM dataset. The cross-entropy loss becomes lower and lower during the deeper network training process as it identifies the correct class. The cross-entropy value is 0 for a correct class. The value of the cross-entropy loss function increases for misclassified classes, where the trained network fails to find the correct class (Bahri et al. 2020). In Figure 5, the training loss graph in blue is close to 0, so the trained network is good at predicting the correct class in TL.
In addition to dropout, we used the early stopping technique to reduce over-fitting and improve performance. Training was stopped early when either the validation loss stopped decreasing even though the training loss kept decreasing, or the validation accuracy stopped increasing even though the training accuracy kept increasing. We assigned an epoch value of 100, and early stopping occurred at epochs 19, 18, and 95, when the validation loss stopped decreasing, for Resnet50v2, InceptionV3, and VGG19, respectively. We generally observed that the later early stopping occurred, the larger the accuracy, for instance, for VGG19 in this study.
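The early stopping behavior described above can be sketched as a simple loop over recorded validation losses, mirroring what Keras's `EarlyStopping` callback does when monitoring `val_loss`; the patience value and the loss curve below are illustrative, not the paper's actual settings or results:

```python
def train_with_early_stopping(val_losses, patience=5):
    """Return the epoch at which training stops: after `patience` consecutive
    epochs with no improvement in validation loss."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop here; the best weights are from best_epoch
    return len(val_losses) - 1  # ran through all epochs without triggering

# Validation loss improves until epoch 3, then plateaus -> stop patience epochs later
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.56, 0.57, 0.58, 0.59]
stop_epoch = train_with_early_stopping(losses, patience=3)
```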
Therefore, VGG19 had the best TL model performance. As seen in Table 7, the VGG19 model achieved a superior accuracy of 99.64% with a 70% training ratio among all methods.

Discussions on Similar Studies
In this paper, we utilized Resnet50V2, Inception-V3, and VGG-19 as our baselines. We compared the classification accuracy of this paper with state-of-the-art classification studies on the UCM dataset (Table 9). According to Table 9, all the proposed TL models achieved superior accuracy among the state-of-the-art methods. We used the Adam optimizer with the smallest LR value, i.e., 0.0001, while the others used stochastic gradient descent (SGD) with various parameters. Most of the researchers listed in Table 9 used 50 epochs, but we used 100 epochs with early stopping while validating the model. Therefore, the proposed VGG19 achieved a superior accuracy of 99.64% with a 70% training ratio among all methods we used. The Resnet50v2 model resulted in lower performance than the other two methods. The results of the three pre-trained models demonstrate the applicability of the TL model to RS images.

Conclusions
In this paper, we addressed the problem of LCLU classification in RS images using deep TL models with bottleneck feature extraction. Our objective was to apply the TL model and improve the classification performance for LCLU classification in RS images. The training time of TL is more efficient (trained in seconds) than that of other deep CNN models (trained in days when trained from scratch), as observed in the state-of-the-art studies (Mahdianpari et al. 2018; Rashid et al. 2020). We used the bottleneck feature extraction method on pre-trained models to improve the model's training speed and accuracy.
The models' performance is prominent across all models, i.e., 92.46%, 94.36%, and 99.64% accuracy for Resnet50V2, Inception-V3, and VGG-19, respectively. However, the superior accuracy belongs to the VGG19 model, with efficient training time. Most classes achieved prominent accuracies, except for some classes such as medium residential, dense residential, and golf course, which had the poorest accuracies when evaluated by precision, recall, and F1-score.
LCLU classification in RS images contributes significant value (Alshari and Gawali 2021; Lima and Marfurt 2020) to decision-making and planning in rural and urban areas. Our contribution is applying deep TL using bottleneck feature extraction to the LCLU classification problem in RS images. This contribution supports environmental resource management and sustainable development for agricultural and urban planning. In addition, we evaluated and improved the performance of the TL models and demonstrated their applicability to LCLU classification in RS images.
For further investigation, we will use more recent deep neural networks with various pre-trained architectures as our baselines to substantiate the effectiveness of TL pre-trained models on various datasets and parameters.