Modeling dynamic land use changes in the eastern portion of the hexi corridor, China by cnn-gru hybrid model

ABSTRACT Deep learning models have been extensively employed to simulate dynamic Land Use Changes (LUC) at the regional level. However, existing studies were found to be limited by the inadequate extraction of Spatiotemporal Neighborhood Features (SNF) and ignored obvious Long-Term Dependence (LTD) in time series data, which low simulation accuracies. Consequently, it remains to be explored whether deep learning should be employed to initially extract spatial or temporal features in LUC simulations to achieve better results. We propose here a deep learning model of a Convolutional Neural Network – Gated Recurrent Unit (CNN-GRU), coupled with SNF learning to simulate LUC. Land use data for the Eastern Portion of the Hexi Corridor (EPHC) (2000–2020) were utilized to verify the effectiveness of our model, which was compared with four other models (GRU, CNN, CNN-LSTM, and GRU-CNN). We found that CNN-GRU the highest simulation accuracy (OA = 0.9346), whereas the kappa coefficient increased by 1.22%-4.57%. These results indicated that: (1) EPHC urbanization showed a trend of contraction, rapid expansion, and slow growth. (2) The CNN-GRU model effectively extracted SNF and LTD, which improved the accuracy of the simulation. (3) An enhanced effect could be achieved when spatial features were initially extracted during the SNF deep learning extraction process. (4) Variable neighborhood unit sizes influenced the simulation results, where a 3 × 3 window was most suitable for the study area.


Introduction
Land use change (LUC) represents one of the most direct manifestations of the interactions between anthropogenic activities and the natural environment (Fiener, Auerswald, and Van Oost 2011;Haase et al. 2012). A dramatic increase in LUC due to urbanization, deforestation, and desertification have altered the global landscape and increased pressure on ecosystems worldwide (Aburas, Ahamad, and Omar 2019;Yang et al. 2019).
Land use simulations are essentially time series modeling problems that integrate historical land use classification data sets and other driving factors (Qian et al. 2020). Since there are complex interactions between these factors, neighborhood effects and complex neighborhood characteristics must be extracted to define the conversion rules (Cao et al. 2015). Macroscale land use simulations consider cities as homogeneous entities for monitoring the ecological impacts of urban sprawl or LUC (Kuang 2012). However, decision-making at the regional level is quite different from that at the macro level, in that regional simulations not only reveal the processes involved in regional LUC, but also provide regional planners with fundamental data regarding potential future land allocation and urban renewal (Godoy and Soares-Filho 2008).
Temporally, LUC is a dynamic and continuous process, as the landscape unit of each moment is associated with the previous moment, which also impacts the next . As uncertainty plays a role in future predictions, the accurate extraction of Spatiotemporal Neighborhood Features (SNF) and their driving factors is of particular importance toward the establishment of land use simulation models (Feng and Tong 2018;Mustafa et al. 2018b). Liu et al. (2018) developed a global scale multi-temporal urban land map using the Normalized Urban Areas Composite Index, with an global scale overall accuracy for the mapping results of 0.81-0.84. Ma et al. (2016) employed nocturnal luminosity data to estimate the pace of urbanization and subsequent land cover changes in China, with the results revealing that accelerated urban expansion increasingly threatened cultivated land.
Currently, dynamic LUC models are primarily simulated using economy-based, cellular automata (CA), agent-based, and calibration strategies such as ant colony optimization Ren et al. 2019;Xu et al. 2015Xu et al. , 2020. With the continuous accumulation of copious high-precision monitoring data, traditional dynamic land use simulation methods face multiple challenges in terms of accuracy (Aburas, Ahamad, and Omar 2019). For example, economicsbased models ignore the spatial interactions between different land units (Azadi et al. 2017). CA models focus on the simulation of spatial patterns by explicitly considering the immediate neighbors of each landscape, where CA modeling structures can be improved by incorporating a variety of driving forces into the model (Mustafa et al. 2018a). Partially agent-based models require significant computational and empirical resources to understand rule-based LUC processes (Groeneveld et al. 2017). In contrast to traditional, or shallow machine learning (ML) models, various deep learning (DL) architectures transform the data presentation of one level to a representation at a higher level. This overcomes many of the restrictions placed on the modeling process by traditional or shallow ML models . In conjunction with algorithm improvements, the accuracy of dynamic LUC simulations has also gradually progressed (Bin and Kockelman 2008). Zhang et al. (2019) employed the Random Forest (RF)-CA model to simulate land use, with an overall accuracy (OA) value of 0.8915. Qian et al. (2020) used a CNN/CA combination to simulate land use, with an OA value that was increased to 0.9178. Nevertheless, the accuracy of these simulations will still need to be improved.
Deep learning has been extensively applied across disciplines, and effective modeling can be performed for big data, as well as environmental and soil research (Gheisari et al. 2017;Gholami et al. 2020Gholami et al. , 2021. Deep learning methods demonstrate the capacity to glean and concentrate spatial and temporal features from vast quantities of data; correctly and autonomously extract transformation rules; and accurately build model neighborhoods to facilitate the application of data prediction for complex structures (Alfonsina Chang-Martinez et al. 2015;Liu et al. 2021a). Further, deep learning models can extract spatial correlation features from multiple variables with developed algorithms including ANN, CNN, and RNN (Zou et al. 2017). He et al. (2018) took the neighborhood effect into account and used a CNN to realize driving factor spatial feature mining; however, they used data time slices only. These methods merely considered the extraction of the spatial dimensional characteristics of the driving factors and explored the transformation rules, while ignoring the temporal dimension characteristics and obvious Long-Term Dependence (LTD) in time series data (Liu et al. 2021b). Zhai et al. (2020) utilized a CNN-Vectorbased cellular automata (VCA) model to simulate land use changes at a fine scale. However, due to the limitation of CNN, which could not extract LTD, the attained figure of merit (FOM) was only 0.361.
Tepe and Guldmann (2020) considered potential spatial dependence, and developed a spatiotemporal multinomial autologistic model, which correctly estimated 91.4% of all observations. White, Uljee, and Engelen (2012) proposed a new method to define the neighborhood effect that considered the distance to the central unit. Liao et al. (2014) believed that there existed optimal neighborhood decay coefficients in accordance with the regional characteristics of an area, and that urban CA modeling should consider the role of neighborhood decay. Because of the decisive role of neighborhood size in the calculation of transition potential, it has an important influence on the results of urban simulations (Li and Yeh 2000;Pan et al. 2010). When extracting spatial data, it is necessary to select the appropriate neighborhood size to make the model simulation achieve a higher effect .
Deep learning models used for time series data prediction include the Recurrent Neural Network (RNN), Long Short Term Memory (LSTM) network, and Gated Recurrent Unit (GRU) network (Huang et al. 2021). Liu and Sun (2020) proposed a LSTM-CA collaborative computing method to establish a simulation model for the identification of volcanic ash cloud diffusion. They verified that the total identification accuracy of this method achieved a higher level, which could precisely simulate the horizontal and vertical diffusion trends of a volcanic ash cloud. Shahid, Zameer, and Muneeb (2020) demonstrated that GRU had a better simulation capacity than LSTM in time series data prediction. The GRU model structure can maximize the conditional probability of the target sequence given the source sequence, improve memory capacity and training performance, and better resolve the issues of overfitting, gradient vanishing, and explosion (Rehman et al. 2021). Du et al. (2020) extracted the SNF and LTD of multi-mode traffic data based on the CNN-GRU network, and the Root Mean Square Error reached 9.09. Lian et al. (2020) proposed a deep learning model based on a multidimensional feature selection layer, CNN layer, and GRU layer to predict the trajectories of tropical cyclones, with the results confirming that the model was superior to traditional weather forecasting models. Wu et al. (2020) achieved high precision load prediction in a power system using GRU-CNN. Consequently, it remains to be explored whether deep learning should be used to initially extract spatial or temporal features in LUC simulations to achieve better results. The studies above indicated that the CNN-GRU model could provide a potentially novel approach for land use simulation research.
For this study, we combined spatial and temporal neighborhood feature learning in a CNN-GRU deep learning model to supplement the inadequate extraction of the SNF and LTD in existing LUC simulation studies. CNN was employed to extract high-level spatial dimension features of the driving factors in the neighborhood, whereas GRU was used to adaptively learn the temporal dimension features and LTD. Concurrently, the effects of different window sizes and extraction sequences on the simulation results were also investigated. The Eastern Portion of the Hexi Corridor (EPHC) is located in an arid, ecologically fragile region of Northwest China, where the growth rate of the regional economy, population, and urbanization is quite different from that of coastal cities (Chen et al. 2021;Tan et al. 2020). This study aimed to: (1) clarify the spatiotemporal evolution of the LUC in the EPHC from 2000 to 2020; (2) train the corresponding conversion rules of deep learning for the EPHC; (3) extract SNF and LTD, improve the simulation accuracy, and assess performance; (4) use the model to simulate the LUC in 2030.
In the EPHC, the distribution of cities and towns is scattered, the functions are single, the urban and rural areas are clearly distinguished, and the distribution of polarization exists. The urban land showed a changing v-shaped trend that initially decreased and then increased (Qin-ke et al. 2021), and the EPHC region has a growing population. Meanwhile, urbanization has accelerated population flows into cities (Chun-yan and Xiao-feng 2020). By the end of 2020, the resident populations of Zhangye City and Wuwei City were 1.238 million and 1.825 million respectively. The fragile ecosystem of this area in conjunction with the complex natural geographical topographies and severe economic development pressures, have brought great challenges to the ecology of the EPHC.

Data sources
For this study, the land use types were divided into nine categories: cultivated land, forest land, grassland, shrubland, wetland, water bodies, artificial surfaces, bare land, and permanent snow and ice, based on the LUC dataset (2000,2010,2020) provided by the Global Geographic Information Product Platform (http://www.globallandcover.com/). The driving factors of traffic accessibility, socioeconomic conditions, terrain, and climatic conditions were adopted to describe the LUC processes. Considering the interactions between the LUC system and human activities, traffic routes were divided into five types (main road, railway, river, residential area, and ecological function protection area), and Euclidean distance analysis was conducted to characterize accessibility. The further the distance, the closer the value was to 1. Social economy was primarily represented by the four factors of nocturnal illumination, GDP, population, and NPPs. The locations with prosperous economies and large populations were more likely to become built-up areas. DEM, slopes, and faults, etc. influenced whether an area was suitable for the expansion of built-up areas in representing topographic conditions (Gounaridis et al. 2019). Precipitation and temperature represented climatic conditions. All data were normalized to [0,1] and resampled to 30 m to reduce the calculation cost and expedite the simulation process, with the details shown in Fig

Methods
We collected land use data from 2000 to 2010, as well as 13 driving factors to estimate the regression coefficient and correct the model parameters. The proposed model was coded in Python and based on the TensorFlow framework, after which the simulated land use map in 2020 was obtained. An approach for simulating LUC through the integration of CNN and GRU models is shown in Figure 2, and the data were segmented into two parts according to administrative divisions.
To verify the effectiveness of our proposed model, three groups of models were established to compare their predictive performance. (1) Different types of RNN (CNN, CNN-LSTM, and CNN-GRU) were employed as encoders and decoders to evaluate the performance of time-dimensional feature extraction. (2) Different types of traditional statistical models (GRU, CNN-GRU) were used to verify the performance of spatial dimension feature extraction. (3) Comparison of CNN-GRU and GRU-CNN. Due to the large quantity of data, the model training and validation sets were divided at a 30:1 ratio. The training set updated the network parameters through the gradient descent, while the test set evaluated the model performance only, and did not participate in parameter updating.

CNN
CNN was employed to extract the historical land use data and spatial dimension characteristics of the driving factors (Part I of Figure 2). A two-dimensional convolution operation was utilized to capture the local spatial features of images (He et al. 2018). A regular sampling window was utilized to construct a dataset of driving factors near each block unit. Firstly, the data of each driving factor was organized into a raster layer, where the row and column positions in the raster data that corresponded to the centroid of each polygon were extracted. Secondly, a driving factors dataset was constructed using an N × N sampling window around each central pixel on the driving factors raster layer, such that each neighborhood had the N × N characteristics of each driving factor. The sizes of the sampling windows were determined by the influence distance and spatial resolutions of the driving factor data. When M driving factors were employed, the number of driving factor features corresponding to each package was N × N × M following sampling. Lastly, the developed driving factor data set, and the LUC data were introduced into the CNN model for training. Following repeated training and feedback cycles, an optimal CNN model was selected to extract the complex neighborhood characteristics of the driving factors.
where, l is the current layer, s l ðc;dÞ is the output of position (c, d), m is the feature number of the (L-1) layer, e l is the offset, σ is the activation function, b l;m i;j is the value of the core ði; jÞ and a and b are the height and width of the core, respectively.

GRU
GRU was used to extract the temporal dimension features and the LTD (Part II of Figure 2). On the basis of RNN, many excellent evolution models were generated, such as LSTM and GRU (Elsayed, Maida, and Bayoumi 2019). These models resolved the issue of LTD through the addition of memory units and avoided gradient explosions by gating units (Chung et al. 2014). In contrast to LSTM, GRU combined the forget and input gates into a single update gate, and also mixed the cell and hidden states, such that there were fewer parameters and the training was faster . Therefore, the GRU algorithm was selected as the time series feature extraction module for this study. The activation of the jth hidden unit was calculated as follows.
(2) where, r j represents the reset gate; z j refers to the update gate; h j is the actual activation of the unit; σ is a sigmoid logical function; [.] j represents the jth element of a vector; x and h (t-1) are the input and hidden states of the previous state, respectively; W r and U r are weight matrices; and f is the tanh function. The next hidden state contained valid information from the previous time step and was determined by the reset gate. The reset gate determined how to combine the new input data with the previous memory, after which the previous time step hidden state and the current input were passed to the update gate, which was mapped by the tanh activation function to determine which data in the memory cell would be transferred to the hidden state.
The data were combined into three-dimensional tensors [X,Y,Z] in chronological order. The X dimension represented the number of data, that is, the total number of all grids simulated. The y-dimension was the data in chronological order, where the spatial variables of the driving factors were assumed to be static during the simulation process. Therefore, each of the values on the Y dimension were driving factors and the land use data of the training set over the years. The Z dimension represented the GRU output dimension, and the threedimensional tensor was fed into GRU model for training. Following repeated training and feedback cycles, an optimal GRU model was selected to extract the temporal dimension features and LTD of the LU under the influence of the driving factors.

CNN-GRU
The CNN-GRU model was divided into two parts (training and prediction). The land use data and driving factors were normalized for preprocessing and then iterated. For the prediction phase, the set parameters could be applied to a given input and the output was obtained. Based on the characteristics of land use time series simulation research, we constructed a six-layer deep neural network structure: the first two layers were CNN layers, the middle two were GRU layers, and the last two were fully connected layers. Figure 1. The settings of main parameters are shown in Table 1.
When convolution was conducted at the first layer of the network, a circle of 0 was filled around the window, the original image size would not be altered following the completion of the convolution. For the second convolution layer, 0 no longer filled at the edge to diminish the redundancy of eigenvalues, and a (N-2)×(N-2)×60 eigenmap was finally obtained. The data format was transformed, and the number of input features for each time step was 4. For the output sequence, the hidden state values of all time steps of this layer were returned, the output sequence of the first GRU layer was input to the second GRU layer in the form of many-to-one. The second layer GRU returned the output of the final time step.
The learning rate was set to decrease from large to small with the training of the model, respectively {0.01, 0.008, 0.006, 0.004, 0.002, 0.001}. At the beginning, learning was accelerated at a significant rate and rapid convergence was achieved. At the later stage, optimization was performed at a lesser learning rate. We introduced the cross-entropy loss function (Eqn. 3) into the neural network to select the best performance model.
where, N represents the number of samples; K is the number of categories; y k i is the sign function (0 or 1), if the true category of sample i is equal to k equals 1, otherwise 0; ŷ k i is the predicted probability of observation sample i belonging to category k.
The setting of hyperparameters in the CNN-GRU model of this study was based on the results of repeated experiments, and the optimal set of hyperparameters was finally selected, which was applicable to other comparative models.

GRU-CNN
The GRU-CNN model was comprised of six layers of a deep neural network structure. The first two were GRU layers, the middle two were CNN layers, and the last two were fully connected layers. By converting the data format, 64 convolution kernels were initially used for feature extraction, and 128 convolution kernels were selected in the next CNN layer. Other settings of the main parameters were the same as for the CNN model

Evaluation metrics of simulation accuracy
The following indicators were used to evaluate the model (OA, kappa, FOM, Precision, and Recall). For the evaluation of accuracy of land use simulations, OA and kappa coefficient are commonly used, as shown in Eqns. (4)-(7) (Congalton, Oderwald, and Mead 1983). Kappa where, k is the number of land use categories; n ii is the number of units that are simulated in category i and are in category i of the actual LU map; n ij is the number of units that are simulated in category i but are actually in category j; n ji is the number of units that are simulated in category j but are actually in category i. The FOM index can comprehensively measure the modeling accuracy of simulated changes, as it focuses on the accuracy of the change area, rather than that of the entire study area. It is also extensively used for accuracy verification. The range of FOM is 0 ~ 1, where the higher the FOM, the higher the simulation accuracy (Pontius et al. 2008).
where, A is the number of units that changed in the simulated map but remained the same in reality; B is the number of cells that correctly changed in both the simulation map and the actual map; C is the number of units that were incorrectly changed in the simulated and real maps; D is the number of units that remained the same in the simulated map but changed in the actual map. Precision represents the ratio between the number of relevant samples in the identification or retrieval results and the total number of samples in the results. Recall refers to the number of correctly predicted data with true positive examples, which measures the capacity of the model to identify positive examples where, TP is the number of positive classes to be predicted, FP is the number of negative classes to be predicted, and FN is the number of positive classes to be predicted.

Quantitative analysis
The results of the CNN-GRU, GRU, CNN, CNN-LSTM, and GRU-CNN simulation models for land use types in 2020 were compared using OA, Kappa coefficient, FOM, Precision, and Recall accuracy evaluation criteria. To ensure the validity of the comparative results, identical parameter settings were employed for the four models in the experiment. The results of the five precision evaluations are shown in Table 2.
(1) The performance of the CNN-GRU model was superior to that of the traditional GRU model for the five selected indicators, and the kappa coefficient increased by 1.22%. This indicated that spatial features were one of the keys of LUC, and the extraction of SNF could enhance the simulation performance.
(2) Compared with CNN, the kappa coefficient and FOM of the CNN-GRU model increased by 0.83% and 0.51%, respectively. The performance of the CNN-LSTM model for the five selected indicators was lower than that of the CNN-GRU models; however, it was significantly better than that of the CNN model. This indicated that timing was also one of the key factors of LUC, LTD was a necessary factor in LUC research, and the simulation performance could be improved by hiding memory units.
( 3) The CNN-GRU model had higher values for the five evaluation indices than did GRU-CNN, which indicated that CNN-GRU had an improved effect over GRU-CNN for land use simulations, as the kappa value increased by 4.57%. When GRU was extracting temporal features, the reset gate may have forgot neighborhood features that were useless relative to temporal features, which resulted in partial neighborhood features that could not be transferred to the subsequent convolution layer; thus, performance degradation.
(4) The evaluation indices of the CNN-GRU and CNN-LSTM models were both higher than those of other models, which indicated that SNF and LTD played critical roles in the enhancement of simulation accuracy. Since GRU optimized the update threshold based on LSTM, the performance of CNN-GRU was improved compared with CNN-LSTM.
(5) Considering the SNF and LTD the CNN-GRU model was superior to the other models, which verified the effectiveness and superiority of the method.
The per-class precision and recall values are shown in Tables 3, 4. Through comparison, CNN-GRU maintained the highest precision and recall rate in cultivated land, forest land, grassland, shrubland wetland, artificial surfaces, and bare land. The precision of CNN-GRU was lowest for water bodies and permanent snow and ice, but the recall rate was higher than the other models. These results demonstrated the best simulation accuracy of CNN-GRU.

Qualitative analysis
From the qualitative perspective, the simulation results of these models and the LU diagram of the research area in 2020 are shown in Figure 3. In general, the performance of these simulation models was acceptable, which was consistent with the actual map. However, there were some differences in detail between the models, and three regions were randomly selected for a detailed comparison.
The GRU-CNN model had the worst simulation effect, whereas the CNN-GRU model was the best over all other models. Inconspicuous features such as rivers were overlooked in the GRU simulations compared with the actual images. The results of the CNN model were more fragmented than those of the CNN-GRU model. In contrast to the actual land use map, the CNN model was also unable to  simulate inconspicuous features such as rivers, and the simulated built-up areas had a poor effect. The effect of the CNN-LSTM model was lower than the CNN-GRU and the edge simulation was rougher, which meant that GRU was more accurate than LSTM in a temporal neural network. The results of the CNN-GRU model were best at simulating inconspicuous features, the most complete for the simulation of shrubland, and best in the reduction of details. Consequently, the CNN-GRU model was most consistent with the land use in the 2020 study area in terms of the spatial distribution of simulation results. This further verified the advantages of considering both SNF and LTD.

Sensitivity analysis of CNN-GRU parameters
The sizes of neighborhood units directly impacted the neighborhood driving factor data of each block unit coupling, and had an uncertain influence on the accuracy of the final simulation. It was necessary to explore the relationships between different neighborhood unit sizes and the accuracy of the corresponding simulation results. To analyze the influences of input parameters on the final simulation results during the training process of the CNN module, we conducted a series of batch-size experiments according to different neighborhood unit sizes (1 × 1, 3 × 3, and 5 × 5).
CNN-GRU models were established with different neighborhood unit sizes, and the LOSS and accuracy curves of the training and validation sets, when the deep learning network was running Figure 4. The prediction performance of the CNN-GRU model with different neighborhood unit sizes is shown in Table 5.
Following the comparison, it was revealed that when the neighborhood unit was 1 × 1, the LOSS was higher than the other models, and the accuracy was lowest. When the neighborhood unit was 3 × 3, the accuracy achieved was 0.9376 and the LOSS dropped to 0.1786. When the neighborhood unit was 5 × 5, the accuracy began to decline, the LOSS increased, the generalization ability began to decline, and the model training time significantly increased. The training time for each epoch increased by 1 h compared with that for the 3 × 3. After assessing the experimental results, the 3 × 3 neighborhood unit was determined as the optimal hyperparameter model for balancing the prediction accuracy with training time. Therefore, the 3 × 3 neighborhood unit was selected for land use prediction.

LUC in the EPHC from 2000 to 2020
From 2000 to 2020, the transfer of land use types in the EPHC is shown in Table 6. A total of 56.02% of the cultivated land transferred from EPHC was converted to artificial surfaces, 32.19% was converted to grassland, and 8.57% was converted to bare land. The area of cultivated land increased slightly due to the supplement of some grassland and forest land conversion to cultivated land, where 94.55% of the transferred portion of the artificial surfaces became cultivated land.
Forest land was primarily converted to grassland, which accounted for 96.08% of the area of converted forest land, whereas the remaining portion was primarily converted to shrubland and bare land. Grassland was converted into forest land, bare land, and shrubland, accounting for 35.45%, 26.08%, and 20.11% of the transferred portions, respectively. It was shown that the artificial surfaces in the EPHC were continually expanding, with most being converted from cultivated land, and a small portion were converted from grassland and bare land. The conversion of forest land to artificial surfaces was relatively small, as most of the forest land in the study area was at high altitudes, far from artificial surfaces, and part of the national nature reserve.

Simulation of LUC from 2000 to 2030
The 2010-2020 land use data and 13 driving factors were introduced into the CNN-GRU model to simulate land use in the EPHC. The predicted results of land use in the EPHC for 2030 were obtained as shown in Figure 5.
The total land area of the EPHC was 70.47 × 10 3 km 2 , of which the bare land area was the most extensive, followed by grassland and cultivated land (Table  B of Appendix B). From 2000 to 2020, the artificial surface area expanded 1.89 times, at an annual growth rate of 5.44%; however, it exhibited various trends during different periods. From 2000 to 2010, the study area was in the urban contraction phase and the urban area did not increase but showed a slowly decreasing trend Figure 6. Over the following 10 years, the urbanization area increased rapidly and expanded significantly.
From 2000 to 2020, the grassland and wetland areas continued to decline. The cultivated and bare land showed a slowly increasing trend, and then a rapid decrease. Initially, the forest land area and water bodies slowly decreased, and then increased rapidly, whereas the shrubland and permanent snow and ice increased, Among them, the shrubland accelerated its expansion at a growth rate of 14.7%, showing a multiple expansion trend.
From 2020 to 2030, the water bodies had the fastest annual growth rate of 1.1%, whereas the shrubland ceased to expand and diminished the fastest. The artificial surface and forest land continue to expand; however, the growth trend slowed. The cultivated and bare land were no longer shrinking and revealed a growth trend. The grassland and wetland were affected to varying degrees and the permanent snow and ice continued to expand. The changes to other land types tended to be stable.

Advantages of the CNN-GRU model
LUC is a prolonged process and the lack of LTD will limit the accuracy of the results. Deep learning can identify the intimate relationships between land use and the driving factors during the LUC process. Maleki et al. (2020) used a CA-Markov chain to predict future land use, and the simulated OA value was 87.7%-90.15%. Armin, Majidian, and Kheybari (2020) used the TerrSet Land Change Modeler to conduct land use development scenario modeling, and the OA value of the model was 81.9%-83.1%. Liu et al. (2021b) applied the LSTM-CA model to simulate land use with a focus on the extraction of temporal characteristics, while ignoring the importance of spatial characteristics, and the OA reached 0.9101. Through experimentation the CNN-GRU methods attained over 93.89% accuracy. In contrast to the traditional CA algorithm, the CNN algorithm could better uncover the deep connections between spatial     Liu et al. (2021a) used the DL-CA model to simulate LU with an OA of 0.9283. However, the LTD of land use evolution could not be extracted by extending the dimension of the convolution kernel to three dimensions based on the extraction of spatial neighborhood features by two-dimensional convolution. As shown in Table 2, the simulation results of CNN-GRU were more accurate than those of CNN, reaching 0.9389. The CNN-GRU model employed the gated time unit of GRU to effectively apply LTD to the deep learning model, such that the model achieved an improved simulation performance.

Discussion of prediction results in the study area
By comparing the 2020 and 2030 land use maps predicted by the model, it was shown that the grassland, forest land, and wetland were reduced across a small range. The artificial surface expansion was decreased, the water bodies, permanent snow and ice areas were replenished, and the occupation of cultivated land was also effectively controlled.
These results aligned with the guidelines issued by the State Council of China in January 2017, which aimed to strengthen farmland protection and improve the balance between Land Occupation and Compensation policy. The EPHC is an arid area with significant desertification, which is impacted by factors such as the lack of ecological and water resources. In the northern sandy area, 953 km 2 of land is subjected to wind and sand erosion yearround, and large swaths of cultivated land are abandoned annually due to water shortages and other factors. The overall quality and efficacy of cultivated land is low, and the situation for the protection of cultivated land remains quite serious.
Based on the prediction results, the following suggestions were proposed for healthy urban development and territorial spatial planning in the study area. Firstly, the random growth of urban land should be brought under control, intensive localized use should be promoted, and environmentally compatible land use patterns should be sought. The development potential of construction land should be optimized to alleviate the outward expansion of urban land, while enhancing the comprehensive intensity of development and utilization of artificial surfaces. Secondly, there should be a strong focus on ecological restoration and environmental protection, the level of urban space management should be significantly improved, as well as the ecological environment. Industrial infrastructures should be adjusted, while improved technologies should be implemented to reduce pollution emissions per unit of output value, toward the enhancement of manufacturing efficiencies and achievement of green production. Further, spatial layouts should be optimized to regionally coordinate new development and urbanization, to achieve economic growth. Thirdly, an ecological compensation mechanism should be established, where the costs of ecological protection and development should be balanced according to the value of ecosystem services. The costs involved for opportunities that reflect the interests of all parties involved in ecological and environmental protection and developmental construction should be mediated and modified through a combination of prudent administrative and market strategies.

Limitations and prospective future work
Although the CNN-GRU model proposed in this study achieved good results in LUC simulations, several deficiencies remained due to limitations in data collection and research methods. First, new policies issued by local governments were not included in the classification of impact factors. Liang et al. (2021) predicted future land use structures under different scenarios, with the results showing that local policies may impact future LUC. Second, this study adopted data with a resolution of only 30 m and did not explore the influence of data resolution on the size of the most suitable neighborhood. Thus, further investigations are required to assess whether data at different resolutions have a significant impact on the optimization of neighborhood size.

Conclusions
The aim of this study was to employ a new land use prediction model to explore the LUC in the EPHC. GRU and CNN models were combined to extract SNF and LTD, 13 driving factors were integrated, and the LU data of the EPHC from 2000 to 2020 was used for simulations. The results revealed that the simulation accuracy of this model was superior to that of other deep learning models (GRU, CNN, CNN-LSTM, and GRU-CNN), which confirmed the importance and effectiveness of SNF and LTD for land use simulations. This study established a simulation model for ecologically fragile areas based on spatiotemporal characteristics and LTD, which improved the scientific nature of the prediction results and provided a reference for their enhanced protection.
The innovations of this study were as follows: (1) A land use simulation model with higher accuracy than traditional deep learning methods was proposed to extract SNF and LTD. (2) Spatial feature and temporal sequence extraction in deep learning influenced the simulation accuracy, where the initial extraction of spatial features improved the simulation accuracy of the model. (3) The feature extraction results with various neighborhood unit sizes were significantly different. It was experimentally verified that 3 × 3 neighborhood units were the most appropriate for the study area model. (4) From 2000 to 2020, the construction land in the study area slowly decreased and then expanded rapidly. According to the predictive model, urban expansion will decrease from 2020 to 2030, which provides a certain theoretical basis and reference value for the realization of new urbanization and ecological revitalization, against the background of territorial spatial planning and healthy urban development.  Precipitation, (h). Population, (i). Distance to settlement, (j). Distance to road, (k). Slope, (l). Temperature, (m). Distance to river.