A review of land use/land cover change mapping in the China-Central Asia-West Asia economic corridor countries

ABSTRACT Large-scale projects, such as the construction of railways and highways, usually cause extensive Land Use Land Cover Change (LULCC). The China-Central Asia-West Asia Economic Corridor (CCAWAEC), a key large-scale project of the Belt and Road Initiative (BRI), covers a region that is home to more than 1.6 billion people. Although numerous studies have examined the strategies and economic potential of the Economic Corridor, LULCC mapping studies in this area have not yet been reviewed. This study provides a comprehensive review of recent research progress and discusses the challenges in monitoring LULCC and identifying its driving factors in the study area. The review is intended to support decision-making for sustainable development and construction in the Economic Corridor. To this end, 350 peer-reviewed journal and conference papers, as well as book chapters, were analyzed based on 17 attributes, such as the main driving factors of LULCC, data collection methods, classification algorithms, and accuracy assessment methods. It was observed that: (1) rapid urbanization, industrialization, population growth, and climate change have been recognized as major causes of LULCC in the study area; (2) LULCC has, directly and indirectly, caused several environmental issues, such as biodiversity loss, air pollution, water pollution, desertification, and land degradation; (3) there is a lack of well-annotated national land use data in the region; (4) there is a lack of reliable training and reference datasets to accurately study long-term LULCC in most parts of the study area; and (5) several technical issues still require more attention from the scientific community. Finally, several recommendations are proposed to address the identified issues.


Introduction
"Land" is defined as a place on the Earth's surface where human activities are conducted. Physical land classes, such as open water, forest, and cropland, can be described as Land Cover (LC) (Li et al., 2017a), whereas Land Use (LU) indicates how human beings utilize the land (Foley et al., 2005). Changes in LU and LC over a period of time are collectively known as Land Use Land Cover Change (LULCC).
[...] Economic Corridor to overcome possible challenges, and to provide comprehensive information for decision-makers and scientists. In summary, this study has three main objectives:
• To review the main causes and environmental issues of LULCC;
• To review the methods of LULCC mapping in this area, including data collection, classification algorithms, and accuracy assessment;
• To present the remaining challenges of LULCC studies.

Study area
The Economic Corridor covers most of the ancient Silk Road, a network of trade routes between the East and the West. It consists of eight countries from different geographical zones: China, Kazakhstan, Kyrgyzstan, Turkmenistan, Tajikistan, Uzbekistan, Iran, and Turkey (Fulton, 2016). Taken together, the area covers more than 16 million square kilometers and has more than 1.6 billion inhabitants. The study area plays a major role in the world economy and the fossil fuel trading market and is home to the world's largest fossil fuel reserves. As shown in Figure 1, the region spans different climate zones. Based on the Köppen-Geiger climate classification (Beck et al., 2018), this region can be divided into four dominant climates: (1) cold desert climate, (2) hot semi-arid climate, (3) tundra climate, and (4) monsoon-influenced hot-summer humid continental climate.

Data collection method
The literature review of LULCC studies conducted in the Economic Corridor was carried out using the Web of Science Core Collection (Clarivate Analytics) because it offers comprehensive resources from different journals. Two main keyword categories, "land use change" and "land cover change", were combined one by one with the names of all eight countries and with "Central Asia", and the search was applied to titles, abstracts, and keywords. The search, with a cut-off date of June 30, 2020, returned nearly 3,000 publications. Following a quick evaluation of titles and abstracts, 350 publications were manually selected for further analysis. The following exclusion rules were applied:
• Papers focusing on developing change detection models, which were broadly discussed in past review studies, without providing any primary information (e.g., main driving factors and main causes of LULCC) were eliminated, because analyzing these attributes was the main objective of the present review.
• Papers that extracted individual classes, such as forest change and wetland change, were conducted over very small areas, and utilized traditional models were discarded.
• Papers that had "land use" and "land cover" in their titles but did not provide primary information that could help achieve our objectives were removed.
• Where several papers covered the same case study area with similar sensors and methods and offered almost the same scientific contribution and conclusion, the redundant papers were also removed.
In the last step, for every single publication in the final list, 17 attributes (see Table 1) were considered to carry out detailed analysis and to determine the development and challenges of LULCC in the study area.

Research developments and challenges in sensors
Due to its high temporal frequency, computing efficiency, regular coverage of the Earth, and the availability of images with different characteristics (e.g., temporal, spatial, spectral, and radiometric resolutions), remotely sensed imagery is considered the core data source for LULCC studies (Kharazmi et al., 2018; Naboureh et al., 2017; Zadbagher, Becek, & Berberoglu, 2018). Although researchers have used different types of satellite images depending on the size of the study area and the research goals, objectives, and questions, spatial resolution has played the leading role in image selection. As shown in Figure 2, more than 60% of the papers employed Landsat, mainly because it provides access to almost 40 years of data records (Aslami & Ghorbani, 2018; Li et al., 2018; Muyibul, Xia, Muhtar, Shi, & Zhang, 2018). Our findings also showed that combining images with different spatial/temporal resolutions (very high to high or low) from the same or different sensors, integrating active and optical remote sensing data, and fusing multispectral images with hyperspectral or Light Detection and Ranging (LiDAR) data were frequently applied to improve the accuracy of LU/LC maps. For example, Feng et al. (2019) used multitemporal and multi-sensor data (Sentinel-2 and Sentinel-1) to generate a coastal LC map of the Yellow River Delta. Moreover, Kesgin and Nurlu (2009) studied LC change in the coastal landscape of Candarli Bay, Turkey, by combining time series of Landsat TM and ASTER satellite images. A remarkable growth in the number of remote sensing (RS) datasets, along with the desire and need to study large areas (i.e., global and continental levels), has brought the concept of RS Big Data into the Earth observation field in recent years (Guo, 2017).
Based on the conducted review, Google Earth Engine (GEE), which provides basic calculation functions for both raster and vector data, can effectively handle RS Big Data challenges in LULCC mapping (Eskandari, Reza Jaafari, Oliva, Ghorbanzadeh, & Blaschke, 2020; Huang et al., 2017; Liu et al., 2018; Teluguntla et al., 2018). As data become more complex (e.g., more diverse and higher-dimensional), the role of RS Big Data and GEE in LULCC monitoring is expected to grow. Based on our review, the following issues should be considered in sensor selection and in dealing with RS Big Data:
(1) Since accurate LULCC studies are important for authorities, the extraction of small classes (e.g., roads and built-up areas) requires very high spatial detail. However, acquiring very high-resolution images for large areas and long-term studies is difficult, and in most cases the cost is not affordable.
(2) A large share of the study area (e.g., southern China, northern Iran, and Turkey) is frequently affected by cloud cover. Although this can be addressed by combining several partially cloudy images, doing so markedly increases the amount of satellite data that must be acquired, stored, and processed (Devaux, Crestey, Leroux, & Tisseyre, 2019; Stillinger, Roberts, Collar, & Dozier, 2019).
(3) Because the study area contains many mountainous regions, it should be noted that although the integration of optical and radar data can improve class separability at low altitudes (e.g., coastal landscapes), classification accuracy can decline with increasing elevation (Stendardi et al., 2019).
(4) The existing techniques and systems cannot completely address RS Big Data issues for LULCC modeling in the study area. Improved algorithms and applications are urgently needed.
(5) Although RS Big Data has received special attention from the Economic Corridor's authorities, there is no dedicated platform that provides reliable LU/LC data for the region.
(6) GEE has several limitations. For example, due to computational restrictions, deep learning algorithms are not performed in GEE. Therefore, users need to implement deep learning algorithms outside of the GEE platform.
(7) Modeling LULCC for the study area with RS Big Data can bring significant computational challenges, for example, the need for scalable data storage, dynamic workflow management, and flexible computing resource provisioning.
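The idea in issue (2), combining several partially cloudy images into one usable image, can be illustrated with a minimal sketch. It assumes that each acquisition is a small co-registered grid of values with a matching cloud mask, and it uses a per-pixel median of the cloud-free observations; the function name `cloud_free_composite` and the choice of the median are illustrative assumptions, not a method taken from the reviewed papers.

```python
from statistics import median


def cloud_free_composite(images, cloud_masks, nodata=None):
    """Build a per-pixel median composite from co-registered images.

    images      : list of 2-D lists (rows x cols) of pixel values
    cloud_masks : list of 2-D lists of booleans, True where a pixel is cloudy
    nodata      : value written where every acquisition is cloudy
    """
    rows, cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Keep only the observations flagged as cloud-free at this pixel.
            clear = [img[r][c]
                     for img, mask in zip(images, cloud_masks)
                     if not mask[r][c]]
            out_row.append(median(clear) if clear else nodata)
        composite.append(out_row)
    return composite


# Three acquisitions of a 1 x 3 scene; the last pixel is cloudy in all of them.
images = [[[10, 20, 5]], [[12, 99, 6]], [[14, 22, 7]]]
masks = [[[False, False, True]],
         [[False, True, True]],
         [[False, False, True]]]
composite = cloud_free_composite(images, masks)
```

The sketch also makes the cost noted in issue (2) visible: every output pixel requires all input acquisitions to be held and scanned, so storage and processing grow with the number of images combined.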

Research developments and challenges in data collection
A wealth of data sources can greatly aid accurate LULCC mapping. Various types of data, such as field survey, LU/LC products, socioeconomic, environmental, and topographical data, were applied for LULCC mapping and prediction in the study area. LU and LC products at a regional, national, or global level were used to initialize, calibrate, and validate LULCC models. Table 2 shows the most frequently used LU and LC products in the Economic Corridor based on the conducted literature review. Land prices, family size, and demographic data were the most commonly used socioeconomic data. The analysis also illustrated that environmental and topographic data were more readily available than socioeconomic data, mainly because they can be acquired from interpolation of point data (e.g., climate data) or from RS data (e.g., topography). Soil data, plant growth, agricultural yield, forest re-growth, elevation, and slope were the most frequently used environmental and topographic data in the reviewed literature.
Advanced classification methods need reliable training data to produce accurate LULCC maps (Yang, Fu, Smith, & Yu, 2017). However, obtaining sufficient high-quality reference data is still a critical issue in LULCC mapping (Zhang & Zhang, 2007). Based on the conducted review, the methods most broadly applied by the scientific community to acquire reference data were field survey, visual interpretation of high-resolution images, freely available sample datasets from Global Land Cover (GLC) products, and a combination of these three methods (Figure 3).
Given that the annual change in LC and LU rarely exceeds a few percent of the total land area (Chen et al., 2019), migration of reference data from a specific year to another (target) year has recently been implemented to address the lack of accurate and reliable data (Ghorbanian et al., 2020). Crowdsourcing has also been proposed in recent years to support sample collection.
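As a rough illustration of the reasoning behind reference data migration, the sketch below keeps a reference sample for the target year only if its spectral signature has barely changed between the two years. The spectral-angle test, the threshold `max_angle`, and the sample dictionary layout are assumptions made for this example; they are not necessarily the procedure of Ghorbanian et al. (2020).

```python
import math


def spectral_angle(a, b):
    """Angle (radians) between two spectra; small values mean similar shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return math.pi  # treat an empty spectrum as "changed"
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def migrate_samples(samples, max_angle=0.1):
    """Return (id, label) pairs whose spectra are stable between the two years.

    Each sample is a dict with the hypothetical keys
    'id', 'label', 'ref_spectrum', and 'target_spectrum'.
    """
    return [(s["id"], s["label"])
            for s in samples
            if spectral_angle(s["ref_spectrum"], s["target_spectrum"]) <= max_angle]


samples = [
    # Same spectral shape, only brighter: assumed unchanged, label is migrated.
    {"id": 1, "label": "cropland",
     "ref_spectrum": [0.1, 0.2, 0.3], "target_spectrum": [0.2, 0.4, 0.6]},
    # Reversed spectral shape: assumed changed, sample is dropped.
    {"id": 2, "label": "forest",
     "ref_spectrum": [0.1, 0.2, 0.3], "target_spectrum": [0.3, 0.2, 0.1]},
]
migrated = migrate_samples(samples)
```

Because only a few percent of the land changes each year, most samples are expected to pass such a stability test, which is what makes migration attractive where new reference data cannot be collected.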
Generally, reference data collection faces the following issues:
(1) Generating reference data through visual interpretation was the most common method, but it can contain errors and is time-consuming (Elmes et al., 2019).
(2) Obtaining reference data via field survey is the most accurate method, but it cannot be conducted in all areas because of inaccessibility. Mountainous regions of the Economic Corridor (e.g., Tien Shan (China), Pamir (Tajikistan), Alborz (Iran), and Ararat (Turkey)) are examples in this regard. Moreover, field survey is a labor-intensive and time-consuming task, and equipment, travel, and training of the field crew all contribute to the high cost of this method.
(3) Although Asia is covered by different GLC products (Chen, Cao, Peng, & Ren, 2017; Gong et al., 2016; Grekousis, Mountrakis, & Kavouras, 2015; Jokar Arsanjani, Tayyebi, & Vaz, 2016; Sulla-Menashe, Gray, Abercrombie, & Friedl, 2019), there is no specific LC product for this region.
(4) Although several GLC products cover the whole Economic Corridor, they cannot be considered the main source for reference data collection because of their low accuracies and coarse resolutions.
(5) Crowdsourcing is one of the latest techniques in the data sampling field, but appropriate assessments of this technology are still lacking.
Table 2 (fragment). Product; years; spatial resolution; number of classes; accuracy; reference.
[...] (Arino et al., 2008)
CORINE; 1990, 2000, 2006, 2012; 100 m; 44; >85%; (Jaffrain, Sannier, Pennec, & Dufourmont, 2017)
NLCD; 2001, 2004, 2006, 2008, 2013; 30 m; 20; 71%-97%; (Yang et al., 2018)
CLUD; 1990, 2000 [...]

Research developments and challenges in classification section
Remote sensing classification is the process of recognizing spectral similarities and differences in a multidimensional spectral space and linking them to LC types. Because of the local geographical features and different landscapes in the study area, scientists have adopted various methods for image classification. As shown in Figure 4, the reviewed literature employed different approaches for LULCC mapping. [...] approach to LULCC modeling (Feng, Huang, & Ren, 2018; Hu, Zhang, Zhang, & Yan, 2018; Kussul, Lavreniuk, Skakun, & Shelestov, 2017). According to the literature, different machine learning (ML) algorithms were used for LULCC classification in nearly 50% of studies published after 2010. Despite all the efforts made in image classification, some issues still need more attention from scholars:
(1) Given the diverse range of climate systems and terrain conditions across the study area, there is no agreement on a specific approach that would provide the most accurate results in all circumstances (Ma et al., 2017; Thanh Noi & Kappas, 2018).
(2) ML algorithms have become increasingly popular in classification (Petliak, Cerovski-Darriau, Zaliva, & Stock, 2019; Tong, Xia, Lu, Shen, & Zhang, 2018). These methods need abundant reference data because classes are trained, tested, and ultimately classified based on training data (Belgiu & Drăguţ, 2016; Mountrakis, Im, & Ogole, 2011). However, acquiring reliable data is still a crucial concern for LULCC modeling in some parts of the study area (e.g., the Central Asian countries, namely Kazakhstan, Kyrgyzstan, Turkmenistan, Tajikistan, and Uzbekistan).
(3) Another issue that requires more attention from the scientific community when mapping LU/LC in the study area is the class imbalance problem. In most cases, the distribution of LU/LC data is imbalanced, with a few majority LC/LU types dominating the minority classes (Naboureh, Ainong, Jinhu, Guangbian, & Meisam, 2020). In most supervised classification methods, if one of the classes has fewer samples than the others, the resulting class imbalance influences the classification accuracy (Rodriguez-Torres, Carrasco-Ochoa, & Martínez-Trinidad, 2019; Zhang & Chen, 2019). Even though this issue affects the LU/LC classification accuracy of minority classes, it has rarely been considered in studies of the Economic Corridor to date.
(4) Although object-based methods were introduced as powerful approaches for LULCC mapping, applying them to long-term LULCC studies in the study area has some limitations. These methods need very high spatial resolution data, and it is not practical to acquire such data for all parts of the study area over a long-term period. Additionally, object-based image analysis (OBIA) involves many steps, such as choosing training samples and selecting the optimal scale; if one of those steps is not properly done, the classification accuracy can suffer (Ma et al., 2017; Ye et al., 2018). For example, finding the best segmentation scale in OBIA is challenging, and inappropriate scales can cause under- or over-segmentation.
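To make the class imbalance problem in issue (3) concrete, the following minimal sketch balances a training set by random oversampling, i.e., by duplicating randomly chosen minority-class samples until every class matches the largest one. Random oversampling is only one of several balancing strategies and is chosen here for brevity; the function name and the toy data are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples (with replacement) from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy training set: six cropland pixels (ids 0-5) and two wetland pixels (ids 6-7).
samples = list(range(8))
labels = ["cropland"] * 6 + ["wetland"] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

Balancing of this kind should be applied to the training data only; the reference data used for accuracy assessment should keep their original class proportions so that the reported accuracies remain meaningful.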

Research developments and challenges in accuracy assessment
To evaluate the reliability of the results, the classified maps must be compared with ground-truth data. Based on the conducted review, overall accuracy, user's accuracy, and the Kappa coefficient derived from the confusion matrix have been the most popular measures among scholars. However, the Kappa coefficient has been criticized as an accuracy assessment measure (Foody, 2020; Pontius & Millones, 2011). Kappa "corrects" for chance agreement although there is no need for such a correction, and the way chance, as an artificial construct, is modeled in the estimation of κ is incorrect. Moreover, Kappa fails to convert the sample confusion matrix into an estimated population matrix (Foody, 2020; Pontius & Millones, 2011). Reported overall accuracies usually ranged from 80% to 90% among the investigated publications. Several investigators argue that an accuracy of at least 85% is required, whereas others hold that a universal standard or definition of good accuracy is unnecessary because it is not necessarily connected to the purpose of the specific research question (Pontius & Millones, 2011; Pontius et al., 2007). The review also illustrated that the number of classes, landscape types, and classification algorithms have an impact on classification accuracy (Congalton & Green, 1999; Olofsson et al., 2014). For instance, Jin and Fan (2018) obtained the highest overall accuracy of 97.90% by applying random forest (RF) to Landsat images when classifying only five land cover types. In contrast, Li et al. (2017a) reported the lowest accuracy of 75% when classifying 19 types of LU and LC. In general, it is difficult to compare different studies and state which classification system or sensor is better. However, analyzing different works can provide useful information. Figure 5 presents the average overall accuracy by sensor type and classification method among the reviewed papers.
Among the classification methods, the deep neural network (DNN) approach reported the highest average overall accuracy (close to 91.5%). Among sensors, airborne sensors performed best (92% on average).
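For readers less familiar with the measures discussed in this section, the sketch below derives overall accuracy, user's and producer's accuracies, and the Kappa coefficient from a confusion matrix. It assumes the common convention that rows hold the mapped (classified) classes and columns hold the reference classes; the function name and the example matrix are hypothetical. Kappa is included only because it is widely reported, notwithstanding the criticisms cited above.

```python
def accuracy_metrics(matrix):
    """Compute standard accuracy measures from a square confusion matrix.

    matrix[i][j] = number of samples mapped as class i whose reference class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                       # mapped totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    diagonal = [matrix[i][i] for i in range(k)]                     # correctly mapped

    overall = sum(diagonal) / n
    # User's accuracy: share of each mapped class that is correct (commission side).
    users = [d / t if t else 0.0 for d, t in zip(diagonal, row_totals)]
    # Producer's accuracy: share of each reference class that was found (omission side).
    producers = [d / t if t else 0.0 for d, t in zip(diagonal, col_totals)]
    # Agreement expected by chance, estimated from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - chance) / (1 - chance) if chance < 1 else 0.0
    return {"overall": overall, "users": users,
            "producers": producers, "kappa": kappa}


# Hypothetical two-class example with 100 reference samples.
metrics = accuracy_metrics([[45, 5],
                            [10, 40]])
```

In this example the overall accuracy is 0.85 and the chance agreement is 0.5, so Kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7, which shows how strongly the chance term changes the reported value.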
As shown in Figure 6, the following are some issues in the accuracy assessment of LULCC in the Economic Corridor based on the conducted literature review:
(1) There is no universally accepted accuracy assessment metric. In most cases, the overall accuracy of the generated maps was the authors' main concern, while poor accuracy for rare classes was common among the studies.
(2) Researchers frequently fail to describe the validation technique in sufficient detail, which prevents readers from knowing how the validation was carried out.
(3) Although some investigators are interested in temporal changes, scholars usually reported the accuracy at individual time points without indicating the accuracy of the changes.
(4) The geometric mean of user's accuracies and the geometric mean of producer's accuracies, which are particularly suitable for evaluating the LC/LU accuracy of minority classes (Congalton, 1991), are rarely applied in the accuracy assessment of generated LU and LC maps.
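The geometric mean mentioned in issue (4) can be illustrated with a short sketch. It compares the arithmetic and geometric means of a set of per-class accuracies in which one rare class is mapped poorly; the accuracy values are hypothetical and chosen only to show the effect.

```python
import math


def geometric_mean(values):
    """Geometric mean of per-class accuracies (each value in the range 0 to 1)."""
    if not values:
        raise ValueError("at least one accuracy value is required")
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical user's accuracies: two dominant classes and one rare class.
users_accuracies = [0.95, 0.90, 0.30]
arithmetic = sum(users_accuracies) / len(users_accuracies)
geometric = geometric_mean(users_accuracies)
```

Because the per-class values are multiplied, a single poorly mapped class pulls the geometric mean down more than the arithmetic mean (about 0.64 versus 0.72 here), and a class with zero accuracy drives it to zero. This sensitivity is what makes the measure informative for minority classes.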

Evaluation of LULCC studies and trend in the economic corridor
The study focuses on 350 publications, including 10 books, 25 conference papers, and 315 peer-reviewed journal articles, covering land use/land cover monitoring in the Economic Corridor. Statistical analysis identified the 10 main journals in which the articles appeared (Figure 7), with Remote Sensing (9%), Environmental Monitoring and Assessment (7%), and Arabian Journal of Geosciences (6%) being the top three journals in this regard. Among meetings and conferences, the "IEEE International Geoscience and Remote Sensing Symposium" and the "International Conference on Geo-informatics" provided the most significant contributions. China was selected as the case study area in 105 (42%) of the publications, while Turkey (78) and Iran (72) occupied second and third place, respectively (see Figure 8). Furthermore, the publications were mainly associated with Chinese organizations and institutions such as the "National Natural Science Foundation of China", the "National Basic Research Program of China", and the "Chinese Academy of Sciences (CAS)." More than 70% of the publications involved international cooperation, with Chinese scholars making the largest contribution. Scientists and scholars from Canada and the US also made significant contributions to the publication of LULCC studies in the Economic Corridor.
Figure 6. A word cloud of the main issues in the accuracy assessment section (the more frequently a term appears within the analysis, the larger the word is depicted in the figure).
Based on the search conducted in the Web of Knowledge, the number of publications grew significantly on an annual basis, from fewer than 20 papers in 1995 to more than 500 in 2019. In terms of research area, from 2007 onwards a large share of publications focused on the relationship between LULCC and environmental issues (e.g., water management and climate change) and on the extraction of individual classes (e.g., forest change and wetland change). Before 2007, in contrast, a large portion of studies focused exclusively on the dynamic monitoring of LULCC.

LULCC trend and main environmental problems in economic corridor
Although many studies have been conducted on LULCC in the Economic Corridor, only a few of them have studied the trend of LULCC on large or national scales (Amani, Ghorbanian, Mahdavi, & Mohammadzadeh, 2019; Hu & Hu, 2019; Klein, Gessner, & Kuenzer, 2012; Li et al., 2018, 2017a; Sodango, Sha, & Li, 2017). In most cases, scholars have selected small spatial and temporal scales in their studies. Consequently, the overall pattern of LULCC in the Economic Corridor can only be inferred from a summary of the conducted studies, which may be imprecise and hard to apply for policymaking purposes. As is clear from Figure 9, in general:
(1) Climate change, urbanization, industrialization, the political collapse of the Soviet Union, and human forces have been recognized as major causes of LULCC in the study area.
(2) LULCC has, directly and indirectly, caused some environmental issues in the study area, such as biodiversity loss, air pollution, water pollution, desertification, and land degradation. Among them, air pollution and water pollution are two major environmental issues that almost all areas of the Economic Corridor are facing. Another main concern in this region is water resource management, particularly in Iran and Central Asia. For example, the Aral Sea in Central Asia and Lake Urmia in Iran are two important water bodies in the study area that are considered endangered ecosystems (Aghakouchak et al., 2015).
(3) The main land change processes reported in the study area were barren to cropland, cropland to built-up, and forests to croplands. For example, in the case of China, a massive expansion of cropland areas and a significant reduction of forest lands have been reported (Li et al., 2018, 2017a). For the Central Asian countries (Kazakhstan, Tajikistan, Uzbekistan, and Kyrgyzstan), a downward trend in bare land areas and an upward trend in natural vegetation areas (forest, shrubland, and grassland) have been reported (Hu & Hu, 2019; Kharin, 1994; Orlovsky, Raalzinsky, & Orlovsky, 2001).

Summary and recommendations
The present review was an attempt to evaluate LULCC monitoring studies in the Economic Corridor by providing an overview of current advances, limitations, and challenges. Moreover, a general summary of the LULCC trend and the main environmental issues was provided. It was observed that LC change has a long history in the study area because of a series of factors, such as industrialization, economic development, urbanization, intensive cultivation, and climate change. In this context, the main land change processes were barren to cropland, cropland to built-up, and forests to croplands. Our analysis showed that there is still a debate over the selection of the right sensor type and classification method. In fact, there is no "one-size-fits-all" choice, because different sensors and classification methods can provide different results for different landscapes, particularly where the area is large and consists of various types of landscapes. Applying ML algorithms for LULCC modeling has strongly attracted scholars working in the study area. However, there is a shortage of reliable reference data for the Economic Corridor, even though training and reference data play important roles in ML classification, because classes are assigned based on training data. To address the aforementioned issues, a wide variety of projects, strategies, and models have been developed and applied to LULCC monitoring in the study area.
In conclusion, to reduce the influence of the existing challenges in monitoring LULCC, we present the following recommendations:
• Since there is a lack of sufficient reference data for generating long-term LC maps in most parts of the Economic Corridor, especially in the Central Asian countries, it is suggested to apply the reference data migration technique to migrate high-quality reference samples from one year to another. For instance, Ghorbanian et al. (2020) used a reference data migration technique to produce a 10 m resolution LC map of Iran.
• Using public sample collection platforms, such as Geo-Wiki, which seeks to provide reliable and sufficient LC/LU data for calibration, training, and validation, is another efficient way to reduce the challenges of reference data in the Economic Corridor.
• Although the results do not clearly identify the optimal classification method for the study area, semi-supervised learning can be an efficient choice where sufficient reference data are lacking. Semi-supervised learning, which combines unsupervised and supervised classification techniques, uses a small amount of labeled data together with a large amount of unlabeled data.
• As the Economic Corridor encompasses different landscapes, the integration/combination of different classification techniques appears necessary. Therefore, further investigations of hybrid classification methods are also recommended.
• Since data balancing has rarely been considered in the reviewed literature, applying data balancing methods is recommended to handle the class imbalance problem. Additionally, in terms of accuracy assessment, it is suggested to apply the geometric mean metric, which is particularly suitable for LC accuracy evaluation in the presence of class imbalance.
• Since most areas of the Economic Corridor have four seasons, some classes, such as croplands and natural vegetation, are likely to be misclassified because they share similar spectral features in peak growing seasons. Therefore, using time series of satellite images can lead to higher accuracy for those classes (Feng et al., 2019).
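The semi-supervised recommendation above can be sketched with a simple self-training loop. The example assumes a nearest-centroid classifier and a distance-margin rule for deciding when an unlabeled pixel is confidently pseudo-labeled; the classifier, the `margin` parameter, and the two-feature toy data are all illustrative assumptions rather than a method from the reviewed studies.

```python
import math


def class_centroids(points, labels):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def self_train(labeled, labels, unlabeled, margin=0.2, rounds=5):
    """Grow the labeled set with confidently pseudo-labeled samples.

    A sample is pseudo-labeled when its nearest class centroid is closer than
    the runner-up by at least `margin`; ambiguous samples stay unlabeled.
    Returns the final centroids and the samples that were never labeled.
    """
    points, labs, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        cents = class_centroids(points, labs)  # refit on the current labeled set
        remaining, added = [], False
        for p in pool:
            ranked = sorted((math.dist(p, c), label) for label, c in cents.items())
            if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
                points.append(p)
                labs.append(ranked[0][1])
                added = True
            else:
                remaining.append(p)
        pool = remaining
        if not added or not pool:
            break
    return class_centroids(points, labs), pool


# Two labeled pixels and three unlabeled ones (two features per pixel).
centroids, undecided = self_train(
    labeled=[(0.1, 0.1), (0.9, 0.9)],
    labels=["bare", "vegetation"],
    unlabeled=[(0.2, 0.15), (0.8, 0.85), (0.5, 0.5)],
)
```

In this toy run the two pixels close to a labeled example are absorbed into the training set, while the pixel lying midway between the classes is left unlabeled, which illustrates how a confidence rule limits the propagation of wrong pseudo-labels.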