Greenhouse area detection in Guanzhong Plain, Shaanxi, China: spatio-temporal change and suitability classification

ABSTRACT The extensive use of greenhouses has brought soared economic benefits for farming practitioners in China and an overview of the spatio-temporal distribution of greenhouses is of great interest to agricultural practitioners and decision-makers. In this study, Landsat image based greenhouse maps in Guanzhong Plain, Shaanxi, China were made using random forest classification algorithm through visual interpretation on the Google Earth Engine. The 7-year's changes in greenhouse areas were investigated (i.e. 2000, 2003, 2006, 2010, 2013, 2015 and 2019) with yearly overall accuracy more than 90%. The results showed that the total area of greenhouses in Guanzhong Plain demonstrated an increasing trend, from 5.92 km2 in 2000 to 194.42 km2 in 2019 with a considerable growth between 2010 and 2015. The dominant drivers for the increase are largely attributed to the government policy as well as economic profitability. The distribution of greenhouse shifts to central and eastern regions of Guanzhong Plain. Greenhouses preferentially expand to the area near to rural roads, main rivers, and high elevation, with more than 45% greenhouses distributed within 1 km of the county rural road. The principal component analysis based suitability evaluation showed that a total of 38.44% of the area was suitable for greenhouse.


Introduction
Greenhouse is a structure with transparent walls and roof that artificially adjusts the light, temperature, and water environments to fit plant growth even the ambient environment is not suitable for it. It enables the full use of available resources to shorten the growth period, to increase the yield and quality of agricultural products, and to increase economic benefits (Lamont 1999). Driven by the increase in the food demand and the progress in agricultural production modes and their upgrades, greenhouses are getting increasingly popularized around the world due to its special capability to meet the food and economic demands. However, they also cause a series of problems threatening agricultural and ecological environments. For instance, intensified applications of manure and composts lead to an increased leaching of nitrogean (N) and phosphorus (P), decreased soil organic matter concentrations, lowered soil pH, and soil salinity that affect the sustainability of organic cropping systems (Zikeli, Deil, and Möller 2017).
The Guanzhong Plain is the birthplace of Chinese agriculture (Yang and Cheng 1986) and among the most ancient agricultural regions in China (Zheng et al. 2020). It is well known as '800 li Qinchuan' (large area of plains favorable for crop growing) (Li et al. 2014) because of its favorable natural climate conditions and flat topography. It has been among the most important grain production area in Shaanxi province even the whole China (Huang, Wang, and Li 2011;Shen et al. 2020). Traditional cropping system of the region is dominated by rotated summer maize and winter wheat (Zheng et al. 2020). Since the beginning of 21 st century, greenhouses have been introduced as a technology of producing agricultural products and vegetables. The area of greenhouses has been expanding extensively since then; however, the temporal-spatial dynamics of greenhouse is understudied.
Remote sensing provides a unique opportunity with great convenience for monitoring the land use change based on its large spatial coverage and short temporal interval for revisiting. The introduction of cloud-based platform such as Google Earth Engine (GEE) enables geospatial analysis in a larger scale with increased efficiency (Dong et al. 2016). Unlike general remote sensing software tools that require local storage and processes, GEE provides a cloud platform that can obtain large volume open source geospatial data and perform the image processing in an effective way (Gorelick et al. 2017;Amani et al. 2020). Meanwhile, GEE makes long time periods monitoring land use types more easily (Sidhu, Pebesma, and Câmara 2018;Zurqani et al. 2018;Daldegan, Roberts, and de Figueiredo Ribeiro 2019;Tsai et al. 2019;Xie et al. 2019;Zhou, Dong, et al. 2019;Paludo et al. 2020;Wang et al. 2020). This study adopted GEE to study the spatio-temporal changes of greenhouse in Guanzhong Plain.
Nowadays, many remote sensing approaches have been successfully applied to greenhouse detection, including visual interpretation Wei et al. 2019), maximum likelihood classification (Agüera, Aguilar, and Aguilar 2007;Agüera, Aguilar, and Aguilar 2008), and support vector machine (Yu, Song, and Lang 2017). In addition, there are some researches identifying greenhouses based on the object-based image analysis (OBIA), including nearest neighbor (NN), threshold model, and decision-tree and random forest (RF) (Tarantino and Figorito 2012;Aguilar et al. 2014;Novelli et al. 2016;Ji et al. 2019;Zhou, Fan, and Liu 2019). Most of these studies on greenhouse detection generally focused on a relatively small area. Evaluation of several methods in Shandong Province, China showed that the random forest performed best in GEE (Zhu et al. 2020a) and is used in this study.
Previous studies mainly investigated accuracy of greenhouse detection, and the spatial distribution of greenhouse in Guanzhong Plain was not monitored over a long period. In addition, the site selection criteria were not connected to greenhouses and few researches studied the location condition, economic driving factors, and suitability analysis. Therefore, this study explores the influence of driving factors and presents a suitability analysis for greenhouse in Guanzhong Plain. The objectives of this study were as follows: (1) to extract temporal change in greenhouse area in Guanzhong Plain on GEE platform using Landsat images and the pixel-based random forest algorithm; (2) to understand the spatial distribution pattern of greenhouses, including the area and site selection criteria of greenhouses as well as economic driving factor of greenhouses' area; and (3) to make a suitability analysis for greenhouse in Guanzhong Plain.

Study area
The Guanzhong Plain is located in the central Shaanxi Province, China (33.95°-35.65°N, 106.37°-110.4°E) (Figure 1), covering the majority of Xi'an, Baoji, Xianyang, Tongchuan, and Weinan with a total area of approximate 36,000 km 2 . It is characterized with continental monsoon climate in the warm temperate zone, with annual average rainfall ranging from 500 to 800 mm, and the multipleyear average temperature is approximately 13°C (Tian et al. 2021). The first-level branch river (Weihe river) of the Yellow River winds along the Guanzhong Plain to provide water resources for drinking and irrigation. The geographical advantages together with fertile soil make it one of the most important agricultural areas in Shaanxi Province and in China from ancient times. Greenhouse can solve the low-temperature problem in winter to keep vegetables growing normally. In the study area, greenhouse is mainly introduced to grow vegetables and fruits and developed vigorously. Greenhouses are mostly covered with white plastic (Figure 2), e.g. polyolefin (PO) film of 0.08, 0.1 and 0.12 mm in thickness. Solar greenhouses are generally covered with a film of 0.1 mm, and plastic greenhouses are covered with a film of 0.1 mm or 0.12 mm. Compared with previous polyethylene (PE) film, PO film is characterized with better transparency, stronger fog dissipation, anti-dropping, tensile strength, tear strength, and anti-static ability, which are mainly needed for planting vegetables and fruits. Only limited number of greenhouses adopted PO film of other colors in Guanzhong Plain; therefore, we focused on extracting white plastic greenhouses in this study.

Dataset
Multi-period images used in this study offer a unique opportunity to achieve long-term greenhouse monitoring and guarantee the accuracy of classification. The Landsat series of satellites provide long-term terrestrial data over 40 years and have been widely used for mapping various thematic maps (Roy et al. 2014). Landsat imagery was used to investigate the greenhouses' dynamic change from 2000 to 2019 in this study. The 30 m Landsat images for the study area were obtained from the GEE catalogue (Gorelick et al. 2017), including Landsat 5, Landsat 7, and Landsat 8 images. The Landsat 5 satellite operated in the orbit for 27 years from 1984 (Kovalskyy and Roy 2013). The Landsat 7 satellite was launched in April 1999, carrying an enhanced thematic mapper (ETM+) sensor. It features a panchromatic band at 15 m resolution (Gorelick et al. 2017) (Table 1). The Landsat 8 satellite was launched in 2013 and continued working till now. Compared to the previous Landsat series, it carries two additional spectral bands: the blue and cirrus cloud with improved sensor signal-to-noise performance (Roy et al. 2014). The bands of Landsat image are presented in Table 1. The red, green, blue, NIR, SWIR 1, SWIR 2, and TIR bands for Landsat 5; the red, green, blue, NIR, SWIR 1, and SWIR 2 bands for Landsat 7; and the red, green, blue, NIR, SWIR 1, SWIR 2, and coastal aerosol bands for Landsat 8 were used for classification in this study.

Data preprocessing
2.3.1. Data acquisition and processing As only few greenhouses were available, they cannot be distinguished on images in 1990 and 1995, and we studied the dynamic change of greenhouses from 2000. The Landsat Collection 1 Tier 1 TOA Reflectance images for 7 years between 2000 and 2019 (i.e. 2000, 2003, 2006, 2010, 2013, 2015, and 2019) were used for this study, which have been pre-processed and corrected at TOA reflectance (Chander, Markham, and Helder 2009). Compared to surface reflectance images, the spectral characteristics of greenhouse can be clearly showed on the TOA Reflectance images. Landsat 7 imagery scenes were used for 2000 because the Landsat 5 images were cloudier in study area in 2000. Landsat 5 images of years 2003, 2006, and 2010 were selected as the Scan Line Corrector of  Landsat 7 broke down in May 2003 (Markham et al. 2004). Landsat 8 images of 2013, 2015, and 2019 were used as Landsat 8 satellite was launched on February 11, 2013. Combined use of Landsat 5, Landsat 7, and Landsat 8 images for greenhouse extraction can ensure consistent spatial resolution of the data and continuous temporal coverage. A total of seven Landsat imagery scenes are required to cover the study area, the Guanzhong Plain ( Figure 3).
Winter is the frequently operating season for the greenhouse to meet market demand and to maximize economic profits. Greenhouses are covered with sunshade net for cooling and plastic films are rolled up for ventilation in summer, which make it challenging to be detected by satellite. Therefore, greenhouse images from October of the study year to February of the following year were selected for the analysis (Table 2). For example, imagery scenes for 2000 were selected from the best images between October 2000 and February 2001. Only wheat was growing in the field during this period and the wheat usually did not need to be covered with mulch and hence there would not be interference of mulch. Each image was selected based on the 'CLOUD_COVER' to get the least cloud cover image to eliminate the effect of cloud (Duan et al. 2020). RmCloud function was also used for removing cloud region. Simultaneously, seven images were mosaiced together and clipped through the zonal boundary of the study area. In addition, greenhouses are generally distributed in areas with low altitude. Thus, elevation was used to remove the wrongly classified areas.

Classification method and verification
The RF is an ensemble machine learning approach that consists of a large set of decision trees to make a prediction (Breiman 2001). The corresponding trees are generated by sampling the same size training sample sets randomly through replacement from the original training sample set.  An optimal attribute class split node is selected, and the voting method is used to predict the combination of multiple decision trees (a bagging approach) (Belgiu and Drăguţ 2016). About two-third bootstrap of the training set are used for training, while the remaining one-third (out-of-bag) are used for the error estimation during the training process (Breiman 2001). The features used in RF classification are 100 trees, one min leaf population, 0.5 bag fraction, and max nodes are used without limit and the variables presplit are default and seed is zero. The distribution of greenhouses and the spectral characteristics of greenhouses were validated by field survey for 2019 and virtual exploration using Google Earth ( Figure 4). Greenhouse samples of 2019 were drawn by visual interpretation based on Landsat images and Google Earth. Two classes included greenhouses and non-greenhouses (water, vegetation, bare land, impervious surface) were randomly created using polygons in 2019. Among all the sample data, 70% of the training samples were used for the classification and 30% were used for validation and accuracy assessment (Ou et al. 2020). The number of greenhouse samples equals other class samples was, respectively, 670, 580, 495, 248, 172, 150, and 85 in 2019, 2015, 2013, 2010, 2006, 2003, and 2000. The overall accuracy, producer's accuracy and user's accuracy, and kappa coefficient were calculated to evaluate accuracy in GEE. Producer's accuracy indicates the percentage of the number of correctly classified samples to the total number of reference samples. User's accuracy represents the probability taking any random sample from the classification results, and its type is the same as the actual type of the ground (Story and Congalton 1986). Owing to the lack of field survey data, greenhouse samples of 2000, 2003, 2006, 2010, 2013, and 2015 were drawn on Landsat images based on the visual interpretation with Google Earth. These samples were also divided into the training and validation data to classify and evaluate accuracy in the same manner as 2019. It is noteworthy that user memory will exceed limit when the number of samples or decision trees are too large.

Data postprocessing
After all the classification is done, these images were imported into ArcGIS 10.2 for further processes. The tool of majority filter was used to remove some tiny patches, and the area tabulate was used to get the area of greenhouses. 'Four' was used in the number of neighbors, and the replacement threshold selected 'majority' in majority filter's parameter setting.

Spatial structure analysis
Fractal dimension (D), comes from the concept of geometry (Mandelbrot 1977), now has been used as an indicator to characterize the spatial structure of land use (Krummel et al. 1987; Batty and Longley 1988). The equation is expressed as follows (Yu and Guo 2005): where A indicates the area of greenhouse pattern, P is the perimeter of greenhouse pattern, C is a constant, and D is fractal dimension, with the value varying between 1 and 2. A greater value of D indicates a more irregular of land type patch and the more complex of land type structure (Zhao and Wang 1995). When D equals 1.5, it represents that the patches are in the Brownian random motion state, the spatial structure of pattern is in the most unstable state . Land use stability index (SK) indicates the stability of land use structure, and a larger SK indicates the more stable of land use structure. SK is defined as follows: (2)

Location driving factors analysis
To explore the potential influence between driving factors and greenhouse patches in Guanzhong Plain, eight potential driving factors were selected for binary logistic regression, including national roads, provincial roads, county roads, rural roads, main rivers, settlements, slope, and elevation. The digital elevation model (DEM) was obtained from geospatial data cloud (http://www. gscloud.cn/), and shapefiles of primary roads and settlements (2015, 1:250000) were obtained from national catalog service for geographic information (http://www.webmap.cn). The four types of roads are obtained through attributes. The shapefiles of main rivers were obtained from openstreetmap (http://www.openstreetmap.org) in 2019 because river data for 2015 were unavailable. Logistic regression analysis analyzes effects of dominant variables for the occurrence of certain spatial phenomena (Wu, Yeh, and On 1997;Cheng and Masser 2003;Aspinall 2004). Considering the scale of Guanzhong Plain, all data were transformed to grid format with a 100 × 100 m cell size for logistic regression analysis. Based on the previous study (Batisani and Yarnal 2009), 25% of the observation samples were used to reduce the potential effects of spatial autocorrelation in the logistic regression.
In this logistic regression, the potential driving factors were used as independent variables and the reclassified greenhouse map as the dependent variable. The goodness of fit for regression models can be evaluated by the relative operating characteristic (ROC) (Lin et al. 2011). ROC values show whether spatial patterns of the land type can be reasonably explained by the independent variables (Oh et al. 2011). When ROC values are greater than 0.7, the accuracy of a model can be considered to be credible (Jiang et al. 2015). The model's precision is high if ROC is greater than 0.9 (Manel and Ormerod 2010).
To explore the site selection of greenhouse, we analyzed the distribution of greenhouses based on different buffer zones and different slopes and elevations. Six main distance-based buffer zones including national roads (DIST_NR), provincial roads (DIST_PR), county roads (DIST_CR), rural roads (DIST_RR), main rivers (DIST_MR), and settlements (DIST_S) were taken as centers. Statistical analysis for other two greenhouse zones was based on different ranges of slope and elevation. The classification ranges is presented in Table 3.

Economic factor analysis
As economic factors of local area would be helpful to reveal the change of greenhouses in a long time series, the related factors combining the policy and economic information were therefore analyzed. The statistical information was obtained from the state and city level Bureau of Statistics (www.yearbookchina.com).
2.7. Greenhouse suitability analysis 2.7.1. Principal component analysis Principal component analysis (PCA) was developed by Pearson, a method using dimensional reduction processing to convert the multiple indicators into several comprehensive indicators that can reflect the research phenomenon to ensure the minimum loss of data information (Pearson 1901). Compared to other methods, PCA is an objective evaluation method without subjective consciousness through mathematical analysis and has been widely used in various fields (Ouiyangkul, Tantishaiyakul, and Hirun 2020;Cui et al. 2021;Yan et al. 2021). Based on the theory of PCA, the calculation steps are as follows: supposing that n samples in the evaluation system and each sample needs p evaluation indicators (x 1 , x 2 , … , x p ) to describe, and matrix X (n x p) was obtained (Li and zhang 2010).
First, matrix (X ) was standardized and correlation coefficient matrix (R) was calculated. Then eigenvalues (λ 1 , λ 2 , … , λ p ) and eigenvectors were obtained through the correlation coefficient matrix. Then, variance contribution rate, cumulative contribution rate (E), and principal component loadings were computed.
Generally, when cumulative contribution rate reaches more than 85%, the corresponding principal components were selected to represent the original information. All these computation was conducted in SPSS software. The second step is to determine indicator weight based on the PCA. First, the coefficients of each principal component in the linear combination of principal components (Nie 2017) were calculated by: where a is coefficients of linear combination, C is the loading of component, and λ represents the eigenvalues.
Then, variance contribution rates of multiple principal components were taken as weights, and a weighted average for the coefficients of these principal components linear combination was where b is the comprehensive score model indicator coefficient and β is the variance contribution rate. Finally, the indicator weight was obtained by normalizing the comprehensive score model indicator coefficient. In this study, eight factors that affect greenhouse distribution were used as eight variables to evaluate through PCA, and indicator weight value is generated by excel.

Greenhouse suitability assessment
To assess the areas that are suitable for greenhouse planting, the PCA method together with GIS technique was used to make land suitability assessment map for greenhouse. The other land use types (forest, wetland, water, tundra, impervious surface, bareland, grassland, and shrubland) that could not converted to greenhouse were merged as the not-suitable land at the land cover map in 2015 (http://data.ess.tsinghua.edu.cn/). Then weighted overlay analysis was used to overlay these layers by eight obtained weights parameters to generate the greenhouse suitability map. Based on land suitability classification (FAO 1977), four suitability classification including highly suitable, moderately suitable, marginally suitable, and not suitable were applied for the study area. The natural break (jenks) method was used for determining the class interval. The core idea of the natural breaks method is that similar values are used to best group to identify class breaks, maximize the dissimilarity between the external classes, and make the range and number of elements of each class as similar as possible (Esri 2014).

Accuracy assessment
The accuracy assessment results are presented in Table 4. The overall accuracy of each year and the kappa coefficient is equal to or greater than 90%, which is expected to meet the requirements for quantifying greenhouses' dynamics distribution. The other four classes were considered as nongreenhouse, and F1-score (Table 5) was calculated for each year based on two classes (greenhouses and non-greenhouses). The values of F1-score are high in 6 years (2003, 2006, 2010, 2013, 2015, and 2019), which indicates that a good quality of mapping results. The F1-score is low in 2000, depicting a low quality of the mapping results.

Area change of greenhouses
Based on the multi-temporal greenhouse maps, the area change of greenhouse were analyzed. The entire area of the greenhouses tended to rise and increased by 169.64 km 2 between 2000 and 2019.

Spatio-temporal distribution of greenhouses
The multi-temporal greenhouse maps were obtained with the RF classification algorithm during 2000-2019 in Guanzhong Plain. The spatio-temporal distribution of greenhouse maps in a specific area is shown in Figure 6(h). Figure 6(a)-(g) show the expansion process of greenhouse from 2000 to 2019, and we found that the distribution of greenhouses shifts to the eastern and central regions. The change of greenhouse distribution mainly related to governments' policy in spatial.

Spatial structure analysis results
The greenhouse patches in different years were extracted from classification maps to calculate the area and perimeter of greenhouse patch. The linear regression analysis was executed to obtain linear regression equation in SPSS software. The correlation coefficient (R 2 ) in regression analysis, fractal dimension (D), and stability index (SK) are shown in Table 6. The correlation coefficients (R 2 ) are all above 0.9 (Table 6) in each year investigated, which shows that the effect of regression equation is favorable and fractal dimension can be used to study the spatial structure of greenhouses. The fractal dimension of greenhouse is generally constant with small fluctuation, which indicated that the greenhouses were affected by human activities in certain extent. This leads to land fragmentation and makes the border of greenhouse patches more irregular. Meanwhile, the change of greenhouses also contributes to the complexity and instability of land structure; therefore, the stability index (SK) decreased during 2015 and 2019 (Table 6).

Logistic regression analysis results
The ROC value is greater than 0.752 (Figure 7) in the relative operating characteristic test, suggesting that our analysis of driving factors was credible. The sig value of eight driving factors was less than 0.05 in the logistic analysis. The analysis results show that all these factors are associated with greenhouse farm. The distance to the rural road, main river, and elevation had positive effects on greenhouse expansion based on the positive regression coefficients values, while the remaining four driving factors had negative effects ( Table 7). The positive regression coefficients indicated that the greenhouses preferentially extend in the areas near to the rural road and main rivers so that the management and irrigation of greenhouses are possible. The distance factors such as the distance to the rural road and other roads influence transportation of vegetables to market (Hu and Lo 2007). However, distance to rivers affects the vegetable water requirement (Yu, Song, and Lang 2017). More and more greenhouses were constructed in the areas with high elevation due to the government support projects (qhnews 2007;Daily 2015;iqilu 2019;Mingbao 2021). The negative effects of the distance factors showed that greenhouses expand in the areas close to national road, provincial road, county road, settlements, and areas with high slope. In addition, the increased slope corresponds to declined greenhouses, which were determined by the topography factors. This is understandable because the steep slope would increase the cost of construction and transportation and be prone to water and soil erosion.

Buffer zones analysis results
The buffer zone analysis and greenhouse distribution under different slopes and elevation were conducted to study the site selection of greenhouses based on eight selected factors. The analysis of six main distance-based buffer zones that centered the primary roads, main rivers, and settlements shows that the greenhouses are mostly distributed within the 0-1 km on the sides of the county and rural roads, which account for 45% or more ( Figure 8). Sixty-one percent of the greenhouses were distributed in the national roads and 44% of the greenhouses were built in provincial roads within 0-4 km, which helps managers transport agricultural products to outside handily. Forty- 1.826E-50 Abbreviations: Factor 1, Factor 2, Factor 3, Factor 4, Factor 5, Factor 6, Factor 7, and Factor 8 represent the distance to national road, the distance to provincial road, the distance to county road, the distance to rural road, the distance to main river, the distance to settlement, and slope and elevation, respectively. Figure 8. The ratio of greenhouses in different grades, including buffer zones, slope and elevation: national roads (Area_NR), provincial roads (Area_PR), county roads (Area_CR), rural roads (Area_RR), main rivers (Area_MR), settlements (Area_S), slope (Area_Slope), and elevation (Area_Elevation).
nine percent of the greenhouses were distributed in settlements within 0-2 km (Figure 8). It was convenient for farmers to manage their agricultural crops. Sixty-seven percent of the greenhouses were distributed around the main rivers within 0-2 km (Figure 8) to keep sufficient water source. More than 90% of the greenhouses distributed in slope less than 15°and within 500 m. It indicated that the convenience of production and transportation and suitable topography are the most important factors for greenhouse site selection (Ou et al. 2020).

Analysis results of economic drivers
In terms of population and demand for greenhouse-grown vegetable, the total population of study area was 21.51 million in 2000, and it increased to 24.59 million in 2019 based on the statistical information. In addition to the increase in population, the vegetable yield has been increasing over the investigated period at a similar pattern as the greenhouse area ( Figure 9). As a part of modern agriculture, greenhouse production really played an essential role in vegetable yields in whole Guanzhong Plain even a district included (Cui 2016). With the development of facility agriculture on the Tibetan Plateau, the vegetable self-sufficient rate improved 8% between 2008 and 2018 , which also validated our opinion. The greenhouse production can ensure the supply of vegetable production in a certain extent that drives the increased area of greenhouses. Per-capita GDP of Guanzhong Plain has been increasing gradually from CNY 4,171 (US $∼641) in 2000 to 44,100 CNY (US $∼6,78) in 2019 ( Figure 10). The farmer's per-capita income presented an improvement trend during the 2000-2019 period (Figure 10). On the whole, Guanzhong Plain's economy showed a trend of stable increase. Meanwhile, the farmer's income also improved steadily ( Figure 10). It illustrated that greenhouse planting could also drive the farmers' income and increase the farmers' income. For example, the vegetable industry presented a good development trend in many counties, such as Qianxian county, Jingyang county and Hancheng city (Facility vegetable 2019; Vegetable industry 2019; Vegetable 2021). The per-capital income of Baiyang village in the vegetable industry was more than 10,000 CNY in 2021 (Jingyang county 2021). These scenes happened not only in the study area but also in other greenhouses planting regions (Wu and Zhang 2013). It showed that greenhouse production really promote the average annual income of farmers ( Figure 11).

Principal component analysis results
In this study, the PCA and GIS approach were integrated to assess the land suitability for greenhouse. The relative weight of eight parameters including six distance factors (i.e. national roads, provincial roads, county roads, rural roads, main rivers and settlements) and two topographic factors (i.e. slope and elevation) were obtained using the PCA method (Table 8). The Barlett sphericity test and KMO test were performed. The results of the Barlett sphericity test was 0.000, which was less than the significance level of 0.05, and the KMO statistic was 0.802, which was greater than 0.5, which proved to be applicable to PCA. The eigenvalues and contribution rates of each principal component were obtained (Table 8) through the principal component analysis.
The first six principal components contained about 85% of the original variables information (Table 8), which can basically summarize the information of total variables so these six components  were selected for the analysis. Table 9 presents the information of each principal component loading. The larger the corresponding loading cofficient, the more information that contains the original variable. The elevation, slope, the distance to county road, rural road, national road, and provincial road had a large contribution to the first principal component, which indicated the topography and transportation have a huge effect on greenhouse distribution. The distance to main rivers had a contribution to the second principal component. It showed that water facility affects the greenhouse farm. The distance to settlement, provincial road, and national road made contribution to third, fourth, and fifth principal components. All first six principal components can reflect the total information of eight variables.

Suitability analysis results
Based on PCA, the indicator weights used to generate the suitability map of greenhouse are presented in Table 9. It is noted that distance to county roads, provincal roads, and main rivers has relatively higher weights, while distance to national roads, rural roads, and settlements is assigned with lower values (Table 9). In addition, slope and elevation also have a high weight, which indicated they are closely related to the distribution of greenhouse. Four suitability classifications including highly suitable, moderately suitable, marginally suitable, and not suitable were displayed with different colors in suitability map, and the light color shows low suitability and dark color shows higher suitability ( Figure 12).
According to the suitability map, 16.72% of the area was highly suitable, 14.99% was moderately suitable, 6.73% was marginally suitable, and the rest was not suitable in study area ( Figure 12). The unfavorable slope and elevation (Akıncı, Yavuz Özalp, and Turgut 2013) (slope > 25°and elevation >1500 m) and the land with large sections of forest were the key factors restricting agriculture development in parts of the northern and southern regions ( Figure 12). It is true that flat topography plays an important role for agroforestry (Nath et al. 2021) and rapeseed farming's cropping (Ostovari et al. 2019). The moderately suitability areas are mostly observed in the central and eastern parts of the study area (Figure 12), where are dominated by the agricultural land. The greenhouse maybe converted from farmland to pursue a higher agricultural production and economic benefits.
The results obtained from suitability map were compared with the greenhouse areas from the classification result (Figure 6(f)). It is found that 65.06% of the greenhouse distribution is highly suitable, 24.35% is moderately suitable, 3.05% is marginally suitable, and 7.54% is in not suitable. The majority of the greenhouses was in highly and moderately suitable areas, and only limited greenhouses were distributed on lower suitability land. It is noted that the suitability area generated by integrated PCA and GIS approach (Figure 12) is greater than the detected greenhouse areas in 2015 ( Figure 6(f)). This might indicate that the farmers were not aware of utilizing the suitable areas for greenhouse farming due to the lack of guided land use policies for example (Ramamurthy, Obi Reddy, and Kumar 2020) or some farmer bias even the large input demands (Muzira et al. 2021). Therefore, to assess suitability before cropping is also required to increase productivity and reduce potential losses (Muzira et al. 2021). This study provides critical information for the decision makers of greenhouse development in Guanzhong Plain.

Advantage and limitation of GEE
Compared to conventional image processing software, GEE achieved high-speed online visualize computation and analysis and accomplished time series greenhouse mapping, which greatly saved the time and cost. However, there are some limitations during the classification, such as image segmentation that cover a large area could not be performed (Ou et al. 2020), even the large decision trees could also appear error (computed value is too large). This phenomenon also affects the classification efficiently.

Assessment of sample in regression analysis
According to previous studies (Verburg et al. 2002;Wassenaar et al. 2007;Batisani and Yarnal 2009;Lin et al. 2011;Wagner and Waske 2016), 25% of samples were randomly selected to perform the binary logistic regression to decline the effects of spatial autocorrelation in this study. However, by using the sampling method to assess the robustness of regression results is problematic, different set of samples in regression might lead to slight difference in results further to affect the analysis of driving factors (Yu, Song, and Lang 2017). Therefore, robust assessment of regression results that using a range of samples should be considered in the future study. In addition, the better regression results in different simulate scale should be further explored based on the raster size.

The significance of greenhouse extraction
Guanzhong Plain has the unique advantages of agricultural resources and agricultural industry development, and hence, the multi-temporal greenhouse maps facilitate regulation (https:// gao123gao.github.io/) and management of regional greenhouses (Shaanxi 2008a;Zou et al. 2014;Gao et al. 2018;Chen et al. 2019;Zhao, Ren, and Yang 2019). Some useful information can be learned timely such as the area (Zhu et al. 2020b) and spatial change of distribution features (Hasituya, Inoue, and Thenkabail 2017; Ou et al. 2020;Yang et al. 2021). Besides, some specific environmental condition about greenhouse (Hu et al. 2012;Zhang et al. 2015;Lv et al. 2019;Ma et al. 2019;Fan et al. 2020;Feng, Lu, and Liu 2021) can provide useful data for local agricultural authorities to achieve the sustainable development of agriculture (Chang et al. 2011;Guo et al. 2021;Zhou et al. 2021), such as the management of nutrient (Hasituya et al. 2016), N (Song et al. 2009), di-n-butyl phthalate (DnBP) and bis-(2-ethylhexyl) phthalate (DEHP) (Ma, Chu, and Xu 2003;Xu, Li, and Wang 2008;Chen et al. 2011), limitation of total energy consumption in greenhouses , reduction CO 2 emission (Baddadi et al. 2019), and the use of pesticides and fertilizers through multi sensor decision fusion (Aiello et al. 2018).

Future works
As a typical agricultural land, greenhouse plays an increasingly important role in agricultural production and our study would reveal some useful features guiding the practitioners and policy makers. Future work focusing on the method to distinguish greenhouse and mulch should be performed for areas with both greenhouse and mulch. In addition, the higher-resolution image and popular classification method (e.g. object-oriented classification, machine learning) should be used for greenhouse detection to improve the classification accuracy. Furthermore, greenhouse detection in a larger region (e.g. national, continental or global scale) would be of greater interest.

Conclusions and perspectives
In this study, we got multi-temporal greenhouse maps during 2000-2019 using RF classification algorithm by GEE platform and the kappa coefficient, and overall accuracy both approximated 90%. The land suitability assessment for greenhouse was conducted by PCA and GIS methods. These findings showed that using this method to extract greenhouses was quite promising in the medium scale region, and greenhouses were generally converted from highly suitable cropland.
This study also provides critical information for future development of greenhouse in Guanzhong Plain, Shaanxi, China.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data were derived from the following resources available in the public domain: Landsat data from Google Earth Engine: (https://code.earthengine.google.com/). DEM, shapefiles of primary roads and settlements (2015)