Mapping essential urban land use categories (EULUC) using geospatial big data: Progress, challenges, and opportunities

ABSTRACT Urban land use information that reflects socio-economic functions and human activities is essential for urban planning, landscape design, environmental management, health promotion, and biodiversity conservation. Land-use maps outlining the distribution, pattern, and composition of essential urban land use categories (EULUC) have facilitated a wide spectrum of applications and opened new opportunities in urban studies. New and improved Earth observations, algorithms, and advanced products for extracting thematic urban information, in association with emerging social sensing big data and auxiliary crowdsourcing datasets, together offer great potential for mapping fine-resolution EULUC from regional to global scales. Here we review the advances of EULUC mapping research and practices in terms of their data, methods, and applications. Based on this historical retrospect, we summarize the challenges and limitations of current EULUC studies regarding sample collection, the mixed land use problem, data and model generalization, and large-scale mapping efforts. Finally, we propose and discuss future opportunities, including cross-scale mapping, optimal integration of multi-source features, global sample libraries from crowdsourcing approaches, advanced machine learning and ensemble classification strategies, open portals for data visualization and sharing, multi-temporal mapping of EULUC change, and implications for urban environmental studies, to facilitate multi-scale fine-resolution EULUC mapping research.


Introduction
Land use/land cover (LULC) has been considered one of the most important factors reflecting the underlying natural and anthropogenic processes of global environmental change (Foody, 2002; Herold, Mayaux, Woodcock, Baccini, & Schmullius, 2008; Running, 2008; Turner, Lambin, & Reenberg, 2007). It has been widely recognized that LULC patterns and changes affect and are linked with many aspects of ecosystem function and human health (Foley et al., 2005) and play a critical role in modulating climate change (Feddema et al., 2005; Findell et al., 2017; Pielke Sr et al., 2011), maintaining ecosystems (Verburg, Van De Steeg, Veldkamp, & Willemen, 2009), conserving biodiversity (Gámez-Virués et al., 2015), and promoting planetary health (Rockström et al., 2009; Whitmee et al., 2015). Accurate and timely classification, mapping, and monitoring of LULC and its changes are of great importance to a broad spectrum of research themes in Earth observation and global environmental change, since they provide essential knowledge for understanding the Earth's dynamics.
Satellite and airborne remote sensing have greatly facilitated LULC monitoring by providing spatially explicit and temporally continuous information from regional to global scales (Gong et al., 2013; Hansen et al., 2013). In recent decades, with the development of new satellite sensors, improved classifiers, machine learning algorithms, and cloud computing, many remote sensing based LULC classification products have been produced and made publicly available at multiple spatial resolutions. Representative datasets include (i) coarse-resolution ones from 100 m to 1 km, e.g. the 1-km International Geosphere-Biosphere Programme data and information system cover (IGBP-DISCover) map (Loveland et al., 2000), the 1-km University of Maryland (UMD) land cover map (Hansen, DeFries, Townshend, & Sohlberg, 2000), the 1-km global land cover classification product (GLC2000) (Bartholome & Belward, 2005), the 1-km and 500-m Moderate Resolution Imaging Spectroradiometer (MODIS) land cover maps (Tateishi et al., 2011), the 300-m global land cover map (GlobCover) derived from the Medium Resolution Imaging Spectrometer (MERIS) dataset (Arino et al., 2012), the 300-m European Space Agency (ESA) Climate Change Initiative (CCI) land cover maps from 1992 to 2015 (UCL-Geomatics, 2017), the 100-m ESA Copernicus Global Land Service Land Cover Map (CGLS-LC100) (Buchhorn et al., 2020), and the 100-m global land cover fraction map (Masiliūnas et al., 2021); (ii) fine-resolution ones from 10 m to 30 m, e.g. the 30-m finer resolution observation and monitoring of global land cover (FROM-GLC30) (Gong et al., 2013), the 30-m global land cover data product (GlobeLand30), the 30-m global land-cover product with fine classification system (GLC_FCS30) (Zhang et al., 2020a), the most recent 30-m intelligent mapping of global land cover (iMap World 1.0) (Liu et al., 2021), the 20-m ESA CCI Sentinel-2 prototype land cover map of Africa in 2016 (Lesiv et al., 2017), and the 10-m finer resolution observation and monitoring of global land cover (FROM-GLC10) (Gong et al., 2019); and (iii) high-resolution ones finer than 10 m, e.g. the recent 3-m national land cover map in China based on Planet imagery (Dong et al., 2021).
Urban land use represents the highest level of human modification on Earth (Gong et al., 2020a; Theobald et al., 2020). Despite covering only a small proportion (<1%) of the land surface (Schneider, Friedl, & Potere, 2010), urban areas accommodate more than 50% of the global population (United Nations, 2018) and exert a disproportionate influence on their surroundings in terms of mass, energy, and resource fluxes (Cao, Chen, Imura, & Higashi, 2009). In addition to the global LULC maps that outline the general distribution and pattern of urban extents, a group of global thematic products on impervious areas has been developed (Figure 1a), including Global Human Settlement Layer (Gong & Howarth, 1990; Pesaresi et al., 2013), Human Built-up and Settlement Extent (Wang, Huang, Brown De Colstoun, Tilton, & Tan, 2017), Global Man-made Impervious Surface (Brown De Colstoun et al., 2017), Global Artificial Impervious Area, and other global impervious maps using the Normalized Urban Areas Composite Index (NUACI) (Liu et al., 2018) and multi-source, multi-temporal random forest classification (MSMT_RF). These remote sensing based impervious area products are binary classifications that group all impervious surface areas into one class. However, maps that capture the distribution, pattern, and composition of different land-use types in urban areas are still limited by the data available from government agencies and by their spatial coverage at city and regional levels (Gong et al., 2020a). Moreover, mapping standards for data sources, methods, land use classification units and schemes, and updating frequency vary across cities and regions, due to extreme difficulties in (i) assigning functional labels to complex urban built-up areas; (ii) coordinating financial input and the skills of mapping personnel (Gong et al., 2020a); and (iii) securing spatially and temporally explicit datasets with high to very high resolutions.
Detailed urban land use information that reflects socio-economic functions and human activities is critically important for urban planning, landscape design, environmental management, disaster control, human health promotion, and biodiversity conservation (Chen et al., 2021b). To better capture high-level semantic information of urban components and uncover the spatially explicit distribution of various land use types, many previous studies have used multi-source and multi-scale remote sensing datasets and different classification algorithms for urban land use classification (Figure 1a). Here we categorize them into three classes by the minimum size of their spatial representation: pixel, object, and parcel (Chen et al., 2021b). The pixel-based approaches mainly use spectral, texture, and auxiliary features to differentiate land-use types in direct classifications. For example, Gong, Marceau, and Howarth (1992) integrated spatial and spectral features from Système Pour l'Observation de la Terre (SPOT) HRV multispectral data to classify two-level land uses at the rural-urban fringe of Metropolitan Toronto using a Gaussian maximum likelihood classifier. Since urban areas commonly comprise a complex spatial assemblage of spectrally distinct land-cover types, per-pixel classification algorithms are not well suited to quantifying urban land uses without very high spatial resolution satellite sensors. Barnsley and Barr (1996) proposed classifying the image into broad land-cover types at the first stage and then grouping the classified pixels into different land-use categories based on their frequency and spatial arrangement within a square kernel at the second stage. However, these pioneering studies are largely limited to local-scale experimental tests due to restrictions in data availability.
Leveraging the medium spatial resolution Landsat dataset, Lu and Weng (2006) applied spectral mixture analysis to classify residential areas with different intensities and an integrated class of commercial, industrial, and transportation land uses. However, the classification scheme is too general, as it does not differentiate specific commercial, industrial, or transportation classes. More broadly, the complexity of the urban system makes it difficult for single-type remotely sensed data to identify different land use types and functional zones. A growing trend in recent studies is the integration of multi-source and multi-platform datasets to address this challenge. For example, Man, Dong, and Guo (2015a) derived a pixel-based urban land-use classification with 15 classes by fusing structural and height information from Lidar with spectral and texture information from hyperspectral imagery. Theobald (2014) combined an expansive set of features from housing and employment census, infrastructure, and the National Land Cover Dataset (NLCD) to produce a 30-m map of land use for the conterminous United States, including 79 land use types within five major categories: built-up, production, recreation, conservation, and water.
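The two-stage kernel idea attributed above to Barnsley and Barr (1996) can be illustrated with a minimal sketch. It covers only the label-frequency part of the second stage (not the spatial arrangement), and the toy land-cover grid, the 3 x 3 kernel, the thresholds, and the land-use names are all assumptions made for illustration rather than the published procedure.

```python
from collections import Counter

# Hypothetical first-stage land-cover map (B = building, V = vegetation, R = road)
land_cover = [
    list("BBBVV"),
    list("BBRVV"),
    list("BRRVV"),
    list("RRVVV"),
    list("BBVVV"),
]


def kernel_frequencies(grid, row, col, half_width=1):
    """Count land-cover labels inside a square kernel centred on (row, col)."""
    counts = Counter()
    for r in range(max(0, row - half_width), min(len(grid), row + half_width + 1)):
        for c in range(max(0, col - half_width), min(len(grid[0]), col + half_width + 1)):
            counts[grid[r][c]] += 1
    return counts


def label_land_use(counts):
    """Toy decision rule (an assumption): map label shares to a land-use class."""
    total = sum(counts.values())
    built_share = (counts["B"] + counts["R"]) / total
    if built_share >= 0.6:
        return "dense built-up"
    if built_share >= 0.3:
        return "low-density residential"
    return "green space"


# Second stage: relabel every pixel from the label frequencies in its kernel
land_use = [
    [label_land_use(kernel_frequencies(land_cover, r, c)) for c in range(5)]
    for r in range(5)
]
for row in land_use:
    print(row)
```

A larger kernel smooths the result but blurs boundaries between neighbouring land uses, which is one reason such kernel-based rules are sensitive to the chosen window size.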
New and improved satellite sensors that provide high-resolution (HR) and very-high-resolution (VHR) remote sensing imagery (e.g. IKONOS, WorldView, QuickBird, etc.) have opened up new opportunities for detailed mapping and analysis of urban land use. It is now possible to identify the geometry, texture, size, location, and adjacency information of ground objects at a much finer scale (Zhong et al., 2020). Object-based approaches are becoming more popular and widely used to classify urban land-use types in two general stages: object segmentation from VHR remote sensing imagery and object-based land use classification. Image segmentation techniques aim at grouping pixels into homogeneous objects with consistent composition and continuous pattern, for instance, in spectra, textures, and shapes. They generate a complete set of individual segmented "land-cover objects", each of which can be a tree, building, water body, road, etc. The subsequent stage uses a classifier to identify their specific land-use types. Generally, two types of information are included in the classification models, i.e. intra-object features (e.g. spectral, texture, and gradient features) and inter-object features (e.g. connectivity, contiguity, and adjacent alignment).
In addition to statistical machine learning algorithms such as support vector machine (SVM), random forest (RF), and neural network (NN) that digest low-level features for urban land use classification through the regular training and prediction protocol, recent advances in deep learning based convolutional neural networks (CNNs) make it more promising and cost-effective to transform raw images and low-level features into classes at a higher, more abstract level (LeCun, Bengio, & Hinton, 2015; Schmidhuber, 2015). These advances have greatly facilitated a wide range of applications, including image segmentation, object recognition, and land-use classification. For example, an object-based CNN (OCNN) method has been proposed for urban land use classification using VHR images. Unlike the pixel-based convolutional process, the OCNN uses the segmented objects as its functional units. The CNN structures are designed to assign semantic labels by partitioning within-object and between-object variations. Considering the challenges of capturing high-level semantics and spatial information through traditional CNN models in feature extraction, Bao et al. (2020) proposed the Deeper-Feature Convolutional Neural Network (DFCNN) to extract multidimensional and deeper features for classifying semantic labels. Although deep convolutional neural network (DCNN) techniques have achieved very impressive image classification performance, the potential of using multi-spectral remote sensing images rather than only the three true-color channels has been little explored. Moreover, the traditional practice in DCNN methods is to crop large images uniformly into small processing units, which leads to obvious block effects in the spatial pattern of the derived land-use classification.
Huang, Zhao, and Song (2018) further proposed a semi-transfer deep convolutional neural network (STDCNN) approach comprising three parts: a transfer DCNN model, a small DCNN model with multispectral image layers, and a fully connected layer that fuses the first two parts. This makes it feasible to derive accurate object-based urban land use classification while maintaining the integrity of land-use patterns.
The hierarchy of urban scenes represents the urban landscape as "land-use parcels", "land-use objects", and "land-cover pixels" (Figure 1b). Intuitively, land-use parcels are composed of land-use objects, and land-use objects are composed of land-cover pixels. In object-based approaches, however, the segmented unit is strongly influenced by the spatial scale effect (Myint, Gober, Brazel, Grossman-Clarke, & Weng, 2011), in particular when driven by global models. This issue may hinder the consistency of intrinsic unit size across different landscape heterogeneities (Chen et al., 2021b). Moreover, the segmented unit is more spatially oriented and cannot be flexibly applied in real-world practices of urban land use planning and management, given the potential "application gap" (Zhong et al., 2020). To address this shortcoming, recent studies proposed using a more synthetic scale of parcels to classify urban land-use types. For example, a street block representing a relatively homogeneous urban function is more compatible with the basic unit for urban planning and land management (Hu, Yang, Li, & Gong, 2016; Liu & Long, 2016). Parcel-based approaches have been increasingly developed and applied in multi-scale urban land use classification (Blaschke, Hay, Weng, & Resch, 2011). In addition to remote sensing observations, social sensing big data recorded or collected through crowdsourcing have provided new opportunities to quantify human activities and characterize urban structures (Hu et al., 2016). For example, points of interest (POIs) extracted from social media check-in records can represent places or locations with certain functions (Chen, Song, Huang, & Xu, 2020). People usually go to different POIs for different purposes, e.g. dining, shopping, working, and entertainment. Based on POI statistics allocated within street blocks, Liu and Long (2016) estimated the parcel-based land uses for 297 cities in China.
However, the quality and quantity of POIs are inevitably biased across different locations. Several follow-up studies further explored more features from medium-resolution remote sensing imagery, POIs, and other auxiliary geospatial information in parcel-based land use classification at city levels and reported much better classification performance (Su et al., 2020; Yao et al., 2017b). Zhong et al. (2020) proposed a data-driven point, line, and polygon semantic object mapping framework to integrate POIs, OpenStreetMap (OSM) data, and VHR Google Earth imagery for parcel-based urban land use classification in four cities in China. However, the high-resolution Google Earth imagery is typically a spatial mosaic of temporally irregular VHR satellite observations, which prevents spatially and temporally consistent classification of land use types (Chen et al., 2021b). Moreover, this work remains at the stage of experimental tests in specific regions and has not been extended to larger spatial scales. Notably, Gong et al. (2020a) proposed a new concept of essential urban land use categories (EULUC) to define the urban land use classification scheme and reported a new map of EULUC for the whole of China (EULUC-China) in 2018 that uses 10-m Sentinel-2 imagery, nighttime lights, POIs, and Tencent location-based service (LBS) data. It marks the beginning of a new approach to collaborative urban land use mapping that can be replicated and transferred across multiple spatial scales. Due to the incomplete coverage of OSM road networks in China, large segmented land parcels typically contain a mix of different land uses. This situation is particularly serious for less urbanized cities. To address this challenge, Su et al. (2020) used the detailed road networks from a special road survey to generate land parcels, which represent the most accurate land parcel divisions.
By further leveraging the complete set of survey-based land use samples in Shenzhen, they investigated the impact of training and validation samples of different sizes and purities on accuracy. Another promising approach is to integrate the object-based and parcel-based classifications. For example, Tu, Chen, Zhang, and Xu (2020) proposed a segmentation-based framework to complement the road network derived land parcels and further improve the mapping results of EULUC at the city scale. On the other hand, how best to feed multi-source datasets and multi-dimensional features into urban land use classification remains an open topic. For example, Zong et al. (2020) integrated road network datasets from Gaode and OpenStreetMap (OSM) to divide urban parcels and integrated Sentinel-2A images, Sentinel-1A polarization data, nighttime light data, POI data, and other data to derive final land use classifications. Huang, Yang, Li, and Wen (2021) fused multi-view optical imagery and high-resolution nighttime light imagery to derive urban land use functions for the two megacities of Wuhan and Beijing in China.
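The parcel-based use of POI statistics discussed above can be illustrated with a minimal sketch. It is not the procedure of any cited study: parcels are simplified to axis-aligned boxes, and the coordinates, category names, and the `min_share` threshold are hypothetical. It also shows how large or POI-poor parcels end up labelled as mixed or unknown, the two issues raised in the text.

```python
from collections import Counter

# Hypothetical parcels as axis-aligned boxes (xmin, ymin, xmax, ymax); real street
# blocks would be polygons bounded by road networks.
parcels = {"P1": (0, 0, 10, 10), "P2": (10, 0, 20, 10), "P3": (20, 0, 30, 10)}

# Hypothetical POIs: (x, y, category)
pois = [
    (2, 3, "residential"), (4, 8, "residential"), (7, 1, "commercial"),
    (12, 5, "commercial"), (15, 5, "commercial"), (18, 2, "industrial"),
    (19, 9, "residential"),
]


def parcel_of(x, y):
    """Return the id of the first parcel whose box contains the point, else None."""
    for pid, (xmin, ymin, xmax, ymax) in parcels.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return pid
    return None


def classify_parcels(min_share=0.5):
    """Label each parcel by its dominant POI category, or flag it as mixed/unknown."""
    counts = {pid: Counter() for pid in parcels}
    for x, y, category in pois:
        pid = parcel_of(x, y)
        if pid is not None:
            counts[pid][category] += 1
    labels = {}
    for pid, counter in counts.items():
        total = sum(counter.values())
        if total == 0:
            labels[pid] = "unknown"  # no POI evidence in this parcel
            continue
        category, n = counter.most_common(1)[0]
        labels[pid] = category if n / total > min_share else "mixed"
    return labels


labels = classify_parcels()
print(labels)
```

In this toy input, P1 is dominated by residential POIs, P2 has no category above the threshold, and P3 contains no POIs at all, so a POI-only rule cannot label it.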
The structure of this review is organized as follows. We first summarize the evolution of EULUC mapping in the Introduction. We review the advances of EULUC mapping research and practices regarding their data, methods, and applications in Section 2. Based on this historical retrospect, we summarize the challenges and limitations of current EULUC studies in Section 3. Finally, we discuss future opportunities to better facilitate multi-scale EULUC mapping in Section 4.

An overview of data, methods, and applications
New and improved data and methods have been continuously introduced into urban land use classification, a field that has witnessed an accelerated pace of innovation and practice owing to the rapid development of remote sensing and artificial intelligence over recent decades. Here we provide an overview from three primary perspectives, namely data, methods, and applications, that have progressively contributed to the knowledge of urban land use classification.

Multispectral remote sensing
Multispectral remote sensing provides image information in the spectral domain as well as in the spatial domain. Ideally, different land cover objects and land use parcels show distinct spectral responses in multispectral remote sensing observations. This is the fundamental assumption that allows individual spectral bands and combinations of spectral bands to be used to separate different land-use types. Many remote sensing indicators have been devised to enhance thematic LULC features by diagnosing and analyzing spectral profiles. Representative indicators include (i) vegetation: Normalized Difference Vegetation Index (NDVI) (Tucker, 1979), Enhanced Vegetation Index (EVI) (Huete et al., 2002), etc.; (ii) soil: Normalized Difference Soil Index (NDSI) (Rogers & Kearney, 2004), Ratio Normalized Difference Soil Index (RNDSI) (Deng, Wu, Li, & Chen, 2015), etc.; (iii) water: Normalized Difference Water Index (NDWI) (Gao, 1996; Xu, 2006); and (iv) built-up area: Normalized Difference Built-up Index (NDBI) (Zha, Gao, & Ni, 2003), Index-based Built-up Index (IBI) (Xu, 2008), Normalized Urban Areas Composite Index (NUACI) (Liu et al., 2018), etc. However, the level of spatial detail varies considerably across different multispectral remote sensing images. Coarse spatial resolution images with large pixel sizes (e.g. MODIS with 250-1000 m and VIIRS with 500 m) are bound to have multiple land cover categories within a single pixel, especially in urbanized areas. The mixed land use problem can be reduced dramatically at the scale of medium spatial resolution remote sensing images such as Landsat with 30 m and Sentinel-2 with 10-60 m. However, the capabilities for extracting geometry, texture, and adjacency information within and between objects are still limited.
In contrast, HR and VHR images such as SPOT, IKONOS, QuickBird, and WorldView provide more detailed spatial information of the land surface, making it feasible to quantify distinct spatial/spectral/geometric characteristics for different land use types. However, their high cost limits data accessibility for large-scale applications (Chen, Huang, & Xu, 2017b).
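Several of the indices listed in this section share the same normalized-difference form, which the following minimal sketch illustrates for NDVI (Tucker, 1979), the modified water index of Xu (2006), and NDBI (Zha, Gao, & Ni, 2003). The reflectance values are hypothetical, and the mapping of "green", "red", "nir", and "swir" to actual sensor bands is an assumption that depends on the instrument.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = band_a + band_b
    return (band_a - band_b) / total if total != 0 else 0.0


def spectral_indices(green, red, nir, swir):
    """Compute three normalized-difference indices from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),      # vegetation (Tucker, 1979)
        "MNDWI": normalized_difference(green, swir),  # water (Xu, 2006)
        "NDBI": normalized_difference(swir, nir),     # built-up (Zha, Gao, & Ni, 2003)
    }


# Hypothetical reflectance values, for illustration only
pixels = {
    "vegetation": dict(green=0.08, red=0.05, nir=0.45, swir=0.20),
    "water": dict(green=0.06, red=0.04, nir=0.02, swir=0.01),
    "built_up": dict(green=0.15, red=0.18, nir=0.22, swir=0.30),
}
indices = {name: spectral_indices(**bands) for name, bands in pixels.items()}
for name, values in indices.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
```

With these illustrative inputs, each surface type scores highest on the index designed for it, which is the separability assumption described above; in mixed coarse-resolution pixels the contrast is typically much weaker.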

Hyperspectral remote sensing
Hyperspectral imaging records continuous narrow wavelength bands (around 10 nm per band). It is able to capture many more variations in spectral signatures than multispectral imaging (about 100 nm per band) (Xu & Gong, 2008). It has been demonstrated that this spectral sensitivity can improve the accuracy of discriminating similar land-use types (Zhang & Ma, 2009). For example, EO-1 Hyperion hyperspectral images were used for LULC mapping (Xu & Gong, 2007). The Chinese HJ Hyperspectral Imager (HSI) with 110 bands at 100 m was fused with 30-m multispectral images to map wetland species in detail (Chen, Chen, Lu, & Xu, 2017a). However, the spatial coverage of such data is limited for large-scale applications.

Multi-angular remote sensing
Multi-angular imaging is designed to provide remote sensing observations from different views. Previous studies have suggested that bidirectional reflectance distribution function (BRDF) information can be used to complement spectral signatures to improve land cover classification accuracy (De Colstoun & Walthall, 2006). Following this direction, Huang, Zhang, and Yu (2012) proposed fusing multi-angle observations from Multi-angle Imaging SpectroRadiometer (MISR) data with Landsat observations to improve urban land cover mapping. Chen et al. (2017b) blended Landsat data with multi-angular MODIS observations retrieved from the RossThick-LiSparse-Reciprocal (RTLSR) kernel-driven BRDF model (Lucht, Schaaf, & Strahler, 2000) to consolidate urban land cover classification in Beijing. However, these multi-view observations are still limited by their spatial resolution of 30 m or coarser, which makes it challenging to quantify urban structures at fine scales, especially for dense and high-rise cities. In contrast, China's Ziyuan3-01 (ZY3-01) satellite can simultaneously acquire multi-view images in nadir, forward, and backward modes at a much finer spatial resolution of 2.1 m. Recently, Huang et al. (2021) integrated the multispectral and multi-view ZY3-01 images to capture urban vertical information, which can be helpful and cost-effective in urban land use classification.

Nighttime light remote sensing
As a measure of the lighting brightness observed at night from space, remote sensing based nighttime light (NTL) observations have emerged as a unique measure of human activities and socio-economic attributes at different spatial and temporal scales (Elvidge, Baugh, Kihn, Kroehl, & Davis, 1997; Li & Zhou, 2017). For example, the long-term NTL products (1992-2013) from the Defense Meteorological Satellite Program's Operational Line-scan System (DMSP/OLS) have triggered a wide range of applications such as urbanization (Ma, Zhou, Pei, Haynie, & Fan, 2012; Song, Chen, & Kwan, 2020), socioeconomic development (Xie & Weng, 2016; Yu et al., 2015), LULC classification (Cao et al., 2009; Zhang & Seto, 2011), and evaluation of light pollution (Chalkias, Petrakis, Psiloglou, & Lianou, 2006) and conflict crises. The Visible Infrared Imaging Radiometer Suite (VIIRS) instrument onboard the Suomi National Polar-orbiting Partnership (NPP) satellite, launched in 2011, is equipped with a Day/Night Band (DNB) to provide high-quality night observations (Elvidge, Baugh, Zhizhin, Hsu, & Ghosh, 2017). Launched by China on June 2, 2018, Luojia-1 has delivered dramatic improvements over these two predecessors in terms of increased spatial resolution (~130 m) and high radiometric quantization (14 bits), with a swath of 250 km (Li, Zhao, Li, & Xu, 2018). Some recent studies have taken advantage of the finer-resolution nighttime information from Luojia-1 to map urban land use in China (Gong et al., 2020a; Tu et al., 2020). However, these existing nighttime light observations are still at relatively coarse to medium spatial resolutions from 130 m to 1000 m, which prevents the characterization of between- and within-object features in urban areas. Given the advantage of nighttime light observations in characterizing human activities and socioeconomics, some recent satellite initiatives and missions have explored fine-scale nighttime light sensors.
For example, the Jilin1-07 satellite can provide high-resolution nighttime light images at 0.92 m (Huang et al., 2021). All these efforts will potentially contribute to better measures and approaches for depicting fine-scale urban structures and functions.

Microwave remote sensing
Although less often applied than optical remote sensing, microwave-based imaging has become popular for characterizing urban structures and features, particularly for the emerging concept of multi-dimensional urban studies (Gamba, Aldrighi, & Stasolla, 2010; Li, Zhou, Gong, Seto, & Clinton, 2020b; Pelizari et al., 2018). Over the past few decades, we have witnessed a growing number of HR and VHR synthetic aperture radar (SAR) sensors, including Envisat ASAR, ALOS PALSAR, RADARSAT-1, TerraSAR-X, COSMO-SkyMed, and Sentinel-1. Additionally, the advantages of microwave remote sensing in penetrating cloud cover, reducing the impact of weather conditions, and capturing texture and height information have further accelerated its broader application in urban studies. For example, building height is one of the important explanatory variables for differentiating land-use types and has been included as a feature in EULUC classification (Chen et al., 2021b; Tu et al., 2020).

Lidar remote sensing
As an emerging type of active remote sensing, light detection and ranging (Lidar) determines the distance to an object by targeting it with a laser and measuring the time for the reflected light to return to the receiver (Liu, 2008). Unlike optical data, Lidar data has the natural advantage of capturing high-accuracy structural information that can be used to differentiate land-use types with different structures, components, and compositions. For example, Antonarakis, Richards, and Brasington (2008) used airborne Lidar data to map land cover with object-based classification methods. Zhou (2013) investigated the integrated use of Lidar height and intensity data for urban land-cover classification and found that Lidar-derived parameters such as height, height variation, intensity, height texture, and skewness or kurtosis can be beneficial in land-use classification. However, the application of Lidar data to land-use classification is limited by spatial coverage, data availability, and observational costs, which makes it very difficult to extend to large-scale applications with fine-resolution Lidar footprints. Additionally, update frequency is another concern because of the limited availability of airborne Lidar data collected by commercial companies and government agencies.
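The ranging principle described at the start of this section reduces to a one-line relation: the pulse travels to the target and back, so the range is half the round-trip time multiplied by the speed of light. The sketch below is a simplified illustration; the function names are hypothetical, the vacuum speed of light is used in place of the slightly lower speed in air, and the height calculation assumes a nadir-looking sensor.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in vacuum


def range_from_round_trip(seconds):
    """Distance to the target: the pulse travels out and back, hence the factor 1/2."""
    return SPEED_OF_LIGHT * seconds / 2.0


def object_height(ground_return_s, top_return_s):
    """Height of an object from the round-trip times of its top and of the ground,
    assuming a nadir-looking sensor (an illustrative simplification)."""
    return range_from_round_trip(ground_return_s) - range_from_round_trip(top_return_s)


# A 2-microsecond round trip corresponds to a range of roughly 300 m
print(round(range_from_round_trip(2e-6), 3))
```

Height differences of this kind are the basis of the Lidar-derived height features mentioned above; a 1 ns timing error corresponds to about 15 cm in range, which gives a sense of the timing precision involved.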

Emerging new types of remote sensing datasets
The rapidly evolving CubeSat community has opened a new era for Earth observation (Poghosyan & Golkar, 2017; Puig-Suari, Turner, & Ahlgren, 2001). For example, as one of the most representative companies, Planet has launched a series of satellites, including RapidEye, PlanetScope, and SkySat (Cooley, Smith, Stepan, & Mascaro, 2017), with the aim of achieving simultaneously fine spatial, temporal, and spectral resolutions through this ambitious satellite network. Street-view imagery from Google, Baidu, and Tencent provides street-level or eye-level observations along road networks and has been widely used as the ground reference of urban land uses or as an on-site lens to quantify the urban environment (Cao et al., 2018; Yu, Zhao, Chang, Yuan, & Heng, 2019; Zhang et al., 2017a). Unmanned aerial vehicle (UAV) sensors and platforms are now extensively used in urban studies to deliver high-resolution remote sensing observations (Akar, 2018; Al-Najjar et al., 2019; Li, Levin, Xie, & Li, 2020a; Zhang, Chen, Wang, Wang, & Dai, 2017b).

Social sensing big data
Although remote sensing observations from multiple types, platforms, and scales have greatly supported the mapping, monitoring, and modeling of dynamic urbanization and land-use changes, existing remote sensing based approaches have one major shortcoming: they mainly capture the physical (e.g. spectrum, texture, geometry, and land surface temperature) and indirect anthropogenic (e.g. nighttime light) differences between land-use classes. Emerging social sensing big data have provided novel insights, datasets, and approaches to quantify spatiotemporal patterns of human activities and uncover thematic information in urban contexts (Figure 2).

Social media data
Social media is a set of Internet-based applications grounded in the idea of Web 2.0 that enable content and applications to be continuously modified by users in participative and collaborative ways (Gilbert & Karahalios, 2009; Kaplan & Haenlein, 2010). The development of mobile Internet technologies further connects people through social networks in very diverse ways. As a result, a large amount of geotagged information is now available from Twitter, Facebook, Instagram, LinkedIn, Flickr, WeChat, Weibo, etc. For example, Twitter is one of the most popular social media platforms globally, allowing users to post messages and record real-time positions. The spatial and temporal features of geotagged Tweets are therefore considered a good measure of dynamic human movements. Several previous studies have analyzed the changes in the number of Tweets sent from different geographic locations to characterize urban land uses (Abbasi, Rashidi, Maghrebi, & Waller, 2015; Frias-Martinez & Frias-Martinez, 2014; Wakamiya, Lee, & Sumiya, 2011).

Mobile device data
Mobile devices such as mobile phones carried by people can record their locations relative to access points (e.g. cellular towers or WiFi access points) during daily movements. Based on the locations of those access points and their accessible radius, we can infer the spatiotemporal trajectories of human mobility. For example, Pei et al. (2014) constructed a synthesized vector of mobile phone activity to identify land-use types using mobile phone data and a semi-supervised clustering method. Louail et al. (2014) used recorded phone data to define an urban dilatation index for measuring spatial structure in 31 Spanish cities. Instead of logging locations relative to access points, individual human mobility can also be recorded by GPS-enabled mobile devices. This is usually the best dataset for quantifying human mobility, with the highest spatial and temporal accuracies. For example, the pick-up/drop-off dynamics from taxi GPS traces were used to recognize the function of urban land uses (Liu, Wang, Xiao, & Gao, 2012; Pan, Qi, Wu, Zhang, & Li, 2012). However, given privacy concerns and costs, the collection of GPS-based trajectory datasets is usually limited to local regions and short-term sampling periods.
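The idea of recognizing land-use functions from temporal activity patterns, as in the phone-activity and taxi-trace studies above, can be sketched as follows. This is a simplified illustration and not the method of any cited study: the event hours are hypothetical, the reference profiles are invented, and cosine similarity is one of several possible ways to compare activity signatures.

```python
import math


def hourly_profile(event_hours):
    """Normalised 24-bin histogram of event times (hours 0-23) at one location."""
    bins = [0] * 24
    for hour in event_hours:
        bins[hour % 24] += 1
    total = sum(bins)
    return [count / total for count in bins] if total else bins


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical event hours (e.g. taxi drop-offs or phone activity) for three places
office_like = hourly_profile([8, 8, 9, 9, 9, 10, 13, 14])
nightlife_like = hourly_profile([21, 22, 22, 23, 23, 0, 1])
unknown_place = hourly_profile([8, 9, 9, 10, 14, 22])

similarity = {
    "office_like": cosine_similarity(unknown_place, office_like),
    "nightlife_like": cosine_similarity(unknown_place, nightlife_like),
}
best_match = max(similarity, key=similarity.get)
print(best_match, {key: round(value, 3) for key, value in similarity.items()})
```

Normalizing each profile removes the effect of overall activity volume, so that a busy and a quiet location with the same daily rhythm are treated as similar.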

Volunteered geographic information
Volunteered geographic information (VGI) is an emerging phenomenon and trend of crowdsourcing in which members of the general public create, collect, and contribute georeferenced facts about the Earth's surface and near-surface to synthesized databases (Goodchild, 2007;Goodchild & Li, 2012). For example, initiated in 2004 as a volunteer effort, OpenStreetMap (OSM) has become the best-known and most successful VGI database with substantial global spatial information that maps various points, lines, and polygon features (www.openstreetmap.org). The rich geographic information from OSM has been widely employed to classify land-use types and uncover urban land-use patterns (Gong et al., 2020a;Hagenauer & Helbich, 2012;Hu et al., 2016;Tu et al., 2020;Zhong et al., 2020;Zong et al., 2020).

Auxiliary data sources
In addition to the remote sensing and social sensing data sources, many auxiliary datasets have also been used in EULUC mapping. For example, census data that includes demography, education, employment, income, and housing can reveal the spatial difference of socioeconomic status across different land use types (Theobald, 2014). Topographic features such as elevation, slope, and aspect can be used in urban land use classification. Given the advantage of more complete population coverage and longer temporal spans, municipal data such as water consumption data has been used to identify the socioeconomic functions of urban lands (Pan et al., 2020) and analyze mixed patterns of land use (Guan, Cheng, Pan, Yao, & Zeng, 2021), because land use types and composition can be estimated by classifying the spatiotemporal patterns of water consumption from individual end-users (Guan et al., 2021).

Methods
Although many advances have been made in data and method development for urban land use classification, fine-resolution and accurate EULUC mapping remains a challenge, given that many factors such as landscape heterogeneity, data processing, classification algorithms, mixed land use, and post-classification processing may affect the success of a complete classification (Gong et al., 2013). Therefore, the design of the classification strategy is critically important to the classification outcome. Generally, classification strategies can be divided into supervised, unsupervised, and hybrid classifications. A classification may seek to group land use types by their relative spectral similarity (unsupervised) or allocate land-use types based on their similarity to a set of predefined classes that have been characterized spectrally (supervised) (Foody, 2002). Hybrid classification aims to combine the advantages of both, and has been shown to produce more accurate results than either supervised or unsupervised classification alone (Lo & Choi, 2004;Malinverni et al., 2011). Moreover, the integration of multi-source data features and multi-function models has been another mainstream direction for improving classification performance (Gong et al., 2019, 2020a). The basic unit of land use classification is a fundamental attribute of land use classification maps and products, since it determines the finest spatial scale that end users can examine (Chen et al., 2021b). Based on the minimum size of spatial representation, classification units can generally be categorized into three levels: pixel, object, and parcel. Therefore, the choice of classification unit should be determined after designating the classification strategy.
Research progress on the division of different unit-based EULUC classification has been described in the Introduction. Here we mainly summarize the progress of classification algorithms that have evolved over the past few decades in machine learning, natural language processing (NLP), and deep learning based approaches.
With the development of data and methods in the field of land use science, we have witnessed many classification efforts and practices at different scales using machine learning algorithms, including Maximum Likelihood (ML) (Man, Dong, & Guo, 2015b;Özkan & Sunar Erbek, 2005;Paola & Schowengerdt, 1995), K-means (Chen et al., 2017d;Wang et al., 2016), Neural Network (NN) (Pacifici, Chini, & Emery, 2009;Paola & Schowengerdt, 1995), Support Vector Machine (SVM) (Man et al., 2015b), Decision Tree (DT) (Hu & Wang, 2013), and Random Forest (RF) (Gong et al., 2020a;Grippa et al., 2018;Ruiz Hernandez & Shi, 2018;Sun et al., 2020;Zhang et al., 2017a, 2017c). Among these algorithms, RF, which consists of a large ensemble of decision trees, has proven efficient and robust in terms of computational cost and model performance when processing high-dimensional data, and has been used extensively. Similar to the concept of understanding the contents of natural language data, NLP-based approaches have been increasingly used to fuse the geospatial latent semantic information extracted from remote sensing observations and multisource social sensing information as patterns for urban land use classification. For example, the Google Word2Vec model (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) was adopted to measure the contextual relationships between POIs and urban land-use types. Some Word2Vec-based extension models, such as Traj2Vec (Zhang et al., 2021) and Place2Vec (Zhai et al., 2019), were subsequently developed to better retrieve the potential semantic information of locations and trajectories to characterize urban land-use types.
A variety of topic models and related text-analysis techniques from the NLP field, such as latent Dirichlet allocation (LDA), latent semantic analysis (LSA), probabilistic latent semantic analysis (pLSA), and term frequency-inverse document frequency (TF-IDF), have been used for different semantic analyses in urban land use classification (Yan, Schultz, & Zipf, 2019;Yao et al., 2017a;Yuan, Zheng, & Xie, 2012;Zhang et al., 2021).
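As a small illustration of how a text-analysis technique transfers to land use data, the Python sketch below treats each land parcel as a "document" and each POI category inside it as a "word", then computes standard TF-IDF weights. This is only a sketch under that analogy; the function name, the parcel identifiers, and the POI categories are hypothetical, and the cited studies may use different weighting variants.

```python
import math
from collections import Counter


def tf_idf(parcel_pois):
    """Compute TF-IDF weights of POI categories for each parcel.

    `parcel_pois` maps a parcel id to the list of POI category strings
    found inside that parcel (one entry per POI).
    """
    n_parcels = len(parcel_pois)

    # Document frequency: in how many parcels does each category occur?
    doc_freq = Counter()
    for categories in parcel_pois.values():
        doc_freq.update(set(categories))

    weights = {}
    for parcel_id, categories in parcel_pois.items():
        counts = Counter(categories)
        total = sum(counts.values())
        weights[parcel_id] = {
            # term frequency times inverse document frequency
            cat: (count / total) * math.log(n_parcels / doc_freq[cat])
            for cat, count in counts.items()
        }
    return weights
```

A category that occurs in every parcel (for instance a ubiquitous convenience shop) receives a weight of zero, whereas a category concentrated in a few parcels is weighted up, which is why such weights can help separate functional types.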
With the growing popularity of CNN models, deep learning based approaches have also rapidly developed in the field of EULUC classification because of their capability of transforming the raw input imagery and low-level features to classes at a higher and abstract level. For example, the conventional CNN models have been employed to derive multi-intention land use classification maps at different spatial scales (Cao et al., 2018;Liu, Qi, Li, & Yeh, 2019;Liu & Shi, 2020;Srivastava, Vargas-Munoz, & Tuia, 2019). To better integrate spatial and temporal features from adjacent contexts, some deeper CNN models have been proposed to refine EULUC classification (Bao et al., 2020;Feng, Zhu, Yang, & Li, 2019;Huang et al., 2018). Additionally, social sensing data and methods have also been integrated with deep learning approaches to derive more accurate EULUC classification (Du, Du, Liu, Zhang, & Zheng, 2020;Jia et al., 2018;Zhong et al., 2020).

Applications and implications
Built environment factors have become a significant focus in the field of physical activity and public health. The built environment typically consists of urban design (arrangement of physical elements within a city), land use patterns (the distribution and composition of land functions across space), and transportation system infrastructure (roads, railways, sidewalks, etc.) (Handy, Boarnet, Ewing, & Killingsworth, 2002). Therefore, land-use maps outlining the distribution, pattern, and composition of essential urban land use categories (EULUC) have triggered a wide spectrum of applications and implications in the field of urban environmental health science.

Urban planning and land management
The fundamental data required by urban planners and policy makers is accurate information on current land use, how it has changed over the past, and how it will change in the future for practicing various urban planning and land management activities (Malarvizhi, Kumar, & Porchelvan, 2016). For example, over the past decades, the land use data collected from survey-based approaches have been the primary baseline for urban planners and researchers in preparation for the master plan, landscape, and functional design from a community to a town, a city, and the city cluster (Liu, Zhang, Zhang, & Borthwick, 2014;Malarvizhi et al., 2016). In recent decades, remote sensing data availability, especially for the VHR satellite imagery, has facilitated the spatiotemporal mapping and monitoring of urban land-use changes from local to regional scales (Sakieh et al., 2015;Treitz & Rogan, 2004).

Urban environment
The distribution of different land-use categories has been closely related to a variety of environmental problems such as air pollution, water pollution, noise pollution, heat island, and greenspace inequity. Therefore, EULUC maps and the derived land-use variables have been widely used in urban environmental studies. For example, the urban LULC changes and urban thermal landscape were integrated to account for the spatial pattern of local air pollution (Weng & Yang, 2006). Due to the high correlation between land use pattern and air pollution (Hoek et al., 2008), a group of land-use regression models has been developed to estimate spatial pattern and temporal variability of nitrogen dioxide concentrations (Beelen et al., 2013;Henderson, Beckerman, Jerrett, & Brauer, 2007) and fine particulate matter concentrations (Eeftens et al., 2012;Shi, Ren, Lau, & Ng, 2019). In addition to the choice of different regression models with linear and nonlinear forms, the quality of land-use baseline maps plays a critical role in modulating the performance of the air pollution retrieval. In a similar context, the influence of urban land use on land surface temperature and urban heat islands has been widely investigated (Dugord, Lauf, Schuster, & Kleinschmit, 2014;Heusinkveld, Steeneveld, Van Hove, Jacobs, & Holtslag, 2014;Jusuf, Wong, Hagen, Anggoro, & Hong, 2007;Stone & Norman, 2006). As for water pollution, it is recognized that strong ties exist between land uses and the water quality characteristics of adjacent aquatic systems in terms of spatial configuration (Bu, Meng, Zhang, & Wan, 2014;Lee, Hwang, Lee, Hwang, & Sung, 2009;Uriarte, Yackulic, Lim, & Arce-Nazario, 2011). Land use also significantly affects the distribution of urban green space regarding its configuration and composition (Chen, Nie, Chen, & Xu, 2017c;Li, Bai, Zhou, Han, & Han, 2015), thus leading to the spatial difference in greenspace exposure for urban citizens (Song, Huang, Cai, & Chen, 2018).

Urban morphology change
The irregularity of land uses and land parcels that constitute the urban morphology (Batty & Longley, 1988) is mainly defined by the shape of land use and the density of different land uses filled in the spatial context (Mesev, Longley, Batty, & Xie, 1995). Due to LULC changes, urban morphology is continuously evolving at micro- and macro-scales. Schneider and Woodcock (2008) used remote sensing data, pattern metrics, and census information to compare the urban growth in 25 global cities. In terms of their evolutionary morphologies, they aimed to categorize the urbanization process into compact, dispersed, fragmented, and extensive modes. Liu et al. (2010) proposed a landscape expansion index to generalize the spatio-temporal dynamics of landscape patterns. By applying this index to measure the morphological changes of urban expansion, urban growth types can be categorized into three major groups: infilling, edge-expansion, and outlying (Liu et al., 2010;Tu et al., 2021). As the outcome of urban development, urban morphology changes have been closely associated with demographic structures (van de Coevering & Schwanen, 2006), transportation modes (Zhou & Gao, 2020), building energy consumption (Wong et al., 2011), urban air quality (Yuan, Ng, & Norford, 2014), and heat environment (Zhou & Chen, 2018).
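The three growth types can be made concrete with a short Python sketch. The text above does not give the formula, so the sketch assumes the definition under which the landscape expansion index (LEI) is commonly reported: a buffer is drawn around each newly developed patch, and LEI is the percentage of that buffer already occupied by existing urban land, with LEI above 50 read as infilling, LEI between 0 and 50 as edge-expansion, and LEI equal to 0 as outlying. The function names are hypothetical, and both the formula and the thresholds should be checked against Liu et al. (2010) before use.

```python
def landscape_expansion_index(occupied_area, vacant_area):
    """LEI of a new patch, assuming LEI = 100 * A_o / (A_o + A_v).

    `occupied_area` is the part of the patch's buffer covered by existing
    urban land (A_o); `vacant_area` is the part covered by vacant land (A_v).
    """
    total = occupied_area + vacant_area
    if total <= 0:
        raise ValueError("buffer area must be positive")
    return 100.0 * occupied_area / total


def growth_type(lei):
    """Map an LEI value to a growth type using the assumed thresholds."""
    if lei > 50:
        return "infilling"
    if lei > 0:
        return "edge-expansion"
    return "outlying"
```

For instance, a new patch whose buffer contains 30 units of existing urban land and 10 units of vacant land has an LEI of 75 and would be labeled infilling under these assumptions.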

Urbanization process from observation and simulation
Time-series LULC observations have greatly supported the historical reconstruction and future prediction of the urbanization process. Numerous maps and products that document spatially and temporally explicit information of LULC changes have advanced our understanding of the process of urbanization and its impacts on the environment and ecology from local to regional and global scales. For example, by quantifying the annual urban land-use changes, Seto and Fragkias (2005) used the derived spatiotemporal landscape metrics to infer underlying social, economic, and political processes that drive the observed urbanization difference. The annual artificial impervious area dynamics have also been mapped at a 30-m resolution from 1985 to 2018 using the full Landsat archive. The derived results revealed a considerable urbanization process at the global scale over the past three decades. Specifically, Asia has the most significant growth. China and the US are the top two nations in total artificial impervious area, together accounting for around 50% of the global total in 2018. From the perspective of simulations, a wide range of urban simulation models has been developed to predict urban land-use changes driven by natural and socioeconomic factors. Given the difference of simulation targets in socioeconomic activities, the spatial configuration of urban land uses, or modeling units/scales, these models can be categorized into three classes: the land use/transportation (LUT) model, the cellular automata (CA) model, and the agent-based (AB) model. Specifically, the LUT models are focused on mimicking socio-economic activities by considering relatively complex relationships among variables and sectors.
The CA models are more focused on simulating the spatial allocation of specific land-use types. They have been regarded as very useful tools to analyze the causes and effects of the urbanization process and have been extensively modified for addressing different urban simulation issues, for example, the constrained CA model (Li & Yeh, 2000;White, Uljee, & Engelen, 2012), logistic CA model (Wu, 2002), SLEUTH model (Silva & Clarke, 2002), LUSD model (He et al., 2005), and FLUS model (Liu et al., 2017b). The AB models, treated as an intermediary between these two objectives, have been extensively investigated and utilized due to their capability of modeling behaviors and processes through individual agents (Groeneveld et al., 2017;Matthews, Gilbert, Roach, Polhill, & Gotts, 2007;Parker, Manson, Janssen, Hoffmann, & Deadman, 2003). There is also a set of AB model systems such as Swarm, MASON, Repast, StarLogo, NetLogo, OBEUS, AgentSheets, and AnyLogic for geospatial simulations (Castle & Crooks, 2006). Although most of the existing studies from remote sensing analysis treat urban land uses as the combination of major types, including built-up land, cropland, vegetated land, and water body (Du & Huang, 2017;Guan et al., 2011;Schneider & Woodcock, 2008;Seto & Fragkias, 2005), EULUC changes within the built-up land that depict the micro-scale dynamics of urban land functions have attracted growing attention in the urban science community (Banzhaf et al., 2017;Wang, Lin, Glendinning, & Xu, 2018;Yao et al., 2017b).
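To show the basic mechanics that the CA variants above share, the following Python sketch performs one update step of a toy constrained cellular automaton. It does not reproduce any of the cited models; the transition rule (a non-urban cell converts when enough of its eight neighbors are urban), the `threshold` parameter, and the `protected` mask are simplifying assumptions made for illustration.

```python
def ca_step(grid, protected, threshold=3):
    """Advance a toy urban-growth cellular automaton by one step.

    `grid` is a list of rows with 1 for urban and 0 for non-urban cells.
    `protected` has the same shape and is True where development is
    prohibited (the "constraint", e.g. water or conservation land).
    A non-urban, unprotected cell becomes urban when at least `threshold`
    of its Moore (8-cell) neighbors are urban.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # update synchronously from the old state
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 or protected[r][c]:
                continue
            urban_neighbors = sum(
                grid[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            )
            if urban_neighbors >= threshold:
                new_grid[r][c] = 1
    return new_grid
```

Operational models typically replace the fixed neighbor count with a calibrated transition probability that also depends on factors such as accessibility and suitability, but the synchronous grid update and the constraint mask follow the same pattern.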

Human behaviors and public health
The distribution of urban land use types impacts human activities. A growing literature investigates the potential for causal links between land use patterns and travel behavior (Boarnet & Crane, 2001). For example, results revealed that the pattern of mixed land use had significant associations with the frequency of walking for transportation, and the area-correlated mixed land use was significantly related to the duration of walking for transportation (Duncan et al., 2010). A comparative study between Boston and Hong Kong further confirmed the role of land use in influencing travel behaviors, which was independent of travel time and monetary costs (Zhang, 2004). The same study revealed that land use strategies influenced travel more effectively when complemented by pricing policies (Zhang, 2004). Urbanization in the form of dramatic land-use changes and conversions has also led to a major public health challenge with the growing disease burden in urban areas (Gong et al., 2012), for example, asthma (Son, Kim, & Bell, 2015), infectious disease (Patz et al., 2004), and obesity (Jia et al., 2021). As a proxy of urban physical contexts, EULUC maps provide multi-scale baseline maps for linking physical activity, urban environment, and human behaviors, thus providing opportunities for informing environment and policy initiatives to promote healthy and sustainable urban development.

Sample collection
Training data are critical components to urban land use classification because different classifiers and machine learning algorithms highly depend on the input reference with varying qualities and quantities. Generally, the increase in the number and variability of training samples leads to robust urban land use classification (Gong et al., 2019;Su et al., 2020). However, sample collection is the most time-consuming step. Existing samples are mainly derived from the following approaches: (i) manual or semi-manual inspection, for example, on-site survey, or visual inspections of VHR Google Earth imagery, Google Street Views, Google POIs, and 3-D modeled imagery; (ii) open data portal with LULC labels, for example, OSM has included polygon-based land use labels; and (iii) land use reference from government agencies. Two major weaknesses are noticeable. First, the crowdsourcing protocol of interpreting land use labels is still labor-intensive. The number of samples collected highly depends on coordinated efforts from both participatory popularity and financial input. Moreover, the quality of samples collected cannot be systematically controlled due to the inconsistent sampling criteria caused by the subjectivity of sample collectors among different places. Second, the temporal consistency of available georeferenced samples is another concern. For example, the Google Earth imagery at regional scales is always spatially mosaicked using temporally irregular VHR satellite observations, preventing spatially and temporally consistent information for interpreting land-use types. The crowdsourcing effort such as OSM will be an integration of sample collections spanning different years. Although the land use references from government agencies can be regarded as the most accurate ground truth for selecting samples, they are always limited by the updating frequency without revealing the temporal dynamics.

Mixed land-use problem
Mixed land use has been a major challenge in mapping EULUC, especially in highly dense urbanized contexts. Mixed land uses describe urban aggregation and development that integrate multiple functional uses, including residential, commercial, educational, medical, recreational, or even industrial land use, in a small neighborhood area (e.g. land parcel) or in high-rise buildings. Consequently, mixed land uses attract a diversity of user groups throughout different times of the day (Zahnow, 2018). However, differentiating these mixed land uses is quite difficult if the spatial resolution of the input data sources and features does not match, or is much coarser than, the size of the targeted objects. Conventional approaches tend to adopt the protocol of "dominant land use" rather than "proportional land uses", eliminating a lot of useful information that describes the microstructure of land use composition. Recent studies have been addressing this shortcoming in two major directions. The first is fine-resolution land parcel generation with detailed road networks and multi-scale image segmentation. With the increase in road network density, the derived land parcels have much higher purity in land uses (Su et al., 2020;Sun et al., 2020). For areas with highly mixed land uses, the objects derived from image segmentation and the parcels derived from road network divisions complement each other to refine urban land use classification (Tu et al., 2020). Second, emerging datasets such as traffic data, mobile phone data, social media data, and municipal data (e.g. water and electrical consumption data) provide new insights for estimating mixed urban land use patterns. For example, Liu and Long (2016) computed the degree of mixed land use based on the proportional POI types within each land parcel. Dovey and Pafka (2017) proposed a live/work/visit triangle framework focusing on the interconnections between land use functions to measure land-use mix.
Considering the biased spatiotemporal coverage of social sensing data, Guan et al. (2021) proposed to identify individual socioeconomic functions by the water consumption patterns of municipal services and measure the mixed land-use patterns using the information entropy index. However, the limited availability of municipal service data will hinder large-scale implementation of this type of approach. Additionally, vertically mixed land uses continue to be a challenge because of the limited approaches to partitioning different land-use types within building groups.
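An entropy-based measure of land-use mix can be sketched in a few lines of Python. The sketch below uses the standard Shannon entropy of the proportions of functional labels within one spatial unit, normalized by the logarithm of the number of candidate classes so that the result lies between 0 (single use) and 1 (perfectly even mix). The function name and the normalization choice are assumptions for illustration; the cited studies may define their indices differently.

```python
import math
from collections import Counter


def land_use_mix_entropy(function_labels, n_classes):
    """Normalized Shannon entropy of land-use functions in one unit.

    `function_labels` holds one label per observed function in the unit
    (e.g. one per POI or per water end-user); `n_classes` is the number
    of candidate land-use classes used for normalization.
    """
    counts = Counter(function_labels)
    total = sum(counts.values())
    if total == 0 or n_classes < 2:
        return 0.0  # no observations, or mixing is undefined for one class
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(n_classes)
```

A parcel containing only residential functions scores 0, while a parcel split evenly between two functions scores 1 when two classes are possible and 0.5 when four are possible, which shows why the choice of class system affects how "mixed" a unit appears.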

Generalization of data and models
The capability of data and model generalization across different spatial and temporal scales will be of great significance to multi-scale EULUC mappings. However, systematic testing of samples, features, and models on classification performance across different spatial and temporal contexts is limited. Knowledge about the impact of sample size, feature combinations, and model selection on classification performance is required to better support the practical implementation of multi-scale EULUC mapping (Gong et al., 2020a). Additionally, the transferability of data and models is another major concern. For example, the data availability, especially for some social sensing big data, will be largely different across countries, which prevents the flexible function of the mapping framework from one place to another. Although medium-resolution satellite imagery such as Landsat and Sentinel-2 are globally open access, some VHR remote sensing imagery is limited to specific regions and countries. For example, the National Agriculture Imagery Program (NAIP) imagery at a spatial resolution of 1 to 2 m can provide much higher spatial details for improving the EULUC classification performance (Chen et al., 2021b). Still, this dataset is only available in the United States. Similarly, different classification models have been employed in EULUC mapping practices, but their robustness to other feature inputs and the associated adaptation to different study areas remains uncertain.

Large-scale EULUC mapping
Despite the advances in data and methods in the field of urban land use classification, existing EULUC research and practices are still limited to local and regional levels. To address pressing environmental challenges caused by urbanization, it is important to have urban land use maps at the national and global scales that are derived from the same or consistent data sources with the same or compatible classification systems and mapping methods (Gong et al., 2020a). This is because only with urban land use maps produced with similar criteria can consistent environmental policies be made and action efforts be compared and assessed for large-scale environmental contexts (Gong et al., 2020a). However, detailed urban land use maps with global coverage do not yet exist.

Discussion and future opportunities
To improve the current EULUC mapping results and advance further research, we propose that future efforts be devoted to the following seven aspects.

Cross-scale mapping strategy
A cross-scale mapping strategy that is compatible across the pixel, object, and parcel scales is in high demand. It can help address the mixed land use problem because of (i) the logical reliability of scaling the representation of the urban land surface from pixels to objects and parcels; and (ii) the diverse outputs of urban land use classification maps that can describe the combination and proportion of mixed land uses, as well as the ratio of mixed land uses to the dominant land use. Additionally, the cross-scale mapping strategy will remove potential obstacles for multi-disciplinary practical applications. For example, land use modelers may be interested in using pixel-based land use classification maps; urban planners and landscape architects may prefer parcel-based land use classification maps; and urban climate modelers may prefer object-based classification maps that can be accurately aggregated to flexible units of different land-use combinations.
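The link between pixel-level and parcel-level outputs can be sketched as a simple aggregation. The Python example below takes per-pixel land-use labels and the parcel each pixel falls in, and reports both the dominant land use and the proportional land uses for every parcel. It is a minimal sketch, not a published workflow; the function name, the input layout, and the alphabetical tie-breaking rule are assumptions.

```python
from collections import Counter, defaultdict


def summarize_parcels(pixel_labels, pixel_parcels):
    """Aggregate pixel-level labels into parcel-level summaries.

    `pixel_labels[i]` is the classified land use of pixel i and
    `pixel_parcels[i]` is the id of the parcel containing that pixel.
    Returns, per parcel, the dominant label and the label proportions.
    """
    by_parcel = defaultdict(Counter)
    for label, parcel_id in zip(pixel_labels, pixel_parcels):
        by_parcel[parcel_id][label] += 1

    summary = {}
    for parcel_id, counts in by_parcel.items():
        total = sum(counts.values())
        proportions = {label: n / total for label, n in counts.items()}
        # Ties are broken alphabetically so the result is deterministic.
        dominant = max(sorted(proportions), key=proportions.get)
        summary[parcel_id] = {"dominant": dominant, "proportions": proportions}
    return summary
```

Keeping the proportions alongside the dominant label preserves the information that a "dominant land use" protocol alone would discard.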

Optimal integration of multi-source features
An increasing number of different data sources have been successfully used in urban land use classification. However, a higher volume of input data does not ensure better classification performance (Chen et al., 2021b). Although a large group of explanatory variables is not necessarily an obstacle to the reliable function of machine learning based classifiers, the presence of highly correlated variables may nonetheless hinder model interpretations (Chen et al., 2021a). Recent land cover classification studies also revealed that feature redundancy might even degrade the classification performance. Moreover, the contribution of different data features to the classification performance is not universal; it has distinct characteristics across different locations and different combinations of datasets. Therefore, the selection of optimal features should be tuned locally to achieve robust classification results. Additionally, information compression and feature transformation techniques such as principal component analysis (PCA) (Abdi & Williams, 2010) and maximum noise fraction (MNF) (Gao, Du, Zhang, Yang, & Wu, 2013) can be taken into consideration to better preserve useful information while reducing the computational costs. In the meantime, geometric registration is another issue when integrating multi-source geospatial data. Although geometric mismatch does not have much impact on block-level land use classification, since all the included features are aggregated into block-level statistics, it becomes challenging for pixel-level urban land-use classification, where geometric mismatch can be the main driver of misclassification over adjacent areas. In this case, geometric registration and correction should be conducted before stacking the included features from different geospatial data layers.
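One simple way to reduce the redundancy among highly correlated features is a greedy correlation filter, sketched below in Python. A feature is kept only if its absolute Pearson correlation with every feature already kept stays below a threshold. This is an illustrative baseline rather than the feature selection used in the cited studies; the function names, the 0.9 threshold, and the feature names in the usage note are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features that are not highly correlated with kept ones.

    `features` maps a feature name to its values over the same samples;
    features are visited in insertion order, so earlier ones take priority.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, given two vegetation indices that are nearly proportional to each other plus an unrelated POI density feature, the filter keeps the first index and the POI density. Because the result depends on the order of the features and on the local data, such a filter would need to be rerun for each study area, in line with the point above that feature contributions are not universal.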

Global sample library from crowdsourcing approaches
A global sample library that consists of different land-use types across different cities and countries will be highly needed for large-scale EULUC mapping. However, sample collection is time-consuming and labor-intensive. To promote a cost-effective global sample library, we call for future efforts devoted to the following aspects. First, the collection of samples can be facilitated by a crowdsourcing approach, including the VGI-based OSM land use labels, self-organized sampling campaigns of on-site survey and visual inspection, and available land-use references from government agencies and related institutions. Second, the temporal consistency of collected samples can be improved by inter-calibration and change detection techniques. For samples collected in earlier years, LULC change detection algorithms such as continuous change detection and classification (CCDC) (Zhu & Woodcock, 2014), Landsat-based detection of trends in disturbance and recovery (LandTrendr) (Kennedy, Yang, & Cohen, 2010), and breaks for additive season and trend (BFAST) (Verbesselt, Hyndman, Newnham, & Culvenor, 2010) can be used to identify the stable samples and carry them over to the sample library for more recent years. Third, the structure of the sample library should be systematically designed by including hierarchical land use categories, harmonizing the spatial distribution and sample size across different locations (e.g. continents, countries, cities, and climate zones), and considering the similarity and heterogeneity within and between classes.

Advanced machine learning and ensembled classification
Traditional machine learning algorithms such as SVM and RF have been widely and successfully applied in EULUC mapping at different scales. However, models of this sort, which rely on low-level features, have been difficult to adapt to the emerging big data era with its large number of data sources and features, especially for the cost-effective and intelligent mapping of large-scale EULUC dynamics. The deep hierarchical structure of recent deep learning algorithms, such as deep convolutional neural networks, has proven to be powerful in learning and generalization, which makes it suitable for big data processing and analysis (LeCun et al., 2015). Two major opportunities from deep learning technologies can be explored in the field of large-scale EULUC mapping. First, transfer learning that makes use of pre-trained neural networks has proven to be quite helpful and efficient for solving a different but related problem (Pan & Yang, 2009). This concept is well suited to large-scale EULUC mapping. Because of the similarities and differences in landscape characteristics and land use compositions across regions and countries, the training samples collected at local sites may or may not be appropriate for EULUC classification in different regions. Similarly, it will be unrealistic to apply a single global model to EULUC mapping in all regions. The idea here is to build categorized groups of "similar cities" that share similar characteristics in landscape and land use patterns. Within each group of cities, local features and models may be applied to classify nonlocal EULUC. Moreover, how to calibrate nonlocal similarity among different regions and develop spatially adjusted transfer models will be another open topic regarding model and feature transferability from local to regional and global scales. Second, ensemble learning is a promising strategy that can bring together a group of algorithms and models.
The ensemble strategy can reduce the variance of predictions and often yields predictions that are better than those of any single model.
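The simplest form of such an ensemble is majority voting over the labels predicted by several classifiers, sketched below in Python. The function name and the alphabetical tie-breaking rule are assumptions for illustration; weighted voting or stacking would be natural extensions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions by majority voting.

    `predictions` is a list with one entry per model; each entry is the
    list of labels that model predicts for the same ordered set of samples.
    Ties are broken alphabetically so the output is deterministic.
    """
    ensemble = []
    for votes in zip(*predictions):  # votes for one sample across models
        counts = Counter(votes)
        top = max(counts.values())
        ensemble.append(min(label for label, n in counts.items() if n == top))
    return ensemble
```

When the individual models make different mistakes on different samples, the vote can correct errors that any one model makes on its own, which is the intuition behind the variance reduction mentioned above.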

Open portal for data visualization and sharing
Efforts should be devoted to promoting data usage and sharing through open portals for visualization and download. We believe such a data portal can facilitate the distribution of multi-scale EULUC data by providing an easy-to-access, searchable hub for a variety of end-users from different fields and applications. First, an online open portal makes it possible for a broader range of users to explore any areas of interest. On the one hand, it could facilitate multi-scale data usage based on the requirements of different spatial extents. On the other hand, users can in turn provide feedback on data quality and other data attributes. Second, the entire procedure of EULUC mapping, from raw datasets, training samples, high-level features, and classification methods to final mapping products, can be integrated through the open data portal. This will facilitate the replication and extension of EULUC mapping practices and promote the interpretability of EULUC mapping results.

Historical reconstruction and future prediction of EULUC changes
Many studies have worked on the historical reconstruction and future prediction of the urban sprawl process (Gao & O'Neill, 2020). Yet, limited research has delved into the internal dynamics of urbanized regions in terms of land use changes at fine spatial and temporal resolutions. Knowledge about the historical process and future development of EULUC will be insightful and supportive to city managers and relevant stakeholders for strategic urban planning on healthy and sustainable development, especially for rapidly urbanizing areas where land use is changing dramatically. Therefore, we call for efforts to develop a relatively long-term annual urban land use classification dataset that is temporally consistent and spatially explicit, which will be of great importance to various applications in the field of urban environmental science.

Implications in urban environmental studies
Intuitively, EULUC maps that outline the composition, distribution, and pattern of different land-use types serve as "identity cards" for cities. In addition to their established use in urban planning, landscape architecture, land management, and biodiversity conservation, EULUC maps and the derived spatiotemporal information could be very useful in urban environmental studies such as air pollution, heat stress, noise pollution, and green/blue space protection; economic geography studies such as urban vibrancy, innovation ecosystems, and housing markets; and urban energy applications such as anthropogenic heat, carbon footprint, and energy supply. Moreover, the multitemporal mapping and monitoring of urban land-use changes will significantly advance our understanding of the mechanism of urbanization in the evolutionary loop from emergence to development, transformation (growth or shrinkage), and stability. This knowledge can be leveraged in urban models for the historical reconstruction of the urbanization process and the future projection of urban development.
Additionally, the philosophy of "from users, and back to users" should be promoted between the research and application communities in the field of urban science. Only when the demands of end users are fully understood can the knowledge exchange gaps be progressively closed.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
Bin Chen is an assistant professor in the Division of Landscape Architecture, Faculty of Architecture, at the University of Hong Kong. His research interests include most aspects of geoscience and remote sensing, specifically multisource data-model fusion and geospatial big data analysis in the field of urban and environmental health science. He has published more than 40 articles in journals including Science, PNAS, Remote Sensing of Environment, and ISPRS Journal of Photogrammetry and Remote Sensing. He received the 2021 AAG Early Career Award in Remote Sensing, the ISPRS Best Young Author Award, and the Li Xiaowen Remote Sensing Excellent Youth Award.

Bing Xu is a professor in the Department of Earth System Science at Tsinghua University. Her research interests include global environmental change monitoring and health, particularly the understanding of evolution and transmission mechanism of avian influenza at the molecular and cross-continental scales using phylogenetic analysis and spatio-temporal modeling tools. She is currently on the editorial board of Annals of GIS and ISPRS Journal of Photogrammetry and Remote Sensing.

Peng Gong is a chair professor of global sustainability and Vice-President & Pro-Vice-Chancellor (Academic Development) of the University of Hong Kong. His research interests include urbanization and health, mapping and monitoring of global environmental change, and modelling of environmentally related infectious diseases. He led the creation of the world's first 10-m resolution global land cover map. He is an advisor to Future Earth and a member of the Earth Commission.