Collaborative validation of GlobeLand30: Methodology and practices

ABSTRACT 30-m Global Land Cover (GLC) data products permit the detection of land cover changes at the scale of most human land activities, and are therefore used as fundamental information for sustainable development, environmental change studies, and many other societal benefit areas. In the past few years, increasing efforts have been devoted to the accuracy assessment of GlobeLand30 and other finer-resolution GLC data products. However, most of these assessments were conducted either within a limited percentage of map sheets selected at a global scale or in individual countries (areas), and there are still many areas where the uncertainty of 30-m resolution GLC data products remains to be validated and documented. To promote a comprehensive and collaborative validation of 30-m GLC data products, the GEO Global Land Cover Community Activity organized a project from 2015 to 2017 to examine the major problems, including the lack of internationally agreed validation guidelines and of online tools for facilitating collaborative validation activities. With the joint effort of experts and users from 30 GEO member countries or participating organizations, a technical specification for 30-m GLC validation was developed based on their findings and experiences. An online validation tool, GLCVal, was developed by integrating land cover validation procedures with service computing technologies. About 20 countries (regions) have completed the accuracy assessment of GlobeLand30 for their territories with the guidance of the technical specification and the support of GLCVal.


Introduction
Accurate characterization of Global Land Cover (GLC) is essential for sustainable development monitoring, environmental change studies, land resource management, and many other application areas (Foley et al. 2010; Roger and Pielke 2005; Running 2008; Grimm et al. 2008; Zell et al. 2012). In the past few years, a variety of operational or experimental approaches have been developed to produce global, regional and national land cover data products with different spatial-temporal resolutions and thematic accuracy (Loveland et al. 2000; Hansen et al. 2000; Friedl et al. 2010; Radoux et al. 2013; Latham et al. 2014; Ban and Jacob 2013; Gong et al. 2016). The thematic accuracy of these land cover data products can be documented through sample-based validation or comparison with existing reference datasets. The results not only help users to understand the uncertainty and the scope of data application, but also enable producers to examine and analyze the types, sources and spatial distribution of errors (Strahler et al. 2008; Rwanga and Ndambuki 2017).
GlobeLand30 is a 30-m resolution global land cover (GLC) data product that was developed by the National Geomatics Center of China using a Pixel-Object-Knowledge (POK) approach. Its 2000 and 2010 versions were released in 2014, and the 2020 version has just been released (Chen, Liao, and Chen et al. 2014a, 2014b, 2017a, 2017b). Such finer-resolution GLC datasets provide more details of land cover patterns, and permit the detection of land cover changes at the scale of most human land activities. GlobeLand30-2000 and GlobeLand30-2010 achieved an overall classification accuracy of over 80% at a global scale, according to a third-party expert assessment with a two-rank sampling strategy using 154,586 pixel samples. Moreover, many other experts or users validated GlobeLand30 for their own territories or areas of interest through sample-based validation or comparison with existing land cover or other datasets, resulting in an average accuracy of 80% for all classes or one single class (Brovelli et al. 2015; Manakos et al. 2015; Yang et al. 2017). However, these accuracy assessments were conducted either within 10% of the map sheets selected at a global scale or in some individual countries (regions). There are still areas where the uncertainty of GlobeLand30 and other finer-resolution GLC data products remains to be validated and documented.
In order to promote a comprehensive validation of GlobeLand30 and other finer-resolution GLC datasets, an international cooperative project was launched by the Global Land Cover Community Activity of the Group on Earth Observations (GEO) in 2014. This three-year project had three main objectives. The first was to develop a technical specification with appropriate methods for the validation of 30-m GLC datasets at a global scale. The second was to develop advanced tools for facilitating the collaborative validation process. The third was to mobilize GEO members and participating organizations to join the cooperative validation by integrating their validation expertise and resources. This initiative was supported by the International Cooperation Program of the Ministry of Science and Technology of China, and about 30 GEO member countries and participating organizations joined the validation project. This paper presents the major considerations and preliminary results of this GEO-led validation project. Section 2 analyzes the major challenges, including the lack of an internationally agreed technical specification for 30-m GLC validation and of an online tool for supporting collaborative validation activities. Section 3 presents the major technical considerations and developments in formulating such a technical specification. The design and development of an online validation system, GLCVal, are then briefly described in Section 4. The collaborative validation practices are discussed in Section 5 with two case studies, followed by some concluding remarks.

Challenges facing global-scale validation
Land cover validation is a technical process in which appropriate samples are selected, reference data are collected at those samples, and the overall and class-specific accuracies of the given data product are estimated (Olofsson et al. 2014; Chen et al. 2016). As far as the sampling design is concerned, a number of approaches have been developed to determine the sample size and spatial distribution of samples according to several fundamental criteria, such as probability, cost effectiveness and spatial balance. Stratified random sampling and two-stage cluster sampling are two commonly used approaches in GLC validation practices (Stehman and Czaplewski 1998; Stehman et al. 2012; Tong et al. 2011). Moreover, a few desktop and internet-based software tools have been developed to support the collection of reference data (Fritz et al. 2009; Clark et al. 2011). However, these sampling approaches and supporting tools cannot yet satisfy the needs of collaborative validation of 30-m resolution GLC products.
First, 30-m resolution GLC data products present strong spatial heterogeneity of land cover over the globe. Traditional sampling approaches cannot efficiently handle this spatial heterogeneity and may not produce credible spatial samples. Typical problems include inappropriate sample sizes for target regions, under-represented sample numbers for rare classes, and irrational sample distribution in geographical space (Chen et al. 2016). It is therefore necessary to take the spatial heterogeneity of land cover into consideration and to allow higher sample densities or larger sample sizes for more heterogeneous regions (Chen et al. 2016; Defourny et al. 2009; Congalton 1991). A Landscape Shape Index (LSI)-based approach was proposed to improve the quality of spatial sampling design for 30-m GLC validation; its basic idea is to characterize the spatial heterogeneity with three-level LSIs that determine the subsequent sample sizes and their spatial distributions (Chen et al. 2016). Experimental tests showed that this LSI approach obtained a more appropriate sample size for each region, sufficient sample numbers for rare classes, and optimal sample distributions in geographical space. The LSI-based sampling approach was therefore recommended to the GEO-led 30-m GLC validation project to solve the three problems associated with traditional sampling approaches.
Second, the collection of sample data over a large area is labor-intensive, costly and time-consuming, and it becomes even more difficult for GLC validation over the whole globe. With the rapid development of internet technology, some web-based systems have been developed to give people anywhere in the world online access to reference data and other sample information (Stehman et al. 2012; Fritz et al. 2012; Han et al. 2015). This information includes very high-resolution satellite imagery, geo-tagged photos, and thematic maps. However, most of these systems are dedicated to particular steps of validation and do not support the entire validation procedure. For instance, Geo-Wiki helps with the image interpretation of reference data, but does not support sample size calculation, automatic sample layout, precision calculations and other functions. It is therefore necessary to develop an online validation system (or tool) (See et al. 2015). Its major functions will include the integration of and access to reference materials, estimation of total sample sizes and inter-class distribution, geospatial layout of samples, expert interaction checking, generation of accuracy reports, online marking of errors, uploading and downloading of samples, and so on. It will help experts and volunteers from different parts of the world to find and use reference information, upload sample information, judge and label samples, and annotate error messages.
Third, conducting a comprehensive accuracy assessment of finer-resolution GLC data products has now become a necessity for two reasons. One is the increasing demand from sustainable development monitoring, environmental change studies and many other application areas, where accuracy assessment reports for large areas or even the global scale are requested. The other is that the number of people working on land cover validation (such as users and researchers) has increased rapidly, forming a large voluntary validation community. They share common interests and have their own validation resources. Moving from individual validation practices to a GEO-led collaborative validation task force is now possible.
In order to promote and organize a GEO-led 30-m GLC validation, a technical specification needs to be developed to describe the appropriate approaches and procedures for sampling design, response, and analysis protocols. A web-based validation tool needs to be developed accordingly to enable online and collaborative validation by experts from different parts of the world. The GEO members and participating organizations should be mobilized and invited to join the collaborative validation project with their expertise and available resources. Figure 1 illustrates the overarching conceptual framework of this GEO-led validation project.

Key methodological issues in developing the technical specification
In order to develop an internationally agreed technical specification on 30-m GLC validation, a number of technical issues were examined and explored, including LSI-based sampling design, reference data collection, sample judgment and labeling, and accuracy assessment.

LSI-based sampling design
An LSI-based sampling design was adopted for 30-m GLC validation to guarantee samples with good spatial representation and even spatial distribution. The design satisfies the following criteria: a relatively high sample size or density for heterogeneous landscapes; a good spatial spread (samples distributed throughout the region without large gaps); representative sample numbers for all classes, especially rare and fragmented classes; and self-adaptive ability to different regions (Chen et al. 2016). As shown in Figure 2, three LSIs are calculated for three different geographical levels and used to estimate the sample size for regions, allocate samples to each class, and distribute sample sites.
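The LSI itself is not defined in this text. As a rough illustration of why the index grows with fragmentation, the sketch below assumes the common raster form used in landscape ecology, LSI = 0.25 × E / √A, where E is the total edge length of a class and A is its total area. The function name, the binary-mask input and the 30-m default cell size are illustrative assumptions; the exact LSI variants used at the regional, class and unit levels should be taken from Chen et al. (2016).

```python
import math


def landscape_shape_index(mask, cell_size=30.0):
    """Assumed raster form LSI = 0.25 * E / sqrt(A) for a binary mask.

    `mask` is a list of rows of 0/1 values; 1 marks cells of the class.
    Edges on the raster boundary are counted as class edges.
    """
    rows, cols = len(mask), len(mask[0])
    edge_count = 0
    cell_count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            cell_count += 1
            # Count the sides of this cell that face a non-class cell.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    edge_count += 1
    if cell_count == 0:
        return 0.0
    edge_length = edge_count * cell_size
    area = cell_count * cell_size ** 2
    return 0.25 * edge_length / math.sqrt(area)


# Hypothetical masks: one compact patch versus a checkerboard of single cells.
compact = [[1, 1],
           [1, 1]]
fragmented = [[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]]
print(landscape_shape_index(compact), landscape_shape_index(fragmented))
```

Under this assumed form, a single square patch gives the minimum value of 1, and the value rises as the same class breaks into smaller patches, which is the property the sampling design relies on.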
(1) Determining sample size: A regional LSI (rLSI) is used to determine the sample size of a given target region, following the formula given in Chen et al. (2016). For any given region i, its sample size N_i is derived from its regional landscape shape index (rLSI_i) and its area (A_i), as well as those of the other regions, and the total sample size N. This ensures that more heterogeneous regions receive larger sample sizes or higher sample densities.
(2) Allocating sample size to classes: A class-level LSI (cLSI) is used to allocate the derived sample size of a specific target region to its individual land cover classes. In the Neyman optimal allocation formula, S, the standard deviation of class k, is the key element, as it measures the dispersion, spatial variability and inhomogeneity of the spatial distribution. The standard deviation is greater for a class with stronger heterogeneity, so more samples should be allocated to it.
Here, the cLSIs are used as a proxy for S, where cLSI_{i,k} is the cLSI of class k in region i. The sample number of class k (cN_{i,k}) can be derived from the regional sample size N_i, the areal proportion (W_{i,k}) and cLSI_{i,k}. Experiments showed that the sample sizes of rare land cover classes increased when the cLSI-based class sample size allocation formula was used (Chen et al. 2016).
(3) Selecting sample sites: The unit-level LSI (uLSI) is calculated for each sample unit, and a uLSI-curve is created by sorting all units by uLSI in descending order. The x-axis gives the serial number of the geographical units and the y-axis gives the corresponding uLSI. In this way, the uLSI-curve serves as a spatial heterogeneity space, and a two-stage sampling using the uLSI-curve was designed for selecting sample units (Brovelli et al. 2015). In the first stage, primary sampling units (PSUs) are selected via an equal-step sampling protocol along the x-axis of the uLSI-curve. In the second stage, one pixel is selected as the sample site within each PSU using a random sampling protocol. More sample sites are thereby located in small, fragmented patches, while homogeneous landscapes are still covered, so the sample locations account for both heterogeneous and homogeneous landscapes.
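The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the specification's implementation. Step 1 assumes the regional weight is A_i × rLSI_i, which is one plausible reading of the description; the exact expression is in Chen et al. (2016). Step 2 follows the standard Neyman allocation with cLSI substituted for S, that is, cN_{i,k} = N_i × W_{i,k} × cLSI_{i,k} / Σ_k (W_{i,k} × cLSI_{i,k}). Step 3 assumes the equal-step selection starts at the most heterogeneous unit. All function names, dictionary keys and input numbers are hypothetical.

```python
import random


def regional_sample_sizes(total_n, regions):
    """Step 1 (assumed form): split total_n in proportion to area * rLSI.

    `regions` maps a region name to a tuple (area, rLSI).
    """
    weights = {name: area * rlsi for name, (area, rlsi) in regions.items()}
    total_weight = sum(weights.values())
    # Rounding means the parts may differ from total_n by a sample or two.
    return {name: round(total_n * w / total_weight) for name, w in weights.items()}


def class_sample_sizes(region_n, classes):
    """Step 2: Neyman-style allocation with cLSI standing in for S.

    `classes` maps a class name to a tuple (areal proportion W, cLSI).
    """
    weights = {name: w * clsi for name, (w, clsi) in classes.items()}
    total_weight = sum(weights.values())
    return {name: round(region_n * w / total_weight) for name, w in weights.items()}


def select_sample_sites(units, n_psu, seed=0):
    """Step 3: equal-step PSU selection along the uLSI-curve, then one
    random pixel per PSU. Each unit is a dict with 'id', 'ulsi', 'pixels'.
    """
    if not 0 < n_psu <= len(units):
        raise ValueError("n_psu must be between 1 and the number of units")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # The uLSI-curve: units sorted by uLSI in descending order.
    curve = sorted(units, key=lambda u: u["ulsi"], reverse=True)
    step = len(curve) / n_psu
    psus = [curve[int(i * step)] for i in range(n_psu)]
    return [(psu["id"], rng.choice(psu["pixels"])) for psu in psus]


# Illustrative inputs only; none of these numbers come from the project.
regions = {"plain": (100.0, 1.0), "mountain": (100.0, 3.0)}
print(regional_sample_sizes(400, regions))

classes = {"forest": (0.75, 2.0), "wetland": (0.25, 6.0)}
print(class_sample_sizes(400, classes))

units = [{"id": i, "ulsi": float(i), "pixels": [(i, 0), (i, 1)]} for i in range(10)]
print(select_sample_sites(units, n_psu=5))
```

In the toy class allocation, the rare but fragmented "wetland" class receives half of the 400 samples, whereas a purely area-proportional split would give it a quarter. This mirrors the reported effect of the cLSI-based allocation on rare classes.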

Reference data collection
The collection and integration of various reference data and resources are essential for collaborative GLC validation. The reference resources include Very High Resolution (VHR) images, thematic maps, in-situ measurements, crowdsourcing and volunteered geographic information (VGI) data, and so on. Before these reference data are used, they need to be properly evaluated to ensure data quality, including completeness, logical consistency, positional accuracy, thematic accuracy, temporal quality and usability. Geo-Wiki, OpenStreetMap, in-situ measurements and other public geographic data are among the valuable reference sources for GLC validation. Geo-Wiki attempts to integrate open access to high-resolution satellite imagery from Google Earth with crowdsourcing in a single Web 2.0 application (Fritz et al. 2009, 2012). The user can plot GLC maps on top of Google Earth and display disagreement maps between any pair of land cover products. In-situ measurement data from fieldwork are another valuable reference source for GLC validation. Since field data collection is time-consuming and labor-intensive, its use at the global scale is limited by high costs. As a result, field data collection can act as a supplementary means in places where qualified images are missing or detailed class information is needed.

Sample judgment and labeling
With the help of an online tool, the land cover types of the sample points can be judged by validation experts or volunteers through the interpretation of high-resolution images, or through comparison with large-scale land cover maps and in-situ measurement data. Additional supporting information, such as text commentaries and photos of the samples, can also be provided or uploaded.
A two-scale window strategy was proposed for interpreting land cover types from imagery. As shown in the upper-right part of Figure 3, each sample point, identified by an SID (sample ID), is enclosed by two square areas: a 30 m × 30 m window and a 300 m × 300 m window. Two land cover labels are then interpreted and assigned to each sample point. The first is the land cover occupying the largest proportion of the 30 m × 30 m window, and the second is the majority land cover in the 300 m × 300 m window.
These two labels are saved as LCval30 and LCval300 in the attribute table of the sample judgment, as shown in Figure 4.
In order to examine the credibility of the above sample judgment and labeling, a four-level confidence measure was further proposed. Level A means the sample judgment is absolutely correct. Level B implies that the judgment is absolutely incorrect. Level C means the sample judgment is uncertain because of reference data quality, and Level D means it is uncertain because of the interpreter's personal knowledge and interpretation ability. Each sample may thus have one of three possible results: 100% correct judgment, 100% wrong judgment, or uncertain judgment. With these four levels of trust, the results of sample judgment can be evaluated and the accuracy assessment can be reported at different levels of trust. This enables the elimination of potentially false sample judgments that would influence the final accuracy evaluation, and helps guarantee the credibility of the GLC validation.
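As an illustration of how such judgments might be stored and screened, the sketch below models one row of the attribute table and drops the uncertain judgments before accuracy is computed. Only SID, LCval30 and LCval300 are named in the text; the `confidence` field, the class and function names, and the choice to keep only levels A and B are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SampleJudgment:
    sid: int          # sample ID (SID)
    lcval30: str      # label interpreted from the 30 m x 30 m window
    lcval300: str     # label interpreted from the 300 m x 300 m window
    confidence: str   # hypothetical field holding level "A", "B", "C" or "D"


def keep_definite(samples, definite_levels=("A", "B")):
    """Keep samples with a definite judgment; drop uncertain ones (C and D)."""
    return [s for s in samples if s.confidence in definite_levels]


# Made-up records for illustration only.
samples = [
    SampleJudgment(1, "forest", "forest", "A"),
    SampleJudgment(2, "wetland", "grassland", "C"),
    SampleJudgment(3, "cultivated land", "cultivated land", "B"),
    SampleJudgment(4, "shrubland", "forest", "D"),
]
print([s.sid for s in keep_definite(samples)])
```

Passing a different `definite_levels` tuple would allow the accuracy assessment to be repeated at different levels of trust.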

Accuracy assessment
Accuracy assessment aims to provide an index of how closely the class allocations depicted in the thematic land cover map represent reality. The commonly adopted method is the confusion matrix (error matrix) and its derived basic descriptive measures, including overall accuracy, user's and producer's accuracy, Kappa and so on. Overall accuracy is a general measure for the entire map. It is simply the sum of the major diagonal (i.e. the correctly classified sample units) divided by the total number of sample units in the error matrix. User's accuracy is the probability that a pixel classified on the map represents that class on the ground, whereas producer's accuracy is the probability of a reference pixel being correctly classified (Congalton 1991). By examining the relationship between the two measures, the map user gains insight into the varied reliabilities of classes on the map, and the analyst learns about the performance of the process that generated the map.
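These measures can be computed directly from an error matrix, as in the following minimal sketch. It assumes that rows hold the map classes and columns hold the reference classes; conventions differ between authors, so the orientation should be checked before reuse. The function name and the toy matrix are illustrative, and the sketch uses plain sample counts without any area weighting.

```python
def accuracy_measures(matrix):
    """Return overall, user's and producer's accuracy from an error matrix.

    Assumed layout: matrix[i][j] counts samples mapped as class i (rows)
    whose reference label is class j (columns).
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = correct / total
    # User's accuracy: diagonal divided by the map (row) total of each class.
    users = [matrix[i][i] / row_totals[i] if row_totals[i] else float("nan")
             for i in range(k)]
    # Producer's accuracy: diagonal divided by the reference (column) total.
    producers = [matrix[j][j] / col_totals[j] if col_totals[j] else float("nan")
                 for j in range(k)]
    return overall, users, producers


# Toy two-class matrix, not taken from Tables 1 and 2.
toy = [[45, 5],
       [10, 40]]
overall, users, producers = accuracy_measures(toy)
print(overall, users, producers)
```

For the toy matrix, 85 of 100 samples lie on the diagonal. The first class has a user's accuracy of 45/50 and a producer's accuracy of 45/55, which shows how the two measures can diverge for the same class.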
The Kappa analysis is a discrete multivariate technique used in accuracy assessment to statistically determine if an error matrix is significantly different from another. Not all agreements can be attributed to the success of the classification. Kappa attempts to provide a measure of agreement that is adjusted for chance agreement.
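The chance adjustment can be made concrete with the usual Kappa coefficient, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the overall accuracy) and p_e is the agreement expected by chance from the row and column totals. The sketch below assumes this standard form; the function name and toy matrix are illustrative, and the significance test between two error matrices mentioned above is not covered.

```python
def kappa_coefficient(matrix):
    """Kappa from a square error matrix of sample counts.

    p_o is the observed agreement; p_e is the chance agreement computed
    from the products of matching row and column totals.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Undefined when expected == 1 (all samples fall in a single class).
    return (observed - expected) / (1 - expected)


# Toy two-class matrix, not taken from Tables 1 and 2.
toy = [[45, 5],
       [10, 40]]
print(kappa_coefficient(toy))
```

For this matrix the observed agreement is 0.85 and the chance agreement is 0.5, so Kappa is lower than the overall accuracy, as expected for a chance-adjusted measure.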

Online validation tool: GLCVal
A web-based system (GLCVal) was designed and developed to support the international collaborative validation of GlobeLand30. The basic idea is to integrate the concept of "Internet plus" and service computing technologies with land cover validation procedures (Chen et al. 2018). Figure 5 illustrates its major functionalities, supporting data services and algorithms. This online system (http://glcval.geo_compass.com) serves as a platform that assembles land cover validation methods and workflows into a single portal, including the storage of and online access to all the available reference data, wizard-guided validation workflows, self-adaptive sampling designs, user-friendly interpretation of samples from imagery, and generation of accuracy assessment reports.
GLCVal allows users to carry out their validation tasks with step-by-step guidance. After logging into GLCVal, a user can create a new validation task, which is supported by a sequence of functions. A land cover product first needs to be selected or uploaded and displayed. Users can then choose any administrative or geographical region of interest, or define the area of interest by uploading administrative boundary data. Next, they may choose LSI-based sampling or another sampling approach to generate the sample dataset, which is displayed on the screen. Users can interpret the land cover type of each sample point by comparing it with a higher-resolution image of the same place, as shown in Figure 6. When the judgment and labeling of all the samples are completed, an accuracy assessment report is generated automatically.
The labeled samples are one of the important outputs of GLC validation. Sample metadata were developed, and all the valid sample points were integrated and stored for further use and sharing.

Collaborative validation practices: case studies
About 30 GEO member countries and participating organizations joined this GEO-led international collaborative validation. The technical specification was used to guide the collaborative validation of GlobeLand30 with the support of the online tool GLCVal. Here, we present two case studies conducted in Bulgaria and Sweden.
The Republic of Bulgaria is located in Southeastern Europe, in the middle of the Balkan Peninsula, and covers an area of 110,994 km². The territory of Bulgaria is characterized by significant geodiversity and a heterogeneous landscape determined by the variety of landforms, the highly segmented relief and the specifics of the climatic and soil types. The location of Bulgaria in the southern part of the temperate climatic zone, the Black Sea and the proximity to the Mediterranean Sea all influence the climatic elements and determine the transition from a temperate continental climate in the north, through a transitional climate zone, to a continental Mediterranean climate in the south. The transitional character of the natural components and the diversity of the relief influence the landscape structure and heterogeneity.
Sweden is located in northern Europe, and is one of the world's northernmost countries. Stretching 1,574 km from north to south and 499 km from east to west, Sweden has a total area of 450,295 km² and is the third largest country in the European Union. Sweden is characterized by its long coastline, extensive forests and numerous lakes. Forest covers 70% of Sweden, while only 8% of the land is farmland. Sweden is divided into three major regions: to the north is the vast mountain and forest region; central Sweden consists of lowland in the east and highland in the west; and southern Sweden includes highlands in Småland and agricultural plains in Skåne. The climate varies across Sweden, but it is mainly temperate in the south and subarctic in the north.
Using the GLCVal system, 382 and 394 sample points were generated for Bulgaria and Sweden, respectively, with the LSI-based sampling approach at a 95% confidence level, as shown in Figure 7. These sample points were judged and labeled by experts from the Academy of Bulgaria and by research assistants at KTH Royal Institute of Technology, following the guidelines of the technical specification for 30-m resolution GLC validation. Additional reference data were used, including local orthophotos, high-resolution satellite images, and national land cover data. The color orthophoto maps of Bulgaria have a spatial resolution between 0.4 and 0.5 m and were acquired in 2006 and 2010-2011. For Sweden, high-resolution satellite images from Google Earth for the period 2009-2011, at a spatial resolution between 0.6 and 2.5 m, were used for validation.
The validation results of the GlobeLand30 dataset are presented as error matrices in Tables 1 and 2. The Overall Accuracy (OA) of GlobeLand30 in Bulgaria and Sweden is 79.84% and 88.58% (kappa: 0.87), respectively. The Producer Accuracy (PA) and the User Accuracy (UA) are presented for each of the land cover classes. For Bulgaria, the land cover classes with the highest producer accuracy (over 80%) are cultivated land, forest, waterbodies and artificial surfaces, while the lowest results are observed in the shrubland and wetland classes. The class with the highest user accuracy is forest (92.2%), followed by artificial surfaces. Cultivated land, shrubland and waterbodies have user accuracies over 70%, while the lowest values are again observed for the wetland class. For Sweden, almost all land cover classes achieved over 80% producer accuracy, with the exception of bareland. The 100% accuracy for the water class is outstanding, as Sweden has around 100,000 lakes with different conditions. Bareland has the lowest classification accuracy and was confused with permanent snow and ice, tundra, and artificial surfaces. This is likely due to the lack of images acquired in a season suitable for mapping permanent snow and ice, in addition to the spectral similarities of bareland, tundra and artificial surfaces. In terms of user accuracy, all land cover classes yielded over 80% accuracy except permanent snow and ice. The low user accuracy for permanent snow and ice was caused by commission errors from bareland and tundra.

Conclusions
A technical specification and an online tool are two essential ingredients for a successful collaborative 30-m GLC validation. From 2015 to 2017, the GEO Global Land Cover Community Activity organized a project to examine the major technical issues, develop methods and tool(s), and carry out case studies. About 30 countries and organizations joined this GEO-led international cooperative project. A technical specification for 30-m GLC validation was developed by considering the findings and experiences of international experts and users. An online validation tool, GLCVal, was developed by integrating land cover validation procedures with service computing technologies. About ten workshops and seminars were organized to discuss the methodology adopted and disseminate the results. About 20 countries have completed the accuracy assessment of GlobeLand30 for their territories with the guidance of the technical specification and the support of GLCVal.
As the world's first 30-m GLC data product, GlobeLand30 has been used by scientists and users from more than 130 countries for environmental change analysis, geographical condition monitoring, urban and rural management, earth surface process modeling, and sustainable development. The 2020 version of GlobeLand30 was released recently for open access and was shared on the occasion of the 75th anniversary of the United Nations. It can be expected that in the future more and more GLC data products at higher spatial, temporal, and thematic resolutions will be released as a support to the United Nations 2030 Sustainable Development Goals (SDGs). The results of this GEO-led project will provide a solid basis for promoting a comprehensive and collaborative validation of these new GLC products. It will also be a good opportunity to further improve the efficacy and efficiency of global-scale land cover validation.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data Availability Statement
The data that support the findings of this study are openly available in "Global Change Data Repository" at http://dx.

Notes on contributors
Jun Chen is an academician of the Chinese Academy of Engineering and a professor at the National Geomatics Center of China. His research interests are land cover mapping, validation, service and application.