Uncertainty evaluation approach based on Shannon entropy for upscaled land use/cover maps

ABSTRACT Understanding the scale of a land use/cover (LULC) map and its impact on the representation of LULC is central to addressing earth observation issues. However, although upscaled maps have been used for decades, a quantitative evaluation of their uncertainty is still lacking. An approach based on Shannon entropy theory is therefore proposed to tackle this issue by reporting the categorical heterogeneity information contained in upscaled pixels. The Majority Rule-Based aggregation algorithm was applied to a national land use (LU) map to generate upscaled maps at several widely used scales. The results reveal that substantial uncertainties inevitably exist in the upscaled maps. Additionally, the analyses demonstrate that the proposed approach can accurately provide spatial uncertainty information for upscaled maps. These findings suggest that the approach is necessary for users to make the most effective use of these maps in earth observation models and that it should be used extensively in future work.


Introduction
Observation of land use/land cover (LULC) is essential for understanding and monitoring the earth system (Chen et al., 2020; C. Zhang et al., 2019). The distribution and area of LULC provide fundamental information for various applications (Doelman et al., 2018; Le Provost et al., 2020; Tesfaw et al., 2018). These applications require LULC data not only at high resolutions but also at coarser spatial resolutions (Matasov et al., 2019; Sohl et al., 2016; Verburg et al., 2011). Given these multiple objectives, upscaled maps at lower spatial resolutions have served as crucial input data for many research studies (Cole et al., 2018; Grafius et al., 2016; D. He et al., 2021; Y. He et al., 2022; Li & Zhou, 2018; Van Vliet et al., 2019; Zen et al., 2019; J. Zhang et al., 2022). Note that, in the literature, scale refers to the spatial resolution of the LULC map (Moody & Woodcock, 1994, 1995; Tan et al., 2015). Upscaling techniques, which use an initial base LULC map at a finer resolution to generate upscaled maps at coarser resolutions, are thus in high demand.
Various upscaling techniques have therefore been developed (Sun et al., 2017), such as the Majority Rule-Based aggregation method (MRB; H.S. He et al., 2002; Saura, 2004), the Point-Centred Distance-Weighted Moving Window (PDW; H.S. He et al., 2002), the Random Rule-Based method (RRB; Gardner et al., 2008), and the Fusing Class Membership Probability and Confidence-level probability method (FMC). However, these techniques produce proportion errors of land covers and biased spatial/landscape patterns in the upscaled maps. For example, Moody and Woodcock (1994) reported that the MRB increased the proportion of the dominant class. Moody and Woodcock (1995) further reported that the proportion errors tend to decrease as patch size and the initial proportions of cover types increase. Raj et al. (2013) then compared the uncertainties from the MRB, PDW and RRB, which confirmed the inevitable proportion errors and the biased spatial/landscape patterns (i.e. patch size and shape complexity) generated by these techniques. , (2019) and Sun and Congalton (2021) further quantitatively evaluated the errors of spatial pattern in upscaled maps using various landscape metrics (i.e. Patch-per-Unit, Aggregation index, Fragmentation, Dominance, Square-pixel index, and other metrics proposed by McGarigal et al., 2012). These previous studies indicate that inaccurate LULC information is inevitable in upscaled maps.
How to efficiently report the uncertainty information of upscaled maps to map users has therefore troubled the community. A similarity matrix approach was thus proposed to evaluate the quality of upscaled maps. This approach assesses the similarity between upscaled maps and their corresponding base map using three indicators: overall similarity, omission error, and commission error. However, two shortcomings become apparent when map users' practical demands for an optimal map are considered. First, map users cannot quantitatively assess the uncertainty at pixel level that arises because each upscaled pixel carries mixed LULC information from the base map. They therefore cannot quantitatively evaluate the influence of the upscaled maps on their research. Second, map users are not aware of the spatial distribution of mapping uncertainties in the upscaled map. This hinders them from using LULC maps judiciously by selecting either reliable spatial resolutions or suitable locations for their research. These shortcomings make it difficult to opt for a suitable map with known uncertainty information and consequently leave unknown uncertainties in the research that relies on it. This considerably restricts the community's ability to meet the burgeoning demand for LULC maps at different spatial resolutions.
Most importantly, upscaled maps of unassessed quality potentially introduce errors or uncertainties into map users' models (Verburg et al., 2011). The question of how to select optimal maps for a given study has consequently troubled the community for decades. Additionally, the earth observation community has strongly demanded that an informative estimate of uncertainty be provided with each map (Griffiths et al., 2019; Lyons et al., 2018; Verburg et al., 2011). A quantitative and in-depth understanding of the uncertainty of upscaled maps is important for helping the community select LULC data either at suitable spatial resolutions or at locations where the data have lower uncertainty. Unlike the qualitative judgments frequently given on upscaled maps, a quantitative approach should provide uncertainty information at pixel level and considerably benefit decision-making when selecting an optimal map. To the best of our knowledge, however, only a few efforts have been made to fulfill these requirements.
Taken together, a quantitative assessment of the spatial uncertainties of upscaled maps is needed, and this need has remained unmet for decades. Upscaling techniques use various rules to assign a categorical label to each upscaled pixel based on the corresponding categorical labels of the base map. The uncertainty of the upscaled map at pixel level can therefore be described as the heterogeneity, or disorder, of the corresponding categorical labels from the base map. The question of how to present uncertainty information in the upscaled map can thus be answered with information science, because upscaling is a classic example of information communication between a base map and its corresponding upscaled maps. This paper therefore proposes a quantitative approach to assess the spatial uncertainty of the upscaled map at pixel level based on an established theory of information communication, the Shannon Entropy. A popular upscaling technique, the Majority Rule-Based aggregation method (MRB), was employed to generate the upscaled maps.

Materials
The approach was validated using the National Land Use data of China (NLUC) at 30 m for 2018 as the base data for upscaling. The data were obtained from the Resource and Environment Science and Data Center of the Institute of Geographic Sciences and Natural Resources Research (https://www.resdc.cn/). The data, which cover Mainland and Taiwan, China, were reclassified into six LU types (crop, forest, grassland, water area, construction land, and unused land) according to the metadata, using ArcGIS version 10.8.

Uncertainty evaluation based on Shannon entropy
The Shannon Entropy (Eq. (1)) was proposed to quantify the uncertainty of an information source by measuring the disorder in a system (Shannon, 1948). It has been validated as an effective approach to assess the heterogeneity of the attributes of an evaluation object (Wu et al., 2013; Yang et al., 2018). A higher entropy value represents higher uncertainty of the information. Upscaled maps generally aggregate the categorical labels of the corresponding base map, which contains heterogeneous attributes, so higher heterogeneity results in higher entropy as well. Assessing this heterogeneity is thus the key to evaluating the uncertainty of upscaled maps. Therefore, this paper develops an uncertainty evaluation approach based on the Shannon Entropy for upscaled maps at pixel level.
$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log_2 p(x_i) \tag{1}$$

where $H(X)$ is the entropy of the evaluation object $X$, $n$ is the number of heterogeneous attributes, $x_i$ is the $i$th possible value out of $n$ symbols, and $p(x_i)$ is the probability that $X = x_i$. The logarithm is written to base 2, the base that reproduces the worked example below. In the context of upscaled maps, each categorical label represents one class type. The class type, $C^{up}_X$, of the upscaled pixel $X$ is determined by the class types, $C_j$, of the corresponding pixels, $x_i$, in the base map. Thus, the uncertainty information (i.e. categorical heterogeneity) of an upscaled pixel can be estimated by Eq. (2). The occurrence probability of each class type $C_j$ is given by $p(x_j)$ (Eq. (3)), based on the occurrences of all corresponding class types in the base map. As the number of class types under an upscaled pixel increases, the entropy value tends to increase. The entropy thus explicitly represents the uncertainty information contained in an upscaled pixel after it has been given a new class-type assignment. For example, in Figure 1, a base map at 10 m spatial resolution is to be upscaled to 30 m spatial resolution. Nine pixels, $x_i$, from the base map, with five different class types, $C_1$, $C_2$, $C_3$, $C_4$, and $C_5$, correspond to the upscaled pixel $X$. The numbers of pixels of each class type are 3, 2, 2, 1, and 1, respectively. The occurrence probabilities of the class types, $p(x_j)$, are therefore 3/9, 2/9, 2/9, 1/9, and 1/9, respectively, according to Eq. (3). $E_{up}$ is then estimated as 2.1972 according to Eq. (2).

$$E_{up} = -\sum_{j} p(x_j)\,\log_2 p(x_j) \tag{2}$$

$$p(x_j) = \frac{N_j}{N} \tag{3}$$

where $E_{up}$ is the entropy evaluation of the upscaled pixel, $p(x_j)$ is the occurrence probability of class type $j$ corresponding to the upscaled pixel $X$, the sum runs over the class types $j$ present under $X$, and $N_j$ and $N$ are the number of base pixels of class type $j$ and the total number of base pixels corresponding to $X$, respectively.
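The pixel-level evaluation can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function name `upscaled_pixel_entropy` is hypothetical, and the base-2 logarithm is an assumption, chosen because it reproduces the value quoted for the Figure 1 example.

```python
import math
from collections import Counter


def upscaled_pixel_entropy(base_labels, base=2.0):
    """Shannon entropy of the base-map class labels under one upscaled pixel.

    base_labels: iterable of class labels of the base pixels covered by the
    upscaled pixel. The logarithm base is an assumption (default 2).
    """
    counts = Counter(base_labels)      # N_j for every class type j
    total = sum(counts.values())       # N, number of base pixels
    if total == 0:
        raise ValueError("at least one base pixel is required")
    entropy = 0.0
    for n_j in counts.values():
        p_j = n_j / total              # occurrence probability of class j
        entropy -= p_j * math.log(p_j, base)
    return entropy


# Figure 1 example: nine base pixels with class counts 3, 2, 2, 1, 1.
window = ["C1"] * 3 + ["C2"] * 2 + ["C3"] * 2 + ["C4"] + ["C5"]
print(round(upscaled_pixel_entropy(window), 4))  # 2.1972
```

A window that contains a single class type returns 0, the lowest possible uncertainty, while a window split evenly between two class types returns 1.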

Majority rule based upscaling technique
The MRB, a commonly used technique (Sun et al., 2017), was selected to generate the upscaled land cover maps. This upscaling technique determines the class type of each coarser-resolution pixel in the upscaled map by selecting the most frequently occurring class among the finer-resolution pixels contained within it (H.S. He et al., 2002; Saura, 2004). When more than one class has the highest frequency, the dominant class is selected at random from among them (Raj et al., 2013). For each upscaled resolution of interest, predefined, non-overlapping square windows corresponding to the upscaling resolution (i.e. the coarser resolution) are first constructed (H.S. He et al., 2002; Moody & Woodcock, 1995). The size of the window is the same as the desired spatial resolution of the output maps. Each window is assigned the most frequently occurring class within it. For example, in Figure 1, the predefined window is set to 30 m according to the desired spatial resolution of the upscaled map. Class type $C_1$ occurs three times within the predefined window, which is more frequent than any of the other four classes. The class type of the upscaled pixel, $C^{up}_X$, is therefore assigned as $C_1$.
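The majority rule described above can be sketched as follows. This is an illustrative sketch, not the processing chain used in this study: the function names, the list-of-lists grid representation, and the spatial arrangement of the nine example pixels are assumptions, and only the class counts (3, 2, 2, 1, 1) are taken from the Figure 1 example.

```python
import random
from collections import Counter


def majority_label(labels, rng=random):
    """Return the most frequent label; ties are broken at random."""
    counts = Counter(labels)
    top = max(counts.values())
    modal = sorted(label for label, n in counts.items() if n == top)
    return modal[0] if len(modal) == 1 else rng.choice(modal)


def mrb_upscale(grid, factor, rng=random):
    """Aggregate a 2-D list of labels using non-overlapping square windows.

    factor is the window size in base pixels, e.g. 3 for 10 m -> 30 m.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be a multiple of the factor")
    upscaled = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            window = [grid[i][j]
                      for i in range(r, r + factor)
                      for j in range(c, c + factor)]
            out_row.append(majority_label(window, rng))
        upscaled.append(out_row)
    return upscaled


# Nine 10 m pixels (hypothetical arrangement) aggregated into one 30 m pixel.
base = [["C1", "C1", "C2"],
        ["C1", "C3", "C2"],
        ["C3", "C4", "C5"]]
print(mrb_upscale(base, 3))  # [['C1']]
```

Passing a seeded `random.Random` instance as `rng` makes the random tie-breaking reproducible, which is useful when upscaled maps need to be regenerated exactly.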

Upscaling and its uncertainty evaluation
The MRB was applied to generate upscaled LU maps at 150 m, 300 m, 600 m, 900 m, 1500 m, 3000 m, 6000 m, and 9000 m. The proposed evaluation method was then used to assess the spatial uncertainties of all upscaled maps at pixel level. Figure 2 shows the upscaled maps and their associated pixel-level uncertainty evaluation maps. A higher entropy value indicates higher uncertainty of the pixel. The results show visually that upscaling led to considerable change in the representation of LU. The extent of uncertain pixels, with increasing entropy values, also expands with upscaling. Figure 3 shows that the entropy values are highly variable as a result of upscaling. Upscaled maps at coarser resolutions show a wider distribution of entropy values, which means that an LU pixel is more likely to take on a high entropy value. The mean entropy increased from 0.0674 to 0.4260, and the variance increased from 0.0513 to 0.3588, when the base map was upscaled from 150 m to 9000 m. The highest entropy value reached 2.6398 at the scale of 6000 m. Furthermore, the inner box plots show that the estimated quartiles increase with upscaling.
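As a hedged illustration of how such per-scale summaries can be derived, the sketch below computes an entropy map for each aggregation factor and reports its mean, variance, and maximum. The function names and the toy grid are hypothetical, the logarithm base 2 and the use of population variance are assumptions, and the numbers it produces are unrelated to the results reported above. For a 30 m base map, a target resolution of 150 m corresponds to a factor of 5.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def window_entropy(labels):
    """Shannon entropy (base 2, an assumption) of the labels in one window."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def entropy_map(grid, factor):
    """Entropy of every non-overlapping factor x factor window of the grid."""
    rows, cols = len(grid), len(grid[0])
    return [[window_entropy([grid[i][j]
                             for i in range(r, r + factor)
                             for j in range(c, c + factor)])
             for c in range(0, cols - factor + 1, factor)]
            for r in range(0, rows - factor + 1, factor)]


def summarize(grid, factors):
    """Mean, population variance and maximum of the entropy map per factor."""
    summary = {}
    for factor in factors:
        values = [v for row in entropy_map(grid, factor) for v in row]
        summary[factor] = {"mean": mean(values),
                           "variance": pvariance(values),
                           "max": max(values)}
    return summary


# Toy 4 x 4 base map with hypothetical labels.
toy = [["crop", "crop", "forest", "forest"],
       ["crop", "crop", "forest", "water"],
       ["grass", "grass", "crop", "water"],
       ["grass", "forest", "crop", "built"]]
for factor, stats in summarize(toy, [2, 4]).items():
    print(factor, {name: round(value, 4) for name, value in stats.items()})
```

On this toy grid the mean entropy is higher at factor 4 than at factor 2, because the single coarse window mixes all five class types.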

Biased and highly variable land use area information in upscaled maps
The upscaled maps show strongly biased LU area information (Figure 4(a)). The upscaling technique results in a considerable decrease in the water area, the construction land area, and the unused land area of about 55.46% (from 288,811.1 km² to 28,628.0 km²), 71.59% (from 267,194.0 km² to 75,897.0 km²), and 2.92% (from 2,221,705.0 km² to 2,158,650.0 km²), respectively. The technique results in an increase in the crop area, the forest area, and the grassland area of about 2.40% (from 1,785,237.0 km² to 1,828,078.0 km²), 15.36% (from 2,278,543.0 km² to 2,628,612.0 km²), and 0.94% (from 2,653,360.0 km² to 2,678,427.0 km²), respectively. Additionally, the LU area information provided by the upscaled maps is highly variable (Figure 4(b)-(g)). The standard deviations of the grassland, forest, unused land, crop, water, and construction land areas are 8,573.896, 132,219.100, 24,497.240, 16,259.800, 58,867.780, and 72,814.760 km², respectively.
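The percentage changes can be checked with a short calculation. The sketch below assumes that each change is expressed relative to the "from" area quoted above, and the helper name `relative_change_percent` is hypothetical; it is shown for the crop and forest areas.

```python
def relative_change_percent(area_before, area_after):
    """Change in area as a percentage of the starting area."""
    return (area_after - area_before) / area_before * 100.0


# Crop and forest areas in km², taken from the text above.
crop = relative_change_percent(1_785_237.0, 1_828_078.0)
forest = relative_change_percent(2_278_543.0, 2_628_612.0)
print(f"crop: {crop:.2f}%, forest: {forest:.2f}%")  # crop: 2.40%, forest: 15.36%
```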

Discussion
Upscaled maps contain considerable uncertainty in both area and spatial distribution. The LU area information from the upscaled maps is highly variable as a result of the upscaling technique (Figure 4). In this paper, the MRB, taken as an example, has shown that the area of the major LU type tends to increase with upscaling, whilst the area of the minor LU types tends to decrease. For example, Figure 2 shows that grassland gradually covers most of the subregion, whilst crop, forest, water area, and construction land are lost in the upscaled maps at 6000 m and 9000 m spatial resolution. This follows from the design principles of the MRB and from the original area information of the base map (Raj et al., 2013). If different upscaling techniques were adopted, the area information could hardly be predicted or managed by map users. Earlier research also pointed out that mapping errors in base maps contribute significantly to uncertain area information in upscaled maps (Sun et al., 2017). These facts further result in uncertain spatial distributions of LU types. Map users who require maps at different scales for their experiments therefore urgently need an approach to quantitatively assess the uncertainty of the area information and spatial distributions of land use. The approach proposed in this paper could satisfy this demand by quantitatively providing heterogeneity information at pixel level. The entropy value generated by the approach quantitatively indicates the uncertainty at specific locations. For example, Figure 2 shows that high uncertainty occurs at the boundaries between different LU types, which is consistent with previous research.
Additionally, because the statistical distribution of the entropy values of the upscaled maps can easily be estimated with widely available software (e.g., RStudio), map users can conveniently select optimal maps and quantitatively evaluate the influence of upscaled maps on their research.
Moreover, our evaluation method lends a new perspective to the quality assessment of upscaled maps by using the Shannon Entropy to assess the degree of heterogeneity/uncertainty. It can generate spatial quality assessment (QA) data that help map users select suitable LULC data and evaluate how uncertainty propagates to their models of interest. Earlier research hinges on proportion errors calculated from the area information of land covers and on changes in landscape patterns assessed with various landscape metrics. Although these evaluation methods can roughly assess the accuracy of upscaled maps, they can hardly support the spatial analysis of uncertainty that the community expects. The QA data generated by the method proposed in this paper could potentially fulfil this expectation.
The results also demonstrate a general rule that entropy values increase in response to upscaling, because an upscaled pixel usually contains more heterogeneous class information. This response is consistent with earlier assessments of the upscaling procedure in response to heterogeneity (H.S. He et al., 2002; Moody & Woodcock, 1994, 1995; Raj et al., 2013; Sun & Congalton, 2021; Sun et al., 2019). However, the heterogeneity of the LULC represented by the base map greatly impacts the uncertainty of the upscaled maps (Sun & Congalton, 2021). If the LULC has a relatively homogeneous pattern, the upscaled pixels contain less uncertain information or fewer classes (Sun & Congalton, 2021). The entropy value would thus not keep increasing with the upscaling procedure. Consequently, the results do not identify any strong dependence between the entropy value and the scale; rather, they further confirm that spatial pattern (i.e. heterogeneity) greatly impacts the uncertainty of an upscaled map.
Taken together, the results of this study provide a proof of principle that the uncertainty evaluation approach based on the Shannon Entropy is a viable way to assess the uncertainty of upscaled maps. New upscaling techniques that tackle these uncertainty issues should also be developed, for example an upscaling procedure that uses adaptive windows to achieve a target upscaled size. Given the substantial uncertainties that exist in upscaled maps, our analysis strongly suggests that map users adopt this approach in order to use LULC data reasonably and scientifically.

Conclusion
This paper proposed an uncertainty evaluation approach based on Shannon Entropy theory to quantitatively assess the uncertainty of upscaled land cover maps. By applying an entropy evaluation algorithm at pixel level, the approach accounts for the categorical heterogeneity information of each upscaled pixel. This quantitative information, taken as the pixel-level uncertainty evaluation of upscaled maps, can describe the quality of upscaled maps for map users. A conventional upscaling technique, Majority Rule-Based aggregation (MRB), was used to generate upscaled land use maps at 150 m, 300 m, 600 m, 900 m, 1500 m, 3000 m, 6000 m, and 9000 m for Mainland and Taiwan, China, in 2018. The approach was then applied to evaluate the uncertainty of these maps at pixel level. Two conclusions follow: (1) the substantial and subtle uncertainties that inevitably exist in upscaled maps deliver inaccurate information on land use area and on the spatial distributions of land use types, so map users urgently need a quantitative approach before deciding which datasets to select as input to their models; (2) the uncertainty evaluation approach based on the Shannon Entropy enables map users to efficiently assess the quality of upscaled maps by reporting the categorical heterogeneity information contained in the maps. This easy-to-use approach could also support the uncertainty assessment of map users' models that take the upscaled maps as input. Overall, the importance of evaluating the uncertainty of upscaled maps has been emphasized, and the approach has been validated in an efficient way. These findings strongly recommend that the earth observation community use the proposed approach to support the reasonable use of land use/cover maps at different scales.