SOUTH AFRICAN JOURNAL OF PLANT AND SOIL

Improved usefulness of continental soil databases for agricultural management through local adaptation

Introduction
Digital soil mapping (DSM) is currently carried out in many parts of the world and at different scales, including continental and global scales (e.g. Lagacherie and McBratney 2007; Grunwald 2009; Arrouays et al. 2014; Hengl et al. 2015; Minasny and McBratney 2016). In essence, DSM aims at determining soil variation in relation to the landscape by finding measurable proxy variables for the soil property of interest and developing quantitative (spatial or non-spatial) models for prediction of the target property. Jenny (1941) identified five major soil-forming factors and formulated a mechanistic model for soil development. These so-called 'clorpt' factors are climate (cl), organisms (o), relief (r), parent material (p) and time (t). McBratney et al. (2003) proposed a generic framework for DSM, the so-called 'scorpan' factors (soil, climate, organisms, relief, parent material, age and spatial position), based on Jenny's approach, but also taking spatial dependence into account. This general approach has been widely accepted in the area of DSM and pedometrics (Grunwald 2009). It is customary for DSM products to comprise various estimates of uncertainty, but it may be difficult for most potential users to judge how the resulting soil property maps can be implemented in practice and the scales at which they may be applied successfully. The often high spatial resolution gives the impression that the maps are appropriate to use directly at a rather local scale. For example, in global efforts such as the GlobalSoilMap initiative (Arrouays et al. 2014; Hengl et al. 2015), predictions are made for six soil layers down to 2 m depth at 100 m × 100 m or even finer spatial resolution. However, as noted by Arrouays et al. (2014), the resolution of the cell is not a measure of uncertainty, but rather a geometric framework for storing the soil information. This is a fact that may easily be overlooked or misunderstood by potential users.
Development of global and continental soil databases will change the manner in which soil data can be included in, for example, the decision-making processes in society at large (Miller 2012). One question that may arise when decisions on new soil sampling surveys are being taken is whether it is really necessary to collect many soil samples if detailed maps can be downloaded free from the Internet. Using the example of Rwanda, this study assessed the choice between employing continental data sets, local samples or a combination of the two by comparing analytical data for a large number of soil samples from Rwanda and data from the newly available AfsoilGrids250 database.
Soil organic carbon (SOC) content is one of the most important indicators of soil conditions (Koch et al. 2013; Musinguzi et al. 2013; O'Rourke et al. 2015). Soil pH is also a useful indicator property (Koch et al. 2013) and therefore the present analysis was limited to these two soil properties. Furthermore, SOC stocks are currently a much discussed topic in both science and politics across a multitude of scales. O'Rourke et al. (2015) reviewed SOC stock science and policy and found that most scientific work is aimed at understanding the biophysical processes governing SOC content at small scales, from particles to landscapes, whereas policy work is predominantly aimed at larger (even global) scales. The authors concluded that attempts to characterise the greatest risks to SOC stocks require data spanning a number of scales and that science and policy need to be integrated across multiple scales.
The overall aim of the present study was to assess the possibilities of utilising data from the continental AfsoilGrids250 database at two different levels of relevance in practical agricultural advisory work in Rwanda, namely point locations (representing smallholder farms) and administrative sector units. The specific aims of the study were to compare data on topsoil pH and SOC content obtained from AfsoilGrids250 with data obtained through actual soil analyses, and to test the possibility of adjusting the continental data set using a limited quantity of local data.

Geography of Rwanda
Rwanda is located in central Africa (2.0° S, 30.0° E), on the eastern side of the western fork of the East African Rift Valley, and comprises an area of about 26 000 km². It is one of the most densely populated countries in Africa (World Bank 2015). Almost the entire country has an altitude above 1 000 m above sea level (asl). A mountain range in the west, generally higher than 2 000 m asl, stretches from north to south (Figure 1a). The highest altitude (4 507 m) is attained in the Virunga volcano chain in the north. The eastern part of the country is less elevated, with savanna and numerous lakes. The climate is temperate to subtropical, with two rainy seasons and two dry seasons each year. The 'long rains' fall in March-May and the 'short rains' in September-December (Rwanda Meteorological Agency 2015). About 56% of the country (an area equivalent to 1.2 Mha) is classified as agricultural land according to the Global Land Cover Facility database (Shannan et al. 2014) (Figure 1b). Agriculture in Rwanda is based mainly on smallholder subsistence agriculture (Republic of Rwanda 2012). Crops grown include root crops such as potato, sweet potato and cassava; cereals such as maize and sorghum; pulses, especially beans; and bananas and other vegetables and fruit (National Institute of Statistics of Rwanda 2015). Coffee and tea are major export crops. In 2015, slightly less than 80% of the working population was occupied in agriculture, but agriculture was only responsible for one-third of gross domestic product. However, up to 70% of the country's export income is generated by the agricultural sector (Rwanda Development Board 2015), although a relatively large quantity of agricultural produce is still imported. Most of the agricultural area in Rwanda is dominated by Alisols, Acrisols, Cambisols and Ferralsols (Jones et al. 2013; IUSS Working Group WRB 2014).
There are smaller areas of Andosols (the Virunga area in the north), while an area south-west of Kigali province, shown in Figure 1a

Soil samples and soil data sets
In 2015, the International Fertilizer Development Center (IFDC, East and Southern Africa Division, Nairobi, Kenya), in collaboration with the East African soil laboratory Crop Nutrition Laboratory Services (CropNuts), Nairobi, Kenya, conducted a national soil sampling campaign of agricultural soils in Rwanda, collecting a total of 900 soil samples. These soil samples were distributed relatively uniformly over the country (Figure 1b). Each sample consisted of 25-30 subsamples collected across a 0.5-1.0 ha farm that was judged to be representative of the area in terms of crops, topography and soils. The sampling locations (the middle of the subsampled area) were positioned using a Trimble Juno GPS (Sunnyvale, CA, USA) (positional accuracy ≤5 m). For the purposes of this study the analysis was limited to two important topsoil (0-20 cm depth) properties: pH (H₂O) and SOC content. The pH was measured with the potentiometric method at a soil:water ratio of 1:2. The SOC content was determined by the colorimetric method (Walkley and Black 1934), after wet oxidation by acidified potassium dichromate in the presence of sulphuric acid.
In the analysis, 800 of the total 900 soil samples were used as an 'Exhaustive' data set (denoted Exh800). This represented an average sampling density of one sample per 1 500 ha of agricultural land. In order to compare how the number of samples affected the accuracy of adapted maps, the Exh800 data set was also divided randomly into three subsets consisting of 400, 200 and 100 samples (Exh400, Exh200 and Exh100). The remaining 100 soil samples of the original 900 samples were used as an 'Independent' data set (Ind100).
The continental database used was AfsoilGrids250, produced by ISRIC - World Soil Information, Wageningen, The Netherlands, in collaboration with a number of international organisations (The Earth Institute, Columbia University; World Agroforestry Centre [ICRAF], Nairobi; and International Center for Tropical Agriculture [CIAT]). It includes predictions on SOC content and pH (H₂O), but also on a great number of other soil properties: texture, bulk density, cation exchange capacity (CEC), total nitrogen, exchangeable acidity, aluminium (Al) content, and exchangeable bases (calcium [Ca], potassium [K], magnesium [Mg] and sodium [Na]). The data set covers the whole African continent at a 250 m × 250 m spatial resolution (representing a support area of 6.25 ha) and at up to six soil depths (0-5, 5-15, 15-30, 30-60, 60-100 and 100-200 cm) using three-dimensional regression kriging based on random forests. The basis for the predictions is a set of about 28 000 soil observations distributed throughout the African continent, combined with a set of covariates (one of which is elevation, shown for Rwanda in Figure 1a). For the purposes of the present study, data for Rwanda from AfsoilGrids250 on pH and SOC content of the two uppermost soil layers (a predicted value at 0-5 cm depth and a prediction at 5-15 cm depth) were combined as a weighted average for the top 15 cm of the soil (denoted the Afsis data set) in order to resemble the depth of the actual soil samples used for comparison (0-20 cm) in the Exh800 and Ind100 data sets described above. A schematic overview of the data sets and their use is presented in Figure 2.
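The combination of the two uppermost layers into a single 0-15 cm value can be illustrated with a short sketch. The text states only that a weighted average was used; weighting by layer thickness (5 cm and 10 cm) is an assumption made here, and the function name and input values are hypothetical.

```python
import numpy as np

def topsoil_weighted_mean(layer_0_5, layer_5_15):
    """Combine 0-5 cm and 5-15 cm predictions into one 0-15 cm value.

    Assumption: weights are proportional to layer thickness (5 cm and 10 cm),
    which is not stated explicitly in the text.
    """
    thickness = np.array([5.0, 10.0])        # layer thicknesses in cm
    weights = thickness / thickness.sum()    # 1/3 and 2/3
    upper = np.asarray(layer_0_5, dtype=float)
    lower = np.asarray(layer_5_15, dtype=float)
    return weights[0] * upper + weights[1] * lower

# Hypothetical SOC predictions (g C per kg) for two raster cells
soc_0_15 = topsoil_weighted_mean([30.0, 42.0], [24.0, 36.0])
```

The same function applies unchanged to whole raster arrays, since the arithmetic is element-wise.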

Mapping and statistics
In order to investigate whether a simple approach could be used to locally adapt the continental Afsis data set with a number of available soil analyses, we used the principles of regionalised variable theory described by, for example, Burrough and McDonnell (1989), which assumes that a variable Z at location u can be written as:

$$Z(u) = m(u) + R(u) + e' \tag{1}$$

where m(u) denotes a general trend, R(u) denotes the locally varying deviation from m, i.e. a spatially correlated residual variation from the trend, and e′ is a non-spatial error term. Regression kriging (RK) is one approach that tries to deal with the different parts of Equation 1 (e.g. Odeh et al. 1995). The first step involves regression between Z and one or more covariables. In this case we used the Afsis-predicted SOC content or soil pH, respectively, as the covariable and the corresponding real soil analyses as the target variable Z. Through this regression equation, m(u) was predicted for all 250 m × 250 m raster cells. At all soil sample locations, the residuals (i.e. the differences between the regressed values and the analysed values of SOC content and pH) were computed. These residuals were then interpolated by ordinary block kriging (OK). Kriging is essentially a weighted moving-average technique for estimation whereby weights are selected so that the estimation variance is minimised (Burrough and McDonnell 1989). This gives the most likely value of the attribute variable at a given point or area (block). Ordinary block kriging of the residuals in the present case produced estimates of R(u) with the same spatial coverage and spatial resolution as m(u). As the final step in RK, m(u) and R(u) were added together. Maps of SOC and pH were also produced using OK of soil analyses in the Exhaustive data sets (Exh800, Exh400, Exh200 and Exh100).
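The regression kriging steps (regress on the covariable, krige the residuals, add the two parts) can be sketched as follows. This is a minimal illustration and not the implementation used in the study: it uses ordinary point kriging rather than block kriging, an exponential variogram with hand-picked parameters instead of a fitted model, and synthetic data. All function names and parameter values are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget, sill, range_):
    """Exponential semivariogram; gamma(0) = 0 by definition."""
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(xy_obs, values, xy_new, nugget, sill, range_):
    """Ordinary point kriging of `values` at xy_obs onto locations xy_new."""
    n = len(values)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    # Kriging system; the extra row and column hold the Lagrange multiplier
    # that forces the weights to sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d_obs, nugget, sill, range_)
    A[n, n] = 0.0
    d_new = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=2)
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = exp_variogram(d_new, nugget, sill, range_)
    weights = np.linalg.solve(A, b)[:n, :]
    return weights.T @ values

def regression_kriging(xy_obs, z_obs, cov_obs, xy_new, cov_new,
                       nugget, sill, range_):
    """Trend m(u) from linear regression plus kriged residuals R(u)."""
    slope, intercept = np.polyfit(cov_obs, z_obs, 1)
    trend_new = intercept + slope * cov_new
    residuals = z_obs - (intercept + slope * cov_obs)
    return trend_new + ordinary_kriging(xy_obs, residuals, xy_new,
                                        nugget, sill, range_)

# Synthetic example: 30 sample points with a covariable (e.g. a continental
# map value) and an observed soil property.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, size=(30, 2))
cov = rng.uniform(20.0, 40.0, size=30)
z = 5.0 + 0.6 * cov + rng.normal(0.0, 2.0, size=30)

# Predicting back at the sample locations reproduces the observations,
# because kriging with gamma(0) = 0 is an exact interpolator.
pred = regression_kriging(xy, z, cov, xy, cov,
                          nugget=0.5, sill=4.0, range_=30.0)
```

In practice the variogram parameters would be fitted to the empirical variogram of the residuals, and predictions would be made for raster cells rather than for the sample locations themselves.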
All maps were projected into the WGS 84/UTM 36S projection. Administrative sector boundaries in Rwanda were derived from the National Institute of Statistics of Rwanda. Averages for the administrative sector units were judged as being a suitable working level from an advisory service perspective and also a potentially realistic unit size for use of the Afsis data. Administrative sectors (Imirenge in the Kinyarwanda language) are the third level of administrative subdivision in Rwanda (Figure 1b). They differ in size, but on average cover about 50 km². There are 392 sectors with areas classified as agricultural land in the land-cover database. When the different data sets are compared at the sector level, their support is harmonised.
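Averaging raster predictions within administrative sectors amounts to a zonal mean. The sketch below assumes a hypothetical raster layout in which each cell carries an integer sector index and non-agricultural cells are marked as NaN; neither detail is taken from the study.

```python
import numpy as np

def sector_means(sector_id, values, n_sectors):
    """Mean of `values` per sector, ignoring NaN (masked) cells.

    sector_id: integer array of sector indices (0 .. n_sectors - 1) per cell.
    values:    array of the same shape with predicted soil property values.
    """
    sector_id = np.asarray(sector_id).ravel()
    values = np.asarray(values, dtype=float).ravel()
    valid = ~np.isnan(values)
    sums = np.bincount(sector_id[valid], weights=values[valid],
                       minlength=n_sectors)
    counts = np.bincount(sector_id[valid], minlength=n_sectors)
    # Sectors without any valid cell get NaN instead of a division warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Hypothetical 2 x 3 raster of pH values covering three sectors
ids = [[0, 0, 1], [1, 2, 2]]
ph = [[5.0, 5.4, 6.0], [6.2, np.nan, 5.8]]
means = sector_means(ids, ph, n_sectors=3)
```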
Comparisons between observations and predicted values were done to validate the different mapping methods. The coefficient of determination (r²) and the mean absolute error (MAE), which is a measure of the magnitude of error on average, were used:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{z}_i - Z_i \right| \tag{2}$$

where ẑ is the predicted value, Z is the observation, and n is the number of observations. All validations were performed against observations not included in the prediction models, as shown in Figure 2. The Ind100 data set was used for validation at point locations of SOC content and soil pH, for maps made by RK and OK from Afsis, Exh800, Exh400, Exh200 and Exh100. The Exh800 data set was used for estimations of 'ground truth' sector averages of SOC and pH, against which predictions with Afsis were validated, individually and by RK with Ind100.

Figure 2: Schematic overview of the Exhaustive (Exh800), Independent (Ind100) and Afsis data sets and their use in this study. A black arrow towards a data set indicates that it was used for validation, a grey arrow indicates that it was used for prediction. The spatial distribution of the Exh800 and Ind100 data sets is shown in Figure 1b. OK = ordinary kriging, RK = regression kriging
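The two validation measures used in this study, MAE and r², can be computed with a few lines of code. In this sketch r² is taken as the squared Pearson correlation between predictions and observations, which is an assumption about how the coefficient of determination was calculated; the example values are hypothetical.

```python
import numpy as np

def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.mean(np.abs(predicted - observed))

def r_squared(predicted, observed):
    """Squared Pearson correlation (assumed definition of r2 here)."""
    r = np.corrcoef(predicted, observed)[0, 1]
    return r ** 2

# Hypothetical SOC values (g C per kg) at four validation points
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [24.0, 23.0, 33.0, 36.0]
mae = mean_absolute_error(predicted, observed)   # (4 + 2 + 3 + 1) / 4
r2 = r_squared(predicted, observed)
```

Note that a high r² does not imply a low MAE: a map with a constant bias can correlate perfectly with the observations and still have a large absolute error, which is why both measures are reported.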
In agricultural advisory work, data are often further simplified into classes that are generally accepted and easily understood, and which form the basis for recommendations to farmers. In this case, we used the classification currently employed for SOC content and pH in this region by the CropNuts laboratory (Nairobi, Kenya). This classification was applied to the sector estimates, and Cohen's linear unweighted Kappa index (Cohen 1960) was used to assess the agreement between sector average predictions from the Exhaustive data set and predictions made directly from the Afsis data set and the combined RK predictions. The Kappa index provides a coefficient of agreement (−1 to 1) corrected for chance, where 0 indicates agreement no better than expected by chance and 1 is a perfect match.
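Cohen's unweighted Kappa compares the observed proportion of matching classes with the proportion expected by chance from the class frequencies. The sketch below uses hypothetical class labels, since the CropNuts class limits are not given in the text.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's Kappa for two equally long label sequences.

    Undefined (division by zero) if chance agreement equals 1, i.e. if both
    sequences consist of one and the same class.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: product of the marginal class proportions, summed
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical pH classes for six sectors: reference vs. map prediction
reference = ["low", "low", "medium", "medium", "high", "high"]
predicted = ["low", "medium", "medium", "medium", "high", "low"]
kappa = cohen_kappa(reference, predicted)
```

Here four of six sectors agree (observed agreement 0.67) while chance agreement is 0.33, so Kappa is 0.5.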

Summarising statistics on the point and grid data sets
Summary statistics on SOC content and pH in the Afsis, Exh800 and Ind100 data sets are presented in Table 1. For Afsis, only data for Rwanda are shown, as well as for the 800 pixels of the Afsis data set corresponding to the locations of the exhaustive Exh800 ground truth data set (denoted Afsis800). The average SOC content in agricultural soils of Rwanda according to Afsis was found to be 6 g kg⁻¹ higher than that based on the Exh800 soil data set (31 compared with 25 g C kg⁻¹). Afsis consists of a relatively large number of comparatively high SOC values, with almost 28% of the SOC values in Afsis being >40 g C kg⁻¹, whereas only 3% of the soil samples in Exh800 had such a high SOC content. The overall basic statistics on SOC content according to the entire Afsis and Afsis800 data sets were very similar, as were the statistics for the Exh800 and Ind100 data sets. To some extent, the values for soil pH were the opposite (Table 1). The mean and median pH values according to Afsis were lower than those based on soil sampling (5.3 in both cases for Afsis800, compared with 5.6 and 5.5, respectively, for the Exh800 data set). Only 1% of the 800 sample locations included in Afsis800 had pH higher than 6.0, compared with 30% in the Exh800 data set.
These differences were also apparent in the maps (Figure 3). The variation in SOC in Afsis was considerable (Figure 3a), with visually strong agreement between this map and the elevation map shown in Figure 1a. The map of SOC content produced by OK of 800 soil observations in Figure 3b contains much less variation. The maps appear to be most similar for the provinces of Kigali, East and South. In the West and North provinces, the SOC content is much higher in the Afsis map than in the map of interpolated soil observations (Figure 3). For pH, there is a zone with low pH values along the mountainous region in the West, South and North provinces (cf. Figures 1a, 3c and d). In the map in Figure 3d, which shows the interpolated pH values of the Exh800 data set, this zone is much narrower and displays distinct gradients towards areas with much higher soil pH in the east and north.

Ordinary kriging and regression kriging validated at point locations
The r² and MAE values for maps made through OK (using the Exhaustive data set and its subsets: Exh800, Exh400, Exh200 and Exh100 and validated by Ind100; Figure 2) are shown in Figure 4. Values for maps produced through RK are also shown, but in that case the maps from the Afsis data set were combined with the soil observations in the Exhaustive data set and its subsets. Comparisons between the soil analysis data for the independent validation samples (Ind100) and these maps revealed that Afsis maps of both SOC content and pH were poorly correlated to the data in the validation data set, bearing in mind the slight difference in support between the Afsis data set and the individual samples (Figure 4; for SOC content: r² = 0.05; MAE = 13 g C kg⁻¹; for pH: r² = 0.11; MAE = 0.7) (MAE is shown above the bars in Figure 4). For SOC, interpolation (OK in Figure 4) using as few as 100 samples (Exh100) yielded an MAE that was half as great (6.5 g C kg⁻¹). By combining those 100 samples with the Afsis data, we achieved an even lower MAE (6.1 g C kg⁻¹) and a substantially higher r² (r² = 0.16 for RK of 100 samples with Afsis; r² = 0.05 for OK of 100 samples alone).

Table 1: Mean, median, different percentiles and standard deviation (SD) of soil organic carbon (SOC) content (expressed in g C kg⁻¹ soil) and pH (H₂O) in agricultural topsoil in Rwanda according to the AfsoilGrids250 continental soil database and the Exh800 and Ind100 soil sample sets. The spatial distribution of the latter is shown in Figure 1b. Maps of SOC content and pH according to Afsis are shown in Figure 3a and 3c, respectively
a Afsis refers to data in the raster cells in the continental database AfsoilGrids250
b Afsis800 is the AfsoilGrids250 raster cells that correspond to the locations of the exhaustive data set
c Exhaustive data set of 800 samples (Exh800) from IFDC
d Ind100 is an independent data set of 100 soil samples from IFDC
Using more samples in OK or in RK with Afsis data further reduced MAE and elevated r², but for 400 to 800 samples the differences were small. The same pattern was obvious for pH (Figure 4b), but in that case RK did not improve the outcome compared with interpolating the analyses directly. Nevertheless, the Afsis map of pH was improved with RK when only 100 samples were used (lower MAE and higher r²; Figure 4b). With 200 local pH analyses available, almost as good predictions for the Ind100 validation points could be made with OK as if 800 samples had been used, indicating a stable spatial pattern of soil pH.

Validation of sector average values obtained by regression kriging
Directly estimating SOC content for different administrative sectors using Afsis did not work very well (r² = 0.05, MAE = 11.3 g C kg⁻¹; Figure 6). The corresponding values for pH were r² = 0.33 and MAE = 0.4. Using RK based on the 100 analyses in the Independent data set and the Afsis map reduced the errors and increased the correlation with the sector estimates made from the Exhaustive data set (Figure 6). In that case, the SOC content had r² = 0.33 and MAE = 4.5 g C kg⁻¹, whereas pH had r² = 0.64 and MAE = 0.2. Maps of sector averages of SOC content and pH based on Afsis and the modification of Afsis achieved through RK are shown in Figure 5c-f.

Validation of agronomic class values for sectors
In advisory work, the values of individual soil properties are often aggregated into classes to allow formulation of clear recommendations. Table 2 shows cross-tables for soil pH classes between sector average pH estimated from the Exhaustive data set (Exh800) and the Afsis data set (Table 2a), as well as the Afsis data set modified by regression kriging (RK100, i.e. regression kriging of Afsis with the Ind100 data set) (Table 2b). Cohen's Kappa was 0.16 for the Afsis data set, but increased to 0.42 for the RK100 data set. The proportion of sectors assigned to the same class as when the Exhaustive data set was used increased from 41% with Afsis to 57% with RK100. A similar test of agreement was made for SOC content. However, it was not possible to estimate the Kappa value for Afsis because the agreement was not better than by chance. The Kappa value for SOC content using RK100 was 0.10.

Discussion
Digital soil mapping has revolutionised the manner in which detailed maps of soil properties can be produced. By combining soil reference data with detailed data sets of auxiliary information in predictive modelling, maps of soil properties covering vast areas can be generated. At present there are two detailed soil property maps covering the African continent (Vågen et al. 2016). This kind of new full-coverage information is an invaluable resource, describing the spatial variation in structural, mechanical and chemical properties of the soil across continents. It is a goldmine for biophysical modelling and land-use planning and forms the basis for a multitude of other applications (e.g. Miller 2012). In this study we examined the suitability of these continental maps for use at local or regional scale, and tested how maps describing general patterns can be adapted to a finer scale by use of local samples. Our results confirm the claim by Arrouays et al. (2014) that the cell size of a raster map does not reflect its uncertainty. Even when used for estimates of average statistics for agricultural land in the entire country of Rwanda, there was a clear discrepancy between SOC content estimated from the AfsoilGrids250 data and SOC estimated from the observations in the Exhaustive data set (Table 1). The SOC content was clearly over-predicted by Afsis in comparison with the observed data. Afsis had a rather large number of high SOC content values that were not present in the Exhaustive data set. On the other hand, country-wide statistics for soil pH were fairly similar for the Afsis and the Exhaustive data sets. There was a small difference between the soil depths considered (0-20 cm in the soil samples; 0-15 cm in the predictions from Afsis). Although this may be part of the reason for the observed discrepancies, it is probably not the main cause.
Successful modelling in DSM is largely dependent upon the covariables included (they must be adequately related to the soil properties under study), but clearly there must also be enough reference samples for data mining methods to produce correctly functioning prediction models.
At the detailed scale, our case study showed that SOC content and soil pH according to the continental AfsoilGrids250 database were not well correlated with soil observations (Figure 4). Moreover, the spatial pattern in maps produced from a large number of soil samples was not particularly similar to that in the Afsis maps (Figure 3). Soil pH was somewhat better correlated with soil data than was SOC content (Figures 3 and 4).
The intention for AfsoilGrids250 is that as additional soil observations become available, the models can be recalibrated and the maps can be improved. Local users may be tempted to use the maps immediately, but care must be taken when judging the given validation measures, as they are derived based on data from the entire continent. However, by taking soil samples to first validate and then adapt the map (here through RK), we found that it was possible to improve the correlation to real soil observations (Figure 4). In the estimation of administrative sector averages, which is obviously a more reasonable application of a continental soil property database than comparisons with point locations (although in this case every point sample represented 0.5-1.0 ha), the improvement obtained by applying RK was even more pronounced. The effect of RK is apparent in Figures 5 and 6. Through the use of only 100 soil analyses for the whole of Rwanda, the maps in Figure 5c and d were transformed into the maps in Figure 5e and f, which are noticeably more similar to the maps produced from the Exhaustive data set (Figure 5a and b). The effect is also evident from the validation statistics displayed in Figure 6. While it might not be the map producer's intention for a global or continental soil database to be used for acquisition of soil properties at a single point location, with the development of, for example, smartphone applications that can present information for the user's location, this use of the data is inevitable. As concluded by Minasny and McBratney (2016), the field of DSM is no longer an area exclusive to researchers and has now been taken into operational use. This means the involvement of other stakeholders with different needs for metadata and ancillary information.
From an inexperienced user's perspective, or indeed from the perspective of any user who does not study the technical reports of the soil databases available, it may be reasonable to believe that information derived by renowned universities and research organisations can be trusted and applied locally. This is why future digital soil mappers will have to provide the information needed for any user to easily assess the map product. Grunwald (2009) reported that more than one-third of 90 DSM studies included in that review were not validated at all. Although not common, there are some good examples of how to communicate uncertainty. For example, Hengl (2003) overlaid the map with a varyingly transparent white layer, where the degree of transparency was inversely proportional to the uncertainty of the map, and Odgers et al. (2015) presented maps of the uncertainties. SoilGrids1km, the predecessor of AfsoilGrids250m, was provided together with percentile maps of the predictions, as a means to communicate the uncertainty. However, these reported uncertainties also depend on the manner in which the validations are performed. Defourny et al. (2012) reported that in many global land-cover applications, the quality and accuracy of the land-cover maps used are not considered. Instead, it is up to the potential user to assess whether the map is appropriate for the application. For prospective users of digital soil maps, we present some basic guidelines for map assessment in Box 1. We argue that in the current situation with multiple sources of global/continental soil information, it is important to provide users with adequate information so that map products can be assessed for each specific application.

Conclusions
Continental data sets produced through DSM should not be applied for regional or local estimates without any reference samples with which to compare. High spatial resolution in a continental data set can be misleading; it is normally only the framework upon which the predictions are made, rather than the resolution of potential applications. In order to promote accurate use (or rather prevent inadvertent misuse) of published soil data, the DSM community must help users assess whether the map data are appropriate for their intended use. If a large-extent map is found to be too coarse for a specific application (e.g. regional fertiliser recommendations), it may be possible to improve it by, for example, regression kriging, if a number of local soil observations are available. In this study, the MAE for sector averages of SOC in Rwanda was reduced from 11.3 g C kg⁻¹ if only the continental data set was used, to 4.5 g C kg⁻¹ when only 100 national soil observations were combined with the continental data set by regression kriging (corresponding figures for pH: 0.4, reduced to 0.2).

Table 2: Cross-tables showing the agreement between pH classes of sectors created from 800 soil samples in the Exhaustive data set (Exh800) and maps produced by (a) Afsis and (b) regression kriging of Afsis data with the 100 samples in the Ind100 data set (RK100)

Figure 6: Coefficient of determination (r²) and mean absolute error (the numbers shown above the bars) of SOC content (g C kg⁻¹) and pH estimated for different administrative sectors of Rwanda with Afsis data and RK100, i.e. Afsis recalculated with regression kriging (RK) using 100 soil analyses in the Ind100 data set. Comparisons were made with sector averages estimated by ordinary kriging of the Exhaustive data set (800 soil analyses). Maps are shown in Figure 5a-f
We recommend further studies on approaches for local improvement of global and continental data sets and call for innovative ideas on how map uncertainties can be made accessible and understandable to general users.