The use of multisource spatial data for determining the proliferation of stingless bees in Kenya

ABSTRACT Stingless/meliponine bees are eusocial insects whose polylactic nature enables interaction with a wide variety of wild plants and crops that enhance pollination and, hence, support ecosystem services. However, their true potential regarding pollination services and honey production is yet to be fully recognized. Worldwide, there are over 800 species of meliponine bees, with over 20 species documented on the African continent. Out of these, only 12 species have been well documented in Kenya. Moreover, interest on meliponine bees has increased amid climate change, agricultural intensification, and other anthropogenic effects. Generally, stingless bees are under-researched, with no previous documented evidence of their ecological niche (EN) distribution in most African countries. Hence, this study sought to establish the influence of bioclimatic, topographic, and vegetation phenology on their spatial distribution and change patterns. Stingless response variables from 490 sample points were collected and used in conjunction with 11 non-conflating features to build stingless ecological niche models. Six machine learning-based EN models were used to predict the distribution of seven stingless bees’ species combined. The results from the EN models showed that annual precipitation was the most influential variable to stingless bee distribution (contributing 43.09% logit), while potential evapotranspiration and temperature seasonality contributed 21.18% of the information needed to predict the spatial distribution of stingless bees. Vegetation phenology (21.36%) and topography (14.36%) had moderate effect on stingless bees’ distribution. On the other hand, high seasonality in precipitation and temperature indicated high stingless niche variability in the future (i.e. 2055). The performance of six EN algorithms used to predict distribution of stingless bees was found to be “excellent” for random forest (true skills statistics (TSS) = 0.91) and ranger (TSS = 0.90) and “good” for generalized additive models (TSS = 0.87), multivariate adaptive regression spline (TSS = 0.80), and boosted regression trees (TSS = 0.80), while they were “fair” for recursive portioning and regression trees (TSS = 0.79). These EN models could be utilized to inform stingless bee farming and insects pollinated crops by highlighting regions that provide highly suitable conditions for stingless bees. Additionally, the findings could be harnessed to increase both bee and agricultural productivity and forest conservation efforts through supplementary pollination services.


Introduction
Stingless bees, also known as meliponine bees, are eusocial insects that comprised hundreds of highly diverse and abundant species (Michener 2013;Iraheta et al. 2015). They are widely spread, with over 800 documented species worldwide that differ in their body morphology, colony size, and behavior. There are six genera of stingless bees found in Africa: Hypotrigona, Cleptotrigona, Liotrigona, Plebeina, Dactylurina, and Meliponula (Eardley and Kwapong 2013;Michener 2013), with over 20 species documented and endemic to the African continent and 12 species documented in Kenya (Michener 2000;Eardley 2004). The polylactic nature of meliponine bees is of particular importance with regard to pollination services they provide to numerous crops and wild plants (Slaa et al. 2006;Njoya 2010). They are known to supplement honeybees (Apis mellifera), which have over the years been the major pollinators of wild plants and crops (Winfree et al. 2007;Giannini et al. 2015;Garantonakis et al. 2016). Generally, honeybee colonies are on the decline due to colony collapse disorder occasioned by the effects of climate change, agricultural intensification, use of pesticides, and bee diseases and pests Pirk et al. 2015;Makori et al. 2017). This necessitates for inclusion of other pollinators such as stingless bees to complement honeybee pollination services. Indeed, it has been demonstrated that stingless bees improve the quality of fruits and seed set in crops (Slaa et al. 2006;Kiatoko et al. 2014Kiatoko et al. , 2022; hence, the interest in meliponiculture has substantially grown around the African continent. Moreover, livelihoods of many rural African communities have improved due to the production of high-quality meliponine bee honey and other beekeeping products (Cortopassi-Laurino et al. 2006;Kiatoko et al. 2014) used as food, medicine, and a source of income. Stingless bee honey has antibiotic and anti-carcinogenic properties (Libério et al. 2009) due to the enzymes associated with these properties (Bijlsma et al. 2006;Mokaya et al. 2021). The medicinal value of meliponine bee honey translates into higher price in the market compared to regular honey (Kumar et al. 2012). Also stingless bee cerumen, propolis, and pollen contain antioxidants, immunomodulatory effect, antibacterial, and antiinflammatory properties that are important for medical purposes (Libério et al. 2009;Araujo et al. 2012).
Generally, the spatial and temporal distribution of stingless bees in Kenya and the African continent at large has not been well explored. The environmental and anthropogenic factors and conditions necessary for the proliferation of stingless bees remain largely undocumented (Fierro et al. 2012;Iraheta et al. 2015). These environmental conditions include ambient temperature, precipitation, humidity, altitude, soil wetness, and solar irradiance and other factors such as shading (Ackerly et al. 2010;Iraheta et al. 2015). Furthermore, the seasonal pattern of vegetation growth influences stingless bees population dynamics (Miller et al. 2009;Williams, Regetz, and Kremen 2012), as their forage circles are within short distances (250-500 m) of their nesting sites (Grundel et al. 2010;Torné-Noguera et al. 2014). Since food resources significantly influence their spatial distribution, stingless bees are more sensitive to changes in local environmental conditions compared to honey and bumble bees (Torné-Noguera et al. 2014). Additionally, stingless bee species respond differently to food resources whose distribution is based on local environmental conditions (Roulston and Goodell 2011;Fierro et al. 2012). Specifically, floral community composition and distribution play a major role in spatial dispersion of stingless bees than other factors such as nesting substrates (Torné-Noguera et al. 2014).
To understand the role of biotic and abiotic factor variations on stingless bees distribution, artificial intelligence (AI), machine learning (ML), and ecological niche (EN) models are commonly used to provide a statistical pathway that influence their proliferation in space and time (Kearney and Porter 2009;Fernández, Hamilton, and Liu 2015). On the other hand, accurate data on location and temporal distribution of stingless bee species can provide an important preview for describing their habitats using reliable EN and species-specific predictions modeling approaches. Therefore, response and predictor variables used in these models should be carefully selected to increase prediction accuracy (Phillips and Dudík 2008;Peterson 2011). In this context, overfitting of EN models should be avoided to achieve reliable prediction of species' spread (Cord et al. 2014). Also, remotely sensed predictor variables such as vegetation phenology should be incorporated into prediction models to minimize probable biasness brought about by generalist nature of bioclimatic variables. This is because bioclimatic predictor variables are commonly interpolated from discrete observations while vegetation seasonality variables are continuous observations whose resolutions are higher both in the temporal and spatial standpoints (Pau, Gillespie, and Wolkovich 2012;Cord et al. 2014).
Although many studies (e.g. Njoya 2010;Fierro et al. 2012;Kumar et al. 2012;Nkoba et al. 2012;Kiatoko et al. 2014;Torné-Noguera et al. 2014;Iraheta et al. 2015;Ndungu et al. 2019) have documented the importance of biology, nesting patterns, and distribution of stingless bees, exploitation of vegetation phenology and its interactions with other biotic and abiotic factors to determine the distribution of stingless bees in Kenya is largely unavailable. Furthermore, vegetation phenology provide detailed vegetation information that is derived in narrow and accurate patterns, which could be selected using variable selection procedures in EN models to reduce parameterization and overfitting in prediction, improving model accuracy (Cord et al. 2014;Naimi and Araújo 2016). In addition, since future bioclimatic conditions are already modeled and readily available, EN models can offer tools to predict possible change scenarios, hence providing informed decisions to avert unwarranted future climatic effects on stingless bee proliferation and biodiversity loss. Therefore, this study sought to establish the influence of bioclimatic, topographic, and vegetation seasonality variables on the spatial proliferation of stingless bees in Kenya. This study also modeled the future distribution scenarios and patterns of stingless bees using simulated bioclimatic data.

Study site
The study area spans four agroecological zones with a representative climatic gradient comprising 40 administrative counties in Kenya. The area covers the coastal, eastern, Mount Kenya, Nyanza, Rift Valley, and western regions of the country, totaling 256,638.49 square kilometers (Figure 1). The stingless bee occurrence data were collected from 19 counties, i.e. Kilifi, Mombasa, Taita Taveta, and Tana River (in the coastal region), Kitui, Machakos, and Makueni (in the eastern region), Baringo, Bomet, Nandi, Narok, and Elgeyo Marakwet (in the Rift Valley region), Kisii and Nyamira (in the Nyanza region), Meru (in Mount Kenya region), and Kakamega, Bungoma, Vihiga, and Busia (in the western region). The western region, including Nandi Hills (in Rift Valley region), Kisii (in Nyanza region), and Mount (Mt.) Kenya regions are characterized by high humidity, with relatively low temperature and high altitude (Githui et al. 2009). They comprise major natural forest and water towers such as the Mau Complex, Cherangani Hills, Kakamega Forest, and Kisii Highlands (Kinyanjui 2011;Odawa and Seo 2019). The coastal region, a low altitude area, experiences high humidity and temperature, with one major indigenous forest (Arabuko Sokoke), which is part of the coastal Kenya and Tanzanian forests arc (Schürmann et al. 2020). Natural vegetation in the coastal region is under threat from agricultural intensification and expansion of human settlement Figure 1. The study area indicating extent of the modeled region (within the blue outline) in relation to the national Kenyan boundary (in the inset map) in an agroclimatic zone (ACZ) backdrop, ranging from hot and dry (zones 70, 60, and 50) to cold and wet (zones 40, 30, and 20) regions. Stingless bee occurrence data (black points) span across four agroecological zones encompassing different climatic gradients in Kenya. (Habel et al. 2017). The eastern region, Tana River (in the coastal region), and Baringo (in the Rift Valley region) receive relatively lower rainfall than other regions, with relatively high temperature during the day. Vegetation is predominantly shrubs with isolated stands of acacia trees (Speranza et al. 2009). Elevation of the study site ranges from 2 m above sea level at the coastal region to 2,045 m above sea level in the Nandi Hills (in the Rift Valley region).
Mt. Kenya, Kisii highlands (in Nyanza region), and Western regions are predominantly agricultural areas with relatively high maize productivity, a staple food crop in Kenya. In addition, stingless bees are used to improve crop productivity both in the field and controlled greenhouse environments (Kiatoko et al. 2022). Small scale subsistence farming is practiced in the coastal and eastern regions due to the relatively low precipitation (Githui et al. 2009;Schürmann et al. 2020).

Data collection and pre-processing
The datasets used in this study were divided into response (stingless bee occurrence) and predictor (bioclimatic, topographic, and vegetation phenology) variables (Table 1). The occurrence data were obtained in a point vector format while the predictor Table 1. Bioclimatic, topographic, and vegetation phenology variables used in the prediction of the spatial distribution models of stingless bees. The variables in bold were the selected uncorrelated ones used in the final stingless bees' distribution models. variables were satellite raster images. All the predictor variables were clipped to the boundaries of the study site. Since the predictor variables were in different spatial resolutions, hence the bioclimatic and topographic data were resampled to a pixel resolution of 250 m using the Quantum GIS software (QGIS Development Team 2021). The warp option under the raster processing tool in QGIS was used to edit the resolution of the bioclimatic and topographic variables to fit the resolution of the vegetation phenology variables with intermediate resolution in the predictor variables. This is based on the recommendation by Mudereri et al. (2020a) that predictor variables of different spatial resolution should be harmonized and resampled to the same resolution, preferably to match the resolution of one of the predictors.

Response variables
As aforementioned, stingless bees' occurrence data were collected from 19 counties in Kenya during the dry (December to February and June to August) and wet (March to May and September to November) seasons, between January 2016 and December 2020 ( Figure 1). In each county, meliponaries and colonies observed with the help of the local community within the study area were randomly sampled, with their locations and all species of the stingless bees enumerated. Unidentified stingless bee species were also recorded. Stingless bee presence only data were used for this study since it was logistically impractical to collect "actual absence" data from the field. Furthermore, lack of sighting during this study was not guarantee of absence (Jiménez-Valverde et al. 2011;Yackulic et al. 2013). It was assumed that the stingless bees had the same probability of occurrence across the landscape; hence, every pixel had the same probability of being tagged as "background" pseudoabsence across the environmental and geographic space of the study area (Naimi and Araújo 2016). Moreover, the species diversity modeling (sdm) package (Naimi and Araújo 2016) in R software (R. Core Team 2020) used in this study generates and uses the "pseudo-absence" observations data to compensate for lack of "actual absence" data (Guan et al. 2020). Data were collected from a total of 490 sample points distributed along the coastal, eastern, Kisii, Mt. Kenya, Rift Valley, and the Western regions of the country (Figure 1). Specifically, the counties where stingless bee data were collected span across four main agroecological zones, which had a representative climatic gradient. Data on six stingless bee species were randomly collected whereby a meliponary with more than one domesticated colony placed in one location or a wild nest with a colony was used as an observation/sample point. These were Hypotrigona gribodoi (n = 64), Meliponula togoensis (n = 24), Meliponula bocandei (n = 23), Meliponula ferruginea (n = 55), Plebiana armata (n = 44), and unidentified species (n = 280 sample points). Locations of the stingless bees were recorded using a handheld global positioning system (GPS) with an accuracy of 3 m (±3).

Predictor variables 2.2.2.1 Vegetation phenology.
The 16-day enhanced vegetation index (EVI) observations acquired by the Moderate Resolution Imaging Spectroradiometer (MODIS) at 250-m spatial resolution was used to derive vegetation phenological variables (Table 1). EVI over 20-year period (from 2001 to 2020) was processed using TIMESAT software (Eklundha and Jönsson 2017) to derive 13 vegetation phenological variables (Table 1) for the two known rainy seasons. To achieve best fitting vegetation phenological parameters, TIMESAT settings were configured and set following Makori et al. (2017) recommendation.
A total of 13 vegetation phenological variables (Table 1) were produced for each of the two rainy seasons in the study area. However, since the second season in Kenya is not stable and vegetation phenological indicators are not consistent throughout the years, only phenological parameters from the first season were used in the present study. Start of season (start_t) was calculated from the left edge minimum using a 20% EVI deviation, while time for the end of season (end_t) was calculated using a 20% EVI deviation threshold from the minimum EVI, detected toward the latter part of the season. The length of season (length) measures the time from the start to the end of the growing season, and the base level value (base) describes the average value of the right and left minimum values of the vegetation in the season. On the other hand, middle of the season (mid) is the mean of the time of the season after removing 20% of the minimum from both the right and left vegetation productivity indicated by EVI, and the maximum value (max) is the largest vegetation (EVI) value within the season. Amplitude (amplitude) gives the difference between the base value and the largest vegetation (EVI) value in the season, left derivative (left_d) shows how fast the season is increasing from the left, right derivative (right_d) measures the rate of decrease from the right, and large integral (large_i) describes the season from the start to the end. The small integral (small_i) variable is used to describe the difference between the season end and base level, whereas the value of the function at the start of season (start_v) and the value of the function at end of season (end_v) indicate the vegetation (EVI) values at the start and end of the season, respectively.

Topographical variables.
To investigate the influence and effect of landscape morphology on the distribution of stingless bees, topographical variables were included in the EN models. Aspect, hillshade, roughness, slope, topographic position index (TPI), and the terrain ruggedness index (TRI) ( Table 1) were derived from a 3 arcsec resolution (~90 m pixel resolution) digital elevation model (DEM) data acquired by the Shuttle Radar Topography Mission (SRTM) instrument (CGIAR-CSI 2020), and used together with other variables to predict the distribution of stingless bees within the study sites.

Bioclimatic variables.
To determine the contribution of climatic conditions on the distribution of stingless bees, the freely available bioclimatic variables with 1 km spatial resolution (Table 1) from AfriClim (Platts, Omeny, and Marchant 2015;Fick and Hijmans 2017) were utilized in this study, i.e. the AfriClim data were downscaled from the WoldClim platform (www.worldclim.org). These bioclimatic variables contain grids of temperature, rainfall, and derived climatic summaries. The variables describe a long-term mean of the current conditions (1970 to 2000) and simulated future conditions in 2055. This study utilized simulated climate data under the intermediate CO 2 emissions, representative concentration pathway scenario (RCPs) 4.5 watt/m 2 (Pachauri and Mayer 2015) as set by the International Panel on Climate Change (IPCC) using total radioactive forcing values. The simulated future bioclimatic data were calculated as a mean over the years 2041 to 2070 (Guan et al. 2020) and obtained from the fourth version of the community climate system (CCSM4), which has the most reliable climate projections (Mohammadi et al. 2019;Mudereri et al. 2020b). For each of the timesteps, 21 bioclimatic variables (Table 1) were utilized. Specifically, 10 temperature and 11 precipitation variables were generated for each timestep.

Variable selection
Multicollinearity among predictor variables is the single major problem that leads to instability and volatility in the model's parameterization and performance (Dormann et al. 2013;Naimi and Araújo 2016). Before testing for multicollinearity, the recursive feature elimination criteria in the "caret" package in R was employed to assess the possible minimum number of predictor variables that could produce comparable results. This analysis enabled the study to gain insight into the number of uncorrelated predictor variables that could give the most accurate modeling results. Thereafter, a two-stage elimination criteria was performed based on the Pearson correlation coefficient and variance inflation factor (VIF) to select the predictor variables used in the EN models. The aim of the variable selection experiment was to reduce multicollinearity among predictor variables and establish relevant orthogonal features that are most suited for predicting stingless bee distribution using EN models (Plant 2012;Dormann et al. 2013). To detect collinearity among predictor variables, multiple regression models were utilized. Each predictor variable was regressed against all other variables, and the VIF was computed for each combination (Plant 2012). The VIF is commonly used to first select important variables among the different categories, and then iteratively those with high linear regression coefficients are eliminated.
This study first set the threshold for the Pearson correlation coefficient at th = 0.7 (r ≥ 0.7), which has been demonstrated to yield the best result without any significant loss of information (Plant 2012;Dormann et al. 2013;Araújo et al. 2019). Second, the "vifstep" function available in the "usdm" package in R (Naimi and Araújo 2016) was used to further assess the collinearity of the selected variables based on a recommendation by Dormann et al. (2013) that VIF values of 10 or higher between variables indicate collinearity. The correlation matrix presented in Figure 2 indicated that some of the essential and relevant variables to the stingless bees' distribution such as isothermality and mean annual rainfall were highly correlated. Therefore, selection of such variables was based on their importance and ecological significance in species distribution models using expert-knowledge approach (Dormann et al. 2013;Naimi et al. 2014). The variable selection experiment enabled the study to select 11 relevant and uncorrelated variables (Table 1) out of a total of 40 (21 bioclimatic, 6 topographic, and 13 vegetation phenology). Using a good number of relevant response variables would lead to acceptable and reasonable prediction accuracies in EN models (Stockwell and Peterson 2002;Wisz et al. 2008). It has been established that 90% prediction accuracy of machine learning algorithms can be obtained using as low as 10 response variables, but higher number of variables, from 50 sample points, would ultimately yield better or near maximal accuracy (Wisz et al. 2008;Araújo et al. 2019). Given that the selected response variables consisted of more than 10 sample points for each stingless bee species, it was expected that the EN models developed in this study would yield "acceptably" high prediction accuracies.

Model fitting
Machine learning EN models in the "sdm" package (Naimi and Araújo 2016) in the R software (R. Core Team 2020) were used to relate the stingless bee presence-only observations to the selected uncorrelated predictor variables. Since this study utilized  Table 1 for more details on the variables) used in the ecological niche (EN) models for the spatial distribution of stingless bees in Kenya. Blue color indicates positive correlation while red color indicates negative correlation. Darker shades of red and blue indicate high collinearity among predictor variables, while lighter shades indicate low collinearity. Additionally, the size of the circles indicates the extent of correlation. Moreover, the orientation of the concave denotes the direction of the correlation between variables, with those facing left signifying negative correlation.
presence-only observations, 1,000 pseudo-absence points were generated in R environment using the "sdmdata" function in the "sdm" package.
Using a good number of response variables to obtain acceptable prediction accuracies is paramount in EN modeling (Stockwell and Peterson 2002;Wisz et al. 2008). As aforementioned, it has been established that 90% prediction accuracies for machine learning algorithms can be obtained using as low as 10 response variable, but higher number (from 50 sample points) would ultimately yield better or near maximal accuracies (Wisz et al. 2008;Araújo et al. 2019). Given that the response variable consisted of over 10 sample points for each species, it was predicted that the EN models developed in this study would yield "acceptable" accuracies.
The "sdm" package has a selection of over 22 EN model executions, which are object oriented, reproducible, and have an extensible framework in R (Naimi and Araújo 2016). These machine learning models employ somewhat the same principles of presencepseudoabsence response variables. In the present study, trial runs were performed for all 22 EN modeling algorithms, from which the best six were selected based on their prediction accuracy ( Table 2). The six selected algorithms have been demonstrated to yield high performance accuracies in predicting species distribution in complex environments (Naimi and Araújo 2016;Mudereri et al. 2020a). An ensemble projection in the "sdm" package inherent in "R" software was employed to estimate the mean predictions of stingless bee distribution over the six selected EN modeling methods based on the ranking of their predicted accuracy.
It was hypothesized that the model accuracy for six algorithms used in this study could differ in predicting some stingless bee instances; hence, an ensemble modeling approach was employed to harmonize the variations among the six EN algorithms. Furthermore, previous studies have demonstrated that ensemble modeling approach usually has a superior predictive power than individual modeling methods (Naimi and Araújo 2016;Araújo et al. 2019;Hao et al. 2019;Makaya et al. 2019). An ensemble approach fits and maximizes model accuracy resulting in improved reliability, since it utilizes highly ranked models with higher precision.
To run the ensemble model, the study utilized the "ensemble" function in the "sdm" package under the "R" environment (Araújo and New 2007;Naimi and Araújo 2016). The results of the six stingless bee models were harmonized under the ensemble model using the true skill statistics (TSS) weighted average approach, which improves the model predictability (Naimi and Araújo 2016). However, including individual models of low accuracy in an ensemble experiment would yield a low predictive power. Therefore, the study included only EN models of a TSS of 0.7 in the ensemble modeling experiment (Dormann et al. 2013). A TSS is an accuracy measure that estimates the agreement between the response (e.g. stingless bees presence observations) and predictor (e.g. bioclimatic, topographic and vegetation phenology) variables. It ranges between −1 and 1, with 1 indicating high predictive accuracy (Hao et al. 2019).
Predictions for the distribution of stingless bees in this study area were done under both current and future climatic projections in 2055. For consistency, similar model settings and packages were used for both prediction scenarios to establish the effect of climate change on the distribution of stingless bees in Kenya. For consistency and comparability, the future stingless bee prediction scenario utilized simulated climatic variables, selected to match predictors used in the current climatic scenario. On the other hand, the current vegetation phenological and elevation variables were used in the future model, as projected vegetation phenological scenarios are absent, and elevation is unlikely to considerably change in future. In this study, it was assumed that elevation and vegetation phenology will remain unchanged over the projected future period in 2055. The Table 2. Selected ecological niche modeling methods used in parallel executions in the "sdm" package in the "R" environment to predict the spatial distribution of stingless bees in Kenya.

Stingless bees' distribution model validation
This study employed random data splitting method inbuilt in the "sdm" package to assess the accuracy of the stingless bee distribution models developed in the current study. In both the current and future cases, a 10-fold cross validation approach was used (Naimi and Araújo 2016;Araújo et al. 2019), and automation was used to independently and randomly draw a 70% sample for calibration of the SDMs while the remaining 30% were used to validate their performance. The study utilized both threshold-independent statistics (i.e. area under curve (AUC) of the receiver operating curves (ROC)) and threshold-dependent statistics (i.e. true skills statistics (TSS), sensitivity, and specificity) to assess the performance of the stingless bee EN models (Phillips and Dudík 2008). The ROC is a graphical representation of the model's fitting of sensitivity (presence data points) and specificity (pseudo-absence data points) metrics, while the area under such graphical representation (AUC) indicates the overall accuracy of model performance. AUC values close to zero (0) indicate poor ordering or impossible occurrence while those close to one (1) represent perfect prediction by the model or optimal occurrence (Du et al. 2014). The study modified the Swets discriminatory power (Swets 1988;Mohammadi et al. 2019), which ranks the models' performance in a way that (i) excellent ≥ 0.90, (ii) good = 0.80-0.89, (iii) fair = 0.70-0.79, (iv) poor = 0.60-0.69, and (v) fail ≤ 0.59. As previously mentioned, ensemble modeling utilizes strengths from the individual models while minimizing their weaknesses (Araújo and New 2007;Araújo et al. 2019); hence, the study employed TSS to test the accuracy of the current and future ensemble models. Equations 1 to 3 describe the TSS variables and their parameters (Richard et al. 2018). Optimal or perfect occurrence or perfect ordering of observed stingless bee presence versus its predicted occurrence were presented by TSS values close to +1, while values lower than zero (TSS ≤ 0) represented poor or impossible stingless bee occurrence (Guan et al. 2020). The current and future stingless bee distribution maps were used to calculate the predicted area covered by stingless bees and to quantify change in their probability of occurrence. To quantify change in area, the probability of occurrence was converted to presence/absence using the evaluate function in the "sdm" package in R. The presence/absence observation data were then used to extract probability of occurrence in the "raster" package by thresholding maximum TSS values. This was eventually converted to absolute data frame change estimates whose areas were calculated and quantified in the "sdm" package in R. The change in probability of stingless bees' occurrence was grouped into the following classes: (i) high probability of reduction ( where sensitivity represents the presence observation and specificity is absence.
where a is true positive presence and b is false negative presence.
where c is false positive presence and d is true negative presence.

Variable interaction and EN model development
The collinearity model ( Figure 2) indicated that 29 out of 40 variables used in the current prediction models were conflating. These variables were eliminated from the EN models based on their importance and individual interactions as indicated in Figure 3. Hence, 11 non-conflating variables (highlighted in bold in Table 1) were deemed fit for developing the final EN models. Furthermore, the recursive feature elimination (RFE) criteria indicated that a few variables (14) could give comparable results before eliminating the collinear ones (Figure 4). In addition, selected bioclimatic variables had the highest relative influence (64.27%) on  Table 1 for more details on the variables) used in the current EN models for predicting the distribution of stingless bees. Mean annual rainfall (bio12), isothermality (bio3), slope, and temperature seasonality (bio4) poised as the most relevant before the highly correlated variables were eliminated (see Table 1 for full variables list). the EN models developed to predict the stingless bee occurrence, while vegetation phenological and topographical predictor variables contributed to 21.36% and 14.36% logit, respectively ( Figure 5). Additionally, the results showed that among the three selected bioclimatic variables, mean annual rainfall (bio12) was the most important variable in the EN models. It contributed almost half (43.09%) of the information required for distribution of stingless bees ( Figure 5). Temperature seasonality (bio4) was the second important bioclimatic variable, followed by potential evapotranspiration (pet), which together added 21.18% of the information needed to predict stingless bee distribution ( Figure 5). On the other hand, vegetation phenology provided the greatest number (5) of variables viz, integral from season start to end (large_i), value at the end of the season (end_v), difference between maximum and base level (amplitude), rate of increase at the beginning of the season (left_d), and time for the start of the season (start_t). However, they ranked among the least important variables for predicting the distribution of the stingless bees, except large_i, which contributed about 8.98% in the EN models ( Figure 5). For topographical variables, slope and roughness were the most important with slope having relatively higher comparative influence (7.18%) to the EN models than the other topology variables. Even though vegetation phenology and topography contributed the least logit when combined (35.73%), they both helped to sharpen the EN models further by providing more prediction information at grain level, hence improving the model's accuracy.

Stingless bees' prediction models and validation
The AUC threshold-independent statistics indicated that all the EN models developed in this study were "excellent" in their performance according  Table 1 for more details on the variables) used in the ecological niche models. Mean annual rainfall (bio12), temperature seasonality (bio4), and potential evapotranspiration (pet) were poised as among the most important variables.
to the modified Swets discriminatory power (Table 3). On the other hand, the TSS thresholddependent statistics denoted that the EN models achieved varied results ranging from "fair" to "excellent" ranking. In both cases, random forest and ranger algorithms achieved "excellent" results for both the AUC (0.98) and TSS (0.91). Additionally, the differences between attained AUC and TSS model accuracies ranged between 7% (0.07) for random forest and 15% (0.15) for MARS and BRT (Table 3). This was an indication that the latter two EN algorithms had higher disparities in threshold-dependent and -independent evaluation criteria than the rest. The ROC curves ( Figure 6) for all model algorithms demonstrated a good fit between training and test datasets for smoothed means, with little variations between individual 10-fold replicated runs. Random forest and RANGER displayed better fitting for 10-fold replicates than other algorithms, indicating model stability and relatively consistent predictions within replicates. RPART had the highest dispersions of AUC for individual replicates, suggesting relative model instability before mean averaging. Moreover, some individual replicate runs were demonstrated to be closer to the one-to-one Figure 6. Receiver operating curves (ROC) of the six EN modeling methods used to predict distribution of stingless bees. The red and blue curves indicate smoothed mean area under curve (AUC) of the training data and test data, respectively, with 10-fold cross validation sampling. Light blue/cyan curves indicate 10-fold replicate runs for individual models with training data. The black dotted line shows the one-to-one mid-point interaction between sensitivity and specificity for each model.  (Table 3). Ultimately, there was no algorithm skewness beyond the one-to-one midpoint of ordering for specificity and sensitivity. The ordering of presence and absence data (Figure 7) also indicated robustness of the response variables used to develop prediction models. Moreover, presence-absence data of response variables presented limited dispersion from the mean, indicating good separability between classes and high integrity within classes.

Stingless bees' ecological niche prediction
The EN models predicted high stingless bee suitability in the western, Kisii (  while large parts of Tana River (in the coastal region) and Narok (in the Rift Valley region) counties had low predicted probability (Figure 8). These regions have varied temperature and precipitation gradients as well as seasonality. Additionally, their characteristic climatic and agroecological gradients are diverse (Bryan et al. 2010;Camberlin et al. 2014;Ochieng, Kirimi, and Mathenge et al. 2016). The coastal region has high temperature and humidity, while the eastern parts of the country have high temperature and relatively lower precipitation. Kakamega (in the western region), Kisii (in the Nyanza region), Nandi hills (in the Rift Valley region), and the Mt. Kenya regions experience cooler temperatures with elevated levels of precipitation. Indeed, mean annual rainfall and temperature seasonality (bio12 and bio4) contributed significantly to the EN distribution models. Furthermore, topography of parts predicted with high suitability was similar, with high elevation values that are consistent with low temperature and high precipitation.
Mombasa and Kilifi (in the coastal region) and Kitui (in the eastern region) counties are lower altitude regions, while Kakamega and Busia (in the western region), Kisii (in the Nyanza region), Nandi (in the Rift Valley region), and Taita hills (in the coastal region) are largely hilly. The future map (b) displays increased suitability for stingless bees in some areas, while low and moderate suitability was predicted for areas such as Nandi hills (in the Rift Valley region), Mt. Kenya, and the eastern regions of Kenya. Conversely, most regions predicted with high stingless bee habitat suitability will be reduced in 2055. There was an exception in the Kisii highlands (in the Nyanza region), which indicated increased suitability from moderate to high suitability of proliferation. The western and eastern (specifically Kasanga in Mwingi) regions had reduced from high to moderate suitability of proliferation.
The change in map based on probability of stingless bee occurrence (Figure 9) revealed that many areas especially Kisii (in Nyanza region), Mt. Elgon and Nandi hills (in the Rift Valley region), Mt. Kenya, and some parts of Kitui (in the eastern region) had increased suitability while Kakamega and Busia (in the western region), Bomet (in Rift Valley region), Mwingi (in the eastern region), and Taita Taveta, Mombasa, and Kilifi counties (in the coastal region) had moderate-to-high decrease in probability of stingless bee occurrence. Moreover, the categorized change classes indicated that probability of stingless bees' proliferation increased in regions totaling 35,958.68 km 2 , while a total of 99,363.70 km 2 was predicted to have diminishing probability of occurrence (Table 4) with unaltered predicted future climate change pathways. Most interclass variations were centered between moderate and low diminishing probability to low and moderate probability of proliferation. Overall, there was more probability of diminishing than proliferation by a total of 63,405.02 km 2 in the study area.

Discussion
Stingless bees are important in fragmented landscapes due to their polylactic nature. Specifically, their wide range of flower preference make them valuable pollinators, especially in areas with low plant density and diversity. Amid climate change, the influence of climate on spatial proliferation of stingless bees is an important element in the determination of their future spatial distribution.

Variable interaction and EN model development
Despite stingless bees' diverse and generalist foraging ability, climate and vegetation phenology influence their proliferation (Slaa et al. 2006;Njoya 2010). The EN models established that precipitation, particularly mean annual rainfall, was the most influential variable determining the distribution and proliferation of stingless bees. This variable provided almost half (43%) of the information required by the models to predict their spatial distribution. The models predicted more stingless bee suitability in regions with high annual precipitation across the country. Furthermore, it was established that variability in precipitation affected stingless bees' distribution across temporal and spatial dimensions. This is consistent with literature (Fierro et al. 2012;Lovett 2015;Mwalusepo et al. 2015;Platts, Omeny, and Marchant 2015;Makori et al. 2017) that note a positive correlation between precipitation and the proliferation of stingless bees.
The recursive feature elimination (RFE) model (Granitto et al. 2006;Yan and Zhang 2015;Darst, Malecki, and Engelman 2018) indicated that more than half of the predictor variables initially poised for the stingless bee distribution models were redundant. The recursive feature elimination (RFE) experiment established that a few selected features would otherwise yield same or higher prediction accuracy as opposed to using all the 40 predictor variables. The RFE, collinearity, and variable importance procedures were useful in eliminating redundant features and enhanced the stingless bee EN model, reducing instability, volatility, and parameterization (Dormann et al. 2013;Naimi and Araújo 2016). On the other hand, vegetation phenological variables contributed lower permutation on the prediction of stingless bees' distribution compared to bioclimatic variables but were palpably greatly valued, as demonstrated by the number of features tagged important ( Figure 5). It was speculated that the availability of pollen and nectar determines niche colonization of stingless bees and facilitate their proliferation in vegetated areas that could be less suitable in terms of climate. This is more important for providing foraged bee bread and nectar products that are indicators of colony strength and trigger of brooding (Torto et al. 2010;Pau, Gillespie, and Wolkovich 2012). Indeed, vegetation's large integral (large_i) and end-of-season values (end_v), which indicate the amount of pollen and nectar available for stingless bees within and end of the season, respectively, were tagged as the most important in this category. Higher large integral values (large_i) indicate abundance of forage within the season and longer foraging periods, which encourage stingless bee colony growth and proliferation. On the other hand, a large value at the end of Table 4. Probability of change for stingless bees' distribution maps between the current and future predicted areas. The classes indicate areas in square kilometers (km 2 ) predicted to have diminishing and proliferation probability of change.

S/No
Class Area (km 2 ) 1 High probability of reduction 931.25 2 Moderate probability of reduction 14,449.04 3 Low probability of reduction 83,983.40 4 No change 121,316.12 5 Low probability of proliferation 33,501.14 6 Moderate probability of proliferation 2,234.14 7 High probability of proliferation 223.40 Total 256,638.49 the season (end_v) demonstrates availability of enough foraging materials for bees within the season. The coastal, Mt. Kenya, Kisii (in the Nyanza region), Nandi hills (in the Rift Valley region), and western regions had high vegetation's large integral values within the season because vegetation seasons start earlier and the vegetation phenology values at the end of the season were more than other regions. Consequently, these regions were predicted as highly suitable to stingless bees due to these conducive vegetation conditions. Slope and terrain roughness topographic variables contributed substantially (13.64% of the total logit) to the performance of the stingless bee EN models compared to the total contribution (14.36%) of the selected topographical variables ( Figure 5 and Appendix S1). Specifically, suitable ENs spanned through regions with high altitudinal gradients across the study site (Appendix S2). Previous studies (Mani 2013;Jackson et al. 2018;Birrell et al. 2020;Kimathi et al. 2020) indicated that most insects have diverse altitudinal regimes and could thrive in regions with varying topography. In Kenya, the coastal, Baringo (in the Rift Valley region), and eastern regions are characterized by low altitude, while Taita hills (in the coastal region), Nandi hills (in the Rift Valley region), Kisii highlands (in the Nyanza region), Mt. Kenya, Mt. Elgon (in the Rift Valley region), and Kakamega (in the western region) are more elevated. Therefore, the impact of varying altitudes on the probability of stingless bee occurrence could not be conclusively deciphered. Nevertheless, such an impact could influence other parameters such as food resources and climate regimes, which could impair or encourage proliferation of the bees. Therefore, topographical variables in seclusion may not be reliable in developing EN models but should be coupled with other features such as vegetation phenology and climate.

Stingless bees' prediction models and validation
After eliminating the least accurate stingless bee prediction models, scores ranged from "fair" to "excellent" based on the modified Swets discriminatory power. It is reported that predictive model precision and accuracy in space depend on careful and accurate selection of predictor features vis-a-vis response variables (Aranda and Lobo, 2011;Naimi and Araújo 2016;Araújo et al. 2019). Also, they are largely dependent on the selection of important non-conflating variables while circumventing redundancy.
Although all predictor models were subjected to same response variables and model conditions, model disparities were observed especially in threshold-dependent and -independent accuracy measurements for MARS and BRT. Models with low prediction accuracy exhibited less accurate ordering of sensitivity and specificity, closer to the one-to-one mid-point interaction line on the ROC models. Individual model replicates used to average model means indicated increased dispersion from both test and training averaged mean curves. In addition, dispersions increased as prediction accuracy reduced, pulling closer to the midpoint, hence lowering the averaged mean curves. Observed model instability indicate decreased model dependability for niche prediction of species, especially where high precision and accuracy are desirable (Farley 2017;Felton et al. 2021). However, individual model replicate dispersion observed in this study did not pass the one-to-one midpoint ordering of specificity and sensitivity. Even though accuracy did not reach the excellent level for at least four of the selected models, prediction skewness was not significant since all models attained acceptable ordering of sensitivity and specificity (Kajita, Aihara, and Kobayashi 2017;Gao and Tian 2021). Notwithstanding, this study suggests that robust and rigorous approaches should be employed to ensure accurate collection and collation of response variables, thorough selection of predictor features, and good choice of artificial and deep learning prediction algorithms for improved stingless bee distribution predictability power.

Stingless bees' ecological niche prediction
Studies have shown that increased variability in both temperature and precipitation decrease the population of beneficial insects such as stingless, honey, and bumble bees (Roulston and Goodell 2011;Torné-Noguera et al. 2014). Indeed, the EN models indicated that there was a general decline of suitability of stingless bee niches across the study area by 63,405.02 km 2 , more than proliferation. Additionally, most regions reduced in stingless bee niche suitability from high and moderate suitability to lower classes such as Kakamega (in the western region), Nandi hills and Bomet (in the Rift Valley region), Mwingi (in the eastern region), Taita hills, and other coastal regions. Besides, these regions were demonstrated to have reduced annual rainfall with increased potential evapotranspiration and temperature seasonality (Platts, Omeny, and Marchant 2015;Fick and Hijmans 2017). The interaction of these climate variables could have shifted the primal climatic conditions for stingless bees in these regions and affected their phenological patterns. However, the stingless bee distribution EN models indicated that there was a slight increase in spatial suitability of stingless bees, by 4,197.53 km 2 , in many regions across the study site. This could be attributed to the stingless bees' polylactic nature, since they forage on a wide range of plant species including weeds, which are predicted to increase with climate change, hence keeping their colonies strong. On the other hand, improved colony strength would enable them to fight off diseases and pests, whose increase has been demonstrated to cause bee population decrease and colony collapse across the world (Williams et al. 2010;Dainat, Vanengelsdorp, and Neumann 2012).
The generalist behavior of stingless bees suggests that over time, they have adapted to shifting vegetation density and diversity (Slaa et al. 2006;Njoya 2010). This adaptation could inform dispersion and distribution of their suitable niches across different agroecological and agroclimatic zones, with subtle differences within given zones. For instance, the coastal region of Kenya displayed varied suitability of stingless bee occurrence, even though temperature and precipitation regimes are almost similar. Also, the vegetation's phenological composition varies from Kwale through Mombasa to Kilifi County in the region. The latter county has more vegetation diversity and density, making it more suitable for stingless bees. On the other hand, regions neighboring Arabuko Sokoke Forest in Kilifi county were predicted to be more suitable with high accuracy. Besides, their colonization rates were demonstrated to be higher than the neighboring Kwale County.
Moreover, inclusion of vegetation phenological information in species diversity models improve their prediction power and sharpens their predictability. Vegetation phenology variables contain information extracted from MODIS EVI data on each of the growing seasons, hence reducing the residual signal noise and data dimensionality in unprocessed EVI time series data (Hinton and Salakhutdinov 2006;Shen et al. 2013). This could have introduced bias into the stingless bee distribution prediction models. Furthermore, vegetation phenology variables reduce overfitting of species distribution prediction models, since bioclimatic variables are coarser at the grain level (1 km spatial resolution) and only offer interpolated climatic information, which depict homogeneity at large spatial extents. On the other hand, vegetation phenology data is finer at grain level, offering higher spatial resolution (250 m), hence have more power to incorporate environmental heterogeneity on the ground (Saatchi et al. 2008). In this regard, remotely sensed vegetation phenology data captures adverse environmental conditions, which limit the spatial distribution of stingless bees within homogenous climatic pockets. Also, remotely sensed vegetation phenology data captures subtle differences in vegetation variability on the ground, which determine availability of pollen and nectar, the primary sources of food and energy for the stingless bees. Moreover, remotely sensed vegetation phenology data depicts heterogenous ground conditions at near real-time basis, capturing short-term changes in small spatial extents, contrary to bioclimatic data that is interpolated over large homogenous regions and over extended temporal periods. In this regard, incorporating remotely sensed vegetation phenology data into the stingless bee spatial distribution models made them more realistic and dependable.
On the other hand, the prediction models revealed that topography had little influence on the distribution of stingless bees (Jackson et al. 2018;Birrell et al. 2020). The predicted maps showed high habitat suitability across altitudinal gradients with no regard of ground elevation. Nonetheless, direct impact of topography on the stingless bees could not be straightforwardly established, and its influence on microclimate and vegetation density and diversity could only be inferred and deduced to have similar effects on stingless bees' proliferation as phenology and climate.

Conclusions
This study developed the first ever predictive EN models for the spatial distribution of stingless bees in Kenya using an ensemble modeling approach of six machine learning EN algorithms. The polylactic nature of stingless bees is particularly important due to nonselective pollination of a wide range of both food and non-food plants, in mixed cropping systems and virgin land. Careful selection and preparation of response and predictor variables is, however, paramount in developing accurate, dependable, and robust predictive models. Relative variable importance, recursive feature elimination (RFE), collinearity determination procedures, and selection of non-conflating features are equally important. Based on TSS accuracy, six EN modeling algorithms proved most accurate in predicting the occurrence and spatial distribution of stingless bees. Specifically, random forest (TSS = 0.91), ranger (TSS = 0.90), generalized additive models (TSS = 0.87), multivariate adaptive regression spline (TSS = 0.80), boosted regression trees (TSS = 0.80), and recursive portioning and regression trees (TSS = 0.79) were the most accurate models that were included in the ensemble for predicting the EN of stingless bees in the study area. Ensembling these six EN algorithms enhanced their strengths, while diminishing their weakness, hence providing a more reliable and accurate stingless bee distribution mapping method. The bioclimatic data, especially mean annual rainfall, provided the most useful information needed for predicting the spatial distribution of stingless bees. However, vegetation phenological variables substantiated most features toward the stingless EN models. Although topography was significant to the models, its contribution could not be directly established, hence only inferred through other features such as vegetation phenology and microclimate. Projection of the EN models into the future (2055) based on hypothesized and modeled climatic pathways by the IPCC indicated that there was more spatial diminishing of stingless bee suitable habitats than proliferation. This was occasioned by high temperature and precipitation seasonality in 2055, which affected their ecological niches the most. In effect, the stingless EN models could be utilized to provide important information to beekeepers and farmers practicing insects pollinated mono and mixed cropping systems for increased productivity. On the other hand, the results for this study could provide insights for conservationists grappling with effects of climate change, increased anthropogenic disturbance, and habitat fragmentation toward natural vegetation stands. port. We would also like to appreciate the European Union for providing resources utilized in the field sampling exercise (Project number; DCI-FOOD-2011/023-520), Bayer AG Crop Sciences (Project number: NC20662450), and NORAD (Project number: RAF-3058 KEN-18/0005). We gratefully acknowledge the financial support for this research by the following organizations and agencies: UK's Foreign, Commonwealth & Development Office (FCDO); the Swedish International Development Cooperation Agency (Sida); the Swiss Agency for Development and Cooperation (SDC); the Federal Democratic Republic of Ethiopia; and the Government of the Republic of Kenya. The views expressed herein do not necessarily reflect the official opinion of the people and agencies that supported the authors.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data Availability Statement
The authors confirm that the data supporting the findings of this study are available within the article and any additional data are available from the corresponding author, upon reasonable request on davidmakori@gmail.com.