Tundra shrub expansion in a warming climate and the influence of data type on models of habitat suitability

ABSTRACT Warming across the low Arctic is increasing tundra vegetation productivity and facilitating the expansion of upright shrubs. We modeled the effects of warming on habitat suitability in green alder, dwarf birch, Labrador tea, bog bilberry, and lingonberry and assessed the influence of data type (true absence or pseudo-absence) on species distribution models (SDMs). We generated SDMs using the two absence data types under current (1970–2000) and future (2061–2080) climate projections. Our results show that warming leads to range expansion of all shrubs, though responses vary in magnitude and extent, with mean increases in suitability ranging from 0.080 (Labrador tea) to 0.369 (lingonberry) with true absences. Differences in driving variables and suitability projections suggest that physiological and ecological variability between species mediate responses to warming. Between data types, we observed inconsistencies in model performance, suitability projections, and variable importance. Bog bilberry and lingonberry exhibited larger differences in suitability (0.201 and 0.288, respectively), whereas alder showed similar responses (difference of 0.01). These results are important to consider when assessing changes in habitat suitability or identifying environmental or climatic determinants of species’ distributions. We suggest further development of open data repositories, facilitating access to true absence data to support conservation and land use planning.


Introduction
Warming air temperatures and altered precipitation at high latitudes are driving shifts in tundra vegetation, increases in the frequency of the disturbance associated with permafrost thaw, and changes in the extent of surface water (Kokelj et al. 2015;Vincent et al. 2015; Intergovernmental Panel on Climate Change 2019; Travers-Smith, . Field-based and remote sensing research shows that increases in the productivity of tundra vegetation have been caused primarily by the proliferation of tundra shrubs (Lantz, Marsh, and Kokelj 2013;Seider et al. 2022). Increased growth and reproduction in response to natural and experimental warming has also been observed in alder and birch (Tape, Sturm, and Racine 2006;Walker et al. 2006;Ropars and Boudreau 2012;Moffat et al. 2016;Bjorkman et al. 2020;Travers-Smith and Lantz 2020;Mekonnen, Riley, Berner et al. 2021), but additional research is required to explore the climate sensitivity of other common tundra shrubs.
Documenting variation in the responses of different shrub species to climate change is important because vegetation structure has a significant impact on carbon cycling, permafrost dynamics (McGuire et al. 2006), and the climate system (Bonan et al. 2003;Port, Brovkin, and Claussen 2012). As such, understanding the influence of shrubs and vegetation change on these processes is important for existing global climate models. Accurate projections of changes in shrub abundance are also important because shrubs are expected to significantly impact wildlife habitat (Joly et al. 2007;Ehlers et al. 2021) and subsistence hunting in northern communities (Wildlife Management Advisory Council, North Slope & Aklavik Hunters and Trappers Committee 2018).
Species distribution modeling is a common technique used to quantify the influence of climate and terrain factors on the ranges of terrestrial vegetation and to assess the impacts of climate warming on their future distributions (Guisan and Zimmermann 2000). Many species distribution models (SDMs) utilize presence-only data, which are widely available in open-access repositories such as the Global Biodiversity Information Facility (gbif.org). With the exception of profile techniques (e.g., BIOCLIM, DOMAIN, ecological niche factor analysis), models built using presence-only data rely on randomly selected pseudo-absence (or background) data to represent absence locations. These are used to provide information on the total variability in environmental predictors across a selected area (Phillips et al. 2009) and to represent locations where species are not present for logistic regression or other binary response methods that require absence data (Wisz and Guisan 2009).
In their review of 250 SDM studies, Santini et al. (2021) found over 84 percent of SDMs used pseudo-absences or background data. However, a lack of best practices to determine pseudo-absence locations adds to the confusion and reduced interpretability of SDM studies using this approach (Barbet-Massin et al. 2012; Santini et al. 2021). Previous studies have shown that the use of pseudo-absence data can negatively impact model performance and interpretation through sampling bias and poor data quality and spatial accuracy (Pearce and Boyce 2006). When pseudo-absence locations do not complement spatial coverage and sampling effort of presence observations, model predictions lose the ability to accurately portray distributions across the entire study domain (Phillips et al. 2009). Conversely, standard performance metrics indicate that many studies using pseudo-absence data have also performed favorably (Zhang et al. 2019; Kaky et al. 2020), which suggests the need for further investigation into overall performance of such data.
This study focusses on the Beaufort Delta region in the western Canadian Arctic, the area experiencing the most rapid temperature increases in Canada (Vincent et al. 2015). This warming is increasing the productivity of tundra landscapes (Tape, Sturm, and Racine 2006;Lantz, Marsh, and Kokelj 2013;Fraser et al. 2014;Campbell et al. 2021;Seider et al. 2022) and facilitating widespread shrub proliferation (Lantz, Marsh, and Kokelj 2013;Moffat et al. 2016;Travers-Smith and Lantz 2020). In our analysis, we explore differences in the sensitivity of five common tundra shrub species to climate change by comparing projected changes in habitat suitability with climate warming. Specifically, we developed SDMs for green alder (Alnus viridis), dwarf birch (Betula glandulosa and B. nana), Labrador tea (Ledum decumbens), bog bilberry (Vaccinium uliginosum), and lingonberry (Vaccinium vitis-idaea) and applied a future high-emissions climate scenario projecting habitat suitability to the period between 2061 and 2080. With these models, we also investigate how data type (true absence or pseudo-absence) influences (1) SDM performance, (2) estimates of habitat suitability, and (3) projections of change in habitat suitability for the five shrub species. By exploring the use of different data types, we also seek to understand how differences in the data type may influence the application of SDMs, particularly with regards to regional vegetation distributions in a warming climate. These models were built under the assumption of a stationary relationship with respect to the environment. Our models are not mechanistic and do not account for seed dispersal or establishment, which could be important for potential future distributions of these species. Our investigation into how SDM parameterization potentially affects habitat suitability projections can provide insights on the use of this technique and its application in conservation and land use planning.

Study area
This study focuses on the Beaufort Delta region of northern Yukon and Northwest Territories, covering an area of approximately 161,000 km² (Figure 1). The southern portion of this area includes the Yukon and Tuktoyaktuk Coastal Plains and the Bathurst Peninsula ecoregions, which are characterized by rolling hills dominated by shrub and tussock tundra as well as low-lying wetland habitats (Yukon Ecoregions Working Group 2004; Ecosystem Classification Group 2012). The Yukon and Tuktoyaktuk Coastal Plain ecoregions are separated by the Mackenzie Delta ecoregion, which consists of low-lying alluvial terrain where a mosaic of forest, woodland, shrubland, and sedge wetland is strongly influenced by the hydrology of the delta (D. Gill 1972, 1973; C. Pearce 1986; Burn and Kokelj 2009). The northern part of the study area includes the Banks Island Coastal Plain and Banks Island Lowland ecoregions, a mid-Arctic landscape characterized by hummocky tills and glaciofluvial plains with some exposed bedrock throughout (Ecosystem Classification Group 2012). Vegetation communities on Banks Island are controlled primarily by soil moisture (Campbell et al. 2021) with well-drained upland terrain occupied by a mix of barrens and dwarf shrub tundra and wetter lowland terrain dominated by more productive sedge tundra (Ecosystem Classification Group 2012). All regions within the study area are underlain by continuous permafrost and exhibit common permafrost features such as polygonal terrain, hummocks, and thaw slumps (Rampton 1982, 1988; Yukon Ecoregions Working Group 2004; Ecosystem Classification Group 2012). With the exception of the Yukon Coastal Plain and of the northern tip of Tuktoyaktuk Peninsula, the study area was covered by the Laurentide Ice Sheet during the Wisconsinan Glaciation (Jessop 1971; Rampton 1988; Ecosystem Classification Group 2012).

Study design overview
To investigate the effect of climate warming on the distribution of tundra shrubs, we used SDMs to project habitat suitability for five common tundra shrubs (alder, birch, Labrador tea, bog bilberry, and lingonberry) under a high-emissions climate scenario for 2061 to 2080. To assess the influence of data type on ensemble SDM performance and predictions of habitat suitability, we created two models for each species: one using true absence data and one using pseudo-absence data. Because all other model parameters and environmental variables remain constant, we attribute differences in model performance and predictions of suitability to data type.

Species
We selected species for this analysis representing a variety of shrub functional types and growth forms. Green alder (Alnus viridis, also known as A. crispa or, more recently, A. alnobetula) is a deciduous tall shrub. This shrub has a broad distribution across the Northern Hemisphere and is known to establish on newly exposed mineral soils after disturbance (Furlow 1979). Green alder is primarily a subarctic species (Furlow 1997), but several recent studies have documented increases in alder stand density and abundance across the low Arctic tundra of Alaska (Tape, Sturm, and Racine 2006), Northwest Territories (Travers-Smith and Lantz 2020), Labrador (Larking et al. 2021), and Siberia (Frost and Epstein 2014). We used a species complex including Betula glandulosa and B. nana to describe dwarf birch, a deciduous dwarf shrub. These dwarf birches are both found on nutrient-poor, well-drained, moist acidic soils across the circumpolar range (De Groot, Thomas, and Wein 1997). They are taxonomically confused, particularly where their ranges overlap and hybridization makes species identification difficult (De Groot, Thomas, and Wein 1997). For this reason, we consider observations of both of these species to represent this deciduous shrub species complex. Marsh Labrador tea (Ledum decumbens, also known as L. palustre or, more recently, Rhododendron tomentosum) is a low shrub with evergreen leaves. This species is commonly found in mesic dwarf shrub or lichen heaths across a largely circumpolar range (Scoggan 1979). Bog bilberry (Vaccinium uliginosum) is a deciduous shrub commonly found in nutrient-poor, moist-to-wet acidic soils with a circumpolar range (Jacquemart 1996). Finally, lingonberry (V. vitis-idaea) is an evergreen dwarf shrub common across the low Arctic and southern boreal forest on dry to moist soils (Taulavuori, Laine, and Taulavuori 2013). Throughout this article, we refer to each species by its common name.
Plot-level presence/absence data used to parameterize models were obtained from a number of sources. Vegetation data from the Northwest Territories were collected from surveys conducted between 2005 and 2019 (see Lantz et al. 2009;Lantz, Gergel, and Henry 2010;Gill et al. 2014;Steedman 2014;Cameron and Lantz 2016;Chen 2020;Travers-Smith and Lantz 2020;Shipman 2021;Seider et al. 2022). Vegetation cover data from across northern Yukon were obtained, with permission, from the Yukon Biophysical Inventory System (Yukon Territorial Government 2021). These data were collected from field surveys conducted between 2000 and 2015. Data from southern Banks Island were obtained, with permission, from the Canadian Wildlife Service (see Campbell et al. 2021). To use these percentage cover data in our SDMs, we converted them to presence/absence for each species at each site (see Supplementary Figures S1 to S5 for the spatial distributions of presence/absence points for each species). The spatial accuracy of the plot locations for all of these data sources is much greater than the 30-arcsecond resolution of the environmental predictors. To minimize the possibility of pseudoreplication and to ensure that models were not trained using multiple observations within the same cell, we implemented a spatial thinning procedure to ensure that no two observations were closer than 5 km using the "ensemble.spatialThin" function from the BiodiversityR package (v2.12-3; Kindt and Coe 2005). We chose the thinning distance of 5 km as a conservative buffer because the spatial resolution of 30 arcseconds of predictor variables at the northernmost point of our study area is approximately 1 km. The prevalence of each species is listed in Table 1.
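The thinning step described above can be illustrated with a minimal sketch. It is not the BiodiversityR implementation of "ensemble.spatialThin"; it assumes a simple greedy rule (visit plots in random order and keep a plot only if it lies at least 5 km from every plot already kept), and the function names and example coordinates are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_points(points, min_km=5.0, seed=0):
    """Greedy thinning (an assumed rule, not the package algorithm):
    keep a point only if it is at least min_km from all points kept so far."""
    order = list(points)
    random.Random(seed).shuffle(order)
    kept = []
    for p in order:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


# Hypothetical plot coordinates (lat, lon): two tight pairs and one isolated plot.
plots = [(68.36, -133.72), (68.37, -133.70), (69.44, -133.03), (69.45, -133.03), (71.98, -125.25)]
thinned = thin_points(plots)
```

A greedy pass guarantees the minimum spacing but does not necessarily retain the largest possible number of plots; which member of a close pair survives depends on the random order.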
To use these data in presence-only models, we converted presence/absence data to presence-only data by removing any observations of true absences from the data. We then used a random pseudo-absence strategy selecting points with a minimum distance of 5 km from presence locations to generate pseudo-absence points using the "BIOMOD_FormatingData" function. This strategy implements a random selection of points from all possible cells of predictor data outside the predetermined buffer (see Fournier et al. 2017; Kaky et al. 2020), providing a sample of predictor variability that can be contrasted to the variability within presence locations (Phillips et al. 2009). The total number of pseudo-absence points in each selection equals the number of observed presence locations.
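The selection rule described above (random draws from cells at least 5 km from any presence, with as many pseudo-absences as presences) can be sketched as follows. This is an illustration only and does not reproduce the biomod2 internals; it assumes projected (x, y) coordinates in kilometres, and the function name and toy grid are hypothetical.

```python
import math
import random


def sample_pseudo_absences(presences, candidate_cells, min_km=5.0, seed=0):
    """Draw len(presences) pseudo-absences at random, without replacement,
    from candidate cells lying at least min_km from every presence.
    Assumes projected (x, y) coordinates expressed in km."""
    eligible = [
        cell for cell in candidate_cells
        if all(math.dist(cell, p) >= min_km for p in presences)
    ]
    if len(eligible) < len(presences):
        raise ValueError("not enough eligible cells outside the buffer")
    return random.Random(seed).sample(eligible, k=len(presences))


# Hypothetical presences and a 50 x 50 grid of 1-km cell centres.
presences = [(10.0, 10.0), (30.0, 12.0), (22.0, 40.0)]
cells = [(float(x), float(y)) for x in range(50) for y in range(50)]
pseudo = sample_pseudo_absences(presences, cells)
```

Because the draw is random, repeated selections with different seeds give different pseudo-absence sets, which is one reason pseudo-absence models can vary between runs.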

Climate predictor variables
Historic climate data (30-arcsecond resolution) used to parameterize our models were obtained from the WorldClim v2.1 data set (Fick and Hijmans 2017). Climate parameters in this data set consist of nineteen ecologically relevant variables derived from average monthly temperature and precipitation values (Fick and Hijmans 2017). We performed a hierarchical cluster analysis following the method used by Fournier et al. (2017) on these data and grouped the nineteen variables using the resulting correlation matrix. Variables were grouped if they had a Pearson's r greater than 0.7. We then selected one variable from each group to best represent a diversity of climate factors relating to temperature and precipitation. Based on this clustering, we chose the following seven variables to represent climate across the study area: annual mean temperature, mean diurnal range, isothermality, temperature seasonality, mean temperature of the coldest quarter, precipitation seasonality, and precipitation of the warmest quarter (Table 2).
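The grouping rule above can be illustrated with a simplified sketch. It does not reproduce the hierarchical cluster analysis of Fournier et al. (2017); instead it assumes that any two variables with an absolute Pearson's r above 0.7 belong to the same group and merges groups transitively. The use of the absolute value, the variable names, and the toy values are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_groups(variables, threshold=0.7):
    """Group variable names so that pairs with |r| > threshold share a group
    (connected components of the thresholded correlation matrix)."""
    names = list(variables)
    group_of = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson_r(variables[a], variables[b])) > threshold:
                old, new = group_of[b], group_of[a]
                for name in names:  # relabel the whole group that b belonged to
                    if group_of[name] == old:
                        group_of[name] = new
    groups = {}
    for name in names:
        groups.setdefault(group_of[name], []).append(name)
    return sorted(groups.values())


# Hypothetical values sampled at six cells for three bioclimatic variables.
toy = {
    "annual_mean_temp": [-10, -9, -8, -7, -6, -5],
    "mean_temp_coldest_quarter": [-28, -27, -25, -24, -22, -21],
    "precip_seasonality": [40, 55, 38, 60, 42, 58],
}
groups = correlated_groups(toy)
```

One variable would then be chosen from each group by hand, as described in the text; the sketch only identifies the groups.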
Species range projections utilized downscaled global climate model (GCM) data from the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor, Stouffer, and Meehl 2012). To obtain these data, we followed methods described in Lee, Williams, and Pearson (2019) and used a multimodel ensemble of four GCMs (CCSM4 from the National Center for Atmospheric Research, GFDL-CM3 from the Geophysical Fluid Dynamics Laboratory, HadGEM2-ES from the Met Office Hadley Center, and MPI-ESM-LR from the Max Planck Institute for Meteorology). We chose the Representative Concentration Pathway (RCP) 8.5 scenario developed for CMIP5 to base our models on the most severe estimates of future warming. This worst-case scenario presents a future defined by high carbon emissions with radiative forcing of 8.5 W/m² by 2100 (Moss et al. 2010). These GCMs are available as downscaled data using WorldClim (v1.4) as the climate baseline and averaged from 2061 to 2080 to 30-arcsecond resolution (Hijmans et al. 2005). To create the ensemble climate projection, we took a simple average of each individual bioclimatic variable from each GCM, as obtained from WorldClim, using the terra package. Maps of each variable under current and future climate conditions are presented in Supplementary Figures S6 and S7.

Environmental predictor variables
We used elevation data from the ArcticDEM available from the Polar Geospatial Center (Porter et al. 2018) to create a 2-m resolution digital elevation model (DEM) across the study area. We aggregated the DEM to 30-m resolution by taking the mean of the subpixels before applying any further transformations to improve data processing efficiency. Cells of missing data in this DEM were filled using the Multi-Error-Removed Improved Terrain DEM (Yamazaki et al. 2017) that we resampled to match the resolution and extent of the ArcticDEM area using the bilinear method in the "resample" function of the terra package. We calculated slope using the "terrain" function from the terra package and the vector ruggedness measure (VRM) using the tool developed by Sappington, Longshore, and Thompson (2007) implemented in ESRI ArcMap (v10.6.1). VRM provides a measure of ruggedness that is independent of slope and is represented as an index value between 0 and 1, where 0 is considered flat and 1 is most rugged (Sappington, Longshore, and Thompson 2007). Throughout this article, we refer to VRM as "ruggedness." We also calculated aspect and the topographic wetness index for this analysis; however, in early model iterations these variables were among the least important and were dropped from subsequent models. We resampled all environmental data from 30 meters to match the 30-arcsecond resolution used in this analysis using the bilinear method of the "resample" function in the terra package. Maps of elevation, slope, and ruggedness are presented in Supplementary Figure S8. We also used the National Aeronautics and Space Administration's Arctic-Boreal Vulnerability Experiment annual land cover classification (Wang et al. 2019) to remove any cells from the analysis that were classified as "water" in 2014 (the last available year of data).
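The aggregation from 2-m to 30-m cells corresponds to averaging non-overlapping blocks of 15 × 15 subpixels. The following sketch illustrates that arithmetic on a plain nested list rather than a raster object; it is not the terra implementation, and the handling of missing cells (ignored within a block, with all-missing blocks left missing) is an assumption.

```python
def aggregate_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid (list of rows).
    None marks a missing cell and is ignored (an assumed rule);
    a block containing only missing cells stays None."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if grid[r][c] is not None
            ]
            row.append(sum(block) / len(block) if block else None)
        out.append(row)
    return out


# Hypothetical 30 x 30 grid of 2-m cells; a factor of 15 yields a 2 x 2 grid of 30-m cells.
dem_2m = [[float(r + c) for c in range(30)] for r in range(30)]
dem_30m = aggregate_mean(dem_2m, factor=15)
```

Blocks that remain missing after aggregation are the cells that would be filled from a secondary elevation source, as described in the text.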

Species distribution modeling
We constructed SDMs for five common tundra shrub species to investigate the relative response of each species to climate change under projected climate warming scenarios. We also evaluated the influence of data type (presence/absence data or presence/pseudo-absence data) on model performance and suitability predictions.
Ensemble SDMs were generated using the biomod2 package (v4.0; Thuiller et al. 2022) in the R statistical software (R Core Team 2019). Ensemble models included generalized linear models, generalized boosted models (GBM), multiple adaptive regression splines, artificial neural networks, and random forest (RF) algorithms, which were set to use default modeling options in the biomod2 package. These models represent a mix of traditional linear techniques (generalized linear models and multiple adaptive regression splines), decision trees (GBM and RF), and machine learning (artificial neural networks, GBM, and RF) algorithms (see Table 3). Ensemble SDM analyses commonly use a wide variety of such algorithms with successful results (Fournier et al. 2017;Lee, Williams, and Pearson 2019;Kaky et al. 2020). We chose to use an ensemble method because recent work has shown that no single SDM algorithm has superior prediction power (Segurado and Araújo 2004;Hao et al. 2020). The ensemble method has also been shown to outperform individual models in SDM studies (Marmion et al. 2009). We generated ensemble models to create projections of species habitat suitability by taking the mean of individual models. We also used ensemble models to project future habitat suitability using CMIP5 RCP8.5 climate data.
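Taking the mean of individual models, and comparing current with future ensembles, can be illustrated with a short sketch. The model names, cell values, and function names are hypothetical, and the code shows only the averaging arithmetic rather than the biomod2 ensemble routines.

```python
def ensemble_mean(model_predictions):
    """Cell-wise mean of suitability predictions.
    model_predictions maps a model name to a list of per-cell suitability values."""
    layers = list(model_predictions.values())
    n_cells = len(layers[0])
    if any(len(layer) != n_cells for layer in layers):
        raise ValueError("all models must predict the same cells")
    return [sum(values) / len(values) for values in zip(*layers)]


def mean_change(current, future):
    """Mean difference in suitability (future minus current) across cells."""
    return sum(f - c for c, f in zip(current, future)) / len(current)


# Hypothetical suitability values for three cells from three algorithms.
current_models = {"GLM": [0.2, 0.6, 0.9], "GBM": [0.4, 0.5, 0.7], "RF": [0.3, 0.7, 0.8]}
future_models = {"GLM": [0.5, 0.7, 0.8], "GBM": [0.6, 0.6, 0.7], "RF": [0.7, 0.8, 0.6]}

current = ensemble_mean(current_models)
future = ensemble_mean(future_models)
delta = mean_change(current, future)
```

In this toy case the mean change is positive even though suitability declines in the third cell, which mirrors how an overall increase can coexist with local decreases.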
To evaluate the performance of ensemble SDMs, we used independent validation data to calculate a series of performance metrics listed in Table 4. For each ensemble model, we set aside a random subset of 10 percent of the data for this purpose. Because individual metrics provide different information on model performance and there can be disagreement about the influence of data on performance metrics (such as prevalence on area under the receiver operating characteristic curve [AUC]; see Allouche, Tsoar, and Kadmon 2006; Leroy et al. 2018), we use multiple metrics to test model performance (Table 4). Because all metrics seek to highlight different aspects of the performance and accuracy of modeled results, considering many such metrics allowed us to better understand the overall performance for comparative purposes.
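Several threshold-based metrics of this kind can be derived from a single confusion matrix, as the sketch below illustrates. Table 4 is not reproduced here, so the definitions are assumptions based on common usage: percentage correctly classified as (TP + TN) / N, true skill statistic as sensitivity + specificity − 1, and overprediction rate as FP / (TP + FP). The 0.5 threshold and the toy data are also assumptions.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives after thresholding predicted suitability."""
    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = prob >= threshold
        if obs and pred:
            tp += 1
        elif pred:
            fp += 1
        elif obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance_metrics(observed, predicted_prob, threshold=0.5):
    """Assumed standard definitions; requires both classes to be present in the
    validation data and at least one predicted presence."""
    tp, fp, tn, fn = confusion_counts(observed, predicted_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "PCC": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "TSS": sensitivity + specificity - 1,
        "OPR": fp / (tp + fp),
    }


# Hypothetical validation set: 4 presences and 6 absences with predicted suitability.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.3, 0.2]
metrics = performance_metrics(observed, predicted)
```

Unlike these metrics, AUC is threshold independent, which is one reason the two families of metrics can rank models differently.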
We also created a null model with 100 randomly allocated "presence" locations and 100 randomly allocated "absence" locations (similar to sample size of the three shrubs) to determine the relative performance of each SDM model against random predictions. We used the variable importance function in biomod2 to understand the potential drivers of shrub species' distributions in the Beaufort Delta region and to assess the influence of environmental predictors on habitat suitability under projected climate warming. This function calculates variable importance by randomizing a single variable in each of five randomized permutations to calculate the correlation between the predictions of the complete and randomized variable ensemble models (Thuiller et al. 2022). Variables that have a lower correlation value when randomized are assumed to have greater influence on model predictions.
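The permutation approach described above can be sketched as follows. The sketch is not the biomod2 function: it assumes that importance is scored as one minus the Pearson correlation between the original and permuted predictions, averaged over five permutations, and it uses a hypothetical toy model in which suitability depends on temperature only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_importance(predict, rows, var_index, n_perm=5, seed=0):
    """Mean of (1 - r) between baseline predictions and predictions made
    after shuffling one predictor across rows (assumed scoring rule)."""
    rng = random.Random(seed)
    baseline = [predict(row) for row in rows]
    scores = []
    for _ in range(n_perm):
        shuffled = [row[var_index] for row in rows]
        rng.shuffle(shuffled)
        permuted_rows = [
            row[:var_index] + (value,) + row[var_index + 1:]
            for row, value in zip(rows, shuffled)
        ]
        permuted = [predict(row) for row in permuted_rows]
        scores.append(1 - pearson_r(baseline, permuted))
    return sum(scores) / n_perm


def toy_model(row):
    """Hypothetical fitted model: suitability rises with temperature and ignores elevation."""
    temperature, _elevation = row
    return 1 / (1 + math.exp(-0.5 * temperature))


# Hypothetical predictor table: (temperature, elevation) for 20 cells.
rows = [(float(t), float((7 * t) % 11)) for t in range(-10, 10)]
importance_temperature = permutation_importance(toy_model, rows, var_index=0)
importance_elevation = permutation_importance(toy_model, rows, var_index=1)
```

Shuffling a predictor the model ignores leaves the predictions unchanged, so its score is effectively zero, whereas shuffling an influential predictor lowers the correlation and raises the score.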

Results
Our SDMs projected that climate warming will enhance habitat suitability of all shrubs considered in this study beyond the current range limits of all species (Figures 2-6; Table 5). However, contrary to our expectations, climate warming also reduced habitat suitability for species in the core regions (area of highest suitability) of their current habitat suitability projections (Figures 2-6). This was particularly evident in the pseudo-absence models. These general trends of change are observable with both data types, although true absence data tend to result in greater increases of habitat suitability (Table 5). Projected shifts in habitat suitability in response to climate warming showed similar patterns among true absence models, but the magnitude of change differed considerably among species. The greatest increase in suitability across the entire study area was for lingonberry, with an increase in suitability of 0.369. The species with the lowest observed change in habitat suitability was Labrador tea, increasing by 0.080 (Table 5). On Banks Island, habitat suitability showed large increases in response to climate for all shrub species modeled (Figures 2-6).
The SDMs constructed in this analysis ran with moderate or reasonable model performance but exhibited considerable variation among performance metrics (Table 6). With respect to the two most commonly used model performance metrics, AUC and true skill statistic (TSS), on average, AUC indicated marginally better performance with pseudo-absence data and TSS showed better performance with true absence data (Table 6). On average, percentage correctly classified, specificity, TSS, and overprediction rate (OPR) indicate better performance for SDMs with the implementation of true absence data (Table 6).
These SDMs also showed large differences in habitat suitability projections for current climate conditions between true absence and pseudo-absence models (Figures 2-6). For alder, the pseudo-absence model projected greater suitability along the coastal margin of the Yukon North Slope and near the communities of Inuvik and Tuktoyaktuk compared to the true absence model, which had the highest habitat suitability along the southern part of the study area in the Mackenzie Delta (Figure 2). Birch and lingonberry had similar habitat suitability, but the spatial pattern varied between models using true absence and pseudo-absence data (Figures 3 and 6). The true absence models for birch and lingonberry showed high suitability over a large area including the Tuktoyaktuk Coastlands. In contrast, the pseudo-absence models for these species had the highest suitability across the Yukon North Slope and a more restricted portion of the Tuktoyaktuk Coastlands. Labrador tea exhibited similar suitability between data types (Figure 4). The projected suitability of bog bilberry also showed differences between data types, highlighting higher suitability in the Richardson Mountains using true absence data, but with pseudo-absence data the modeled area of suitability also included the more coastal areas of the Yukon North Slope and parts of the Tuktoyaktuk Coastlands (Figure 5). Lastly, in the cases of bog bilberry and lingonberry, there were notable differences between true absence and pseudo-absence models on Banks Island, with suitability exceeding 0.5 in the true absence models for these species (Figures 5 and 6).
When considered across both data types, the three most commonly important variables across all species are the annual mean temperature, precipitation of the warmest quarter, and elevation (Table 7). Differences in variable importance between data types were otherwise species dependent.
True absence and pseudo-absence models also showed variation in the most important variables driving model projections (Table 7). This is particularly notable for bog bilberry, for which mean diurnal range was the most important variable in the true absence model but ranked least important in the pseudo-absence model (Figure 7).
The rank and magnitude of important variables also differed substantially between species in true absence models (Figure 7). For example, annual mean temperature was key in predicting the distributions of birch, Labrador tea, and lingonberry, whereas alder and bog bilberry showed a greater reliance on variables derived from elevation or precipitation data (Figure 7). Of all variables in true absence models, annual mean temperature, precipitation seasonality, and precipitation of the warmest quarter were most often of high importance across all modeled species (Table 7).

Tundra shrub response dynamics
Our results support existing literature showing that climate warming will drive range expansion in tundra shrubs (Epstein et al. 2004; Myers-Smith and Hik 2018; Mekonnen, Riley, Berner et al. 2021) but suggest that the magnitude of change will differ considerably among species. Our models also highlight the potential for high Arctic landscapes not currently dominated by shrubs to become climatically suitable in the near future. This finding is consistent with recent remote sensing and field-based studies that have documented the proliferation of shrubs in tundra landscapes across the circumpolar region (Tape, Sturm, and Racine 2006; Myers-Smith et al. 2011; Jørgensen, Meilby, and Kollmann 2013; Lantz, Marsh, and Kokelj 2013; Fraser et al. 2014). Our finding that future habitat suitability differed among species suggests that shrubs will respond individualistically to climate warming based on resource requirements and physiological adaptations (Chapin et al. 1996). This is also evidenced by our observations of differences in variable importance among the species we modeled using true absence data. The importance of precipitation seasonality, particularly for alder and Labrador tea, shows that climate-driven expansion in these species will likely be mediated by soil moisture. Precipitation seasonality is linked to the temporal availability of water across the landscape, which, in conjunction with the physical properties of soil, influences a plant's ability to access and retain moisture (O'Donnell and Ignizio 2012; Renne et al. 2019). This explanation is consistent with recent findings showing that growth and productivity of alder in upslope areas is moisture limited (Black, Wallace, and Baltzer 2021).
Figure 4. Ensemble habitat suitability maps (gray box) for marsh Labrador tea (Ledum decumbens) projected under current and future climate conditions using true absence and pseudo-absence models. Banks Island is inset over the mainland portion of the study area for enhanced visualization. Plots (A)-(D) correspond to differences between climate projections and data types along the columns and rows.
The top three variables in each model are shown in bold red and the five least important variables are noted with a dash. TA = true absence; PA = pseudo-absence.
Higher soil moisture has also been associated with proliferation of tundra shrubs in general (Tape, Sturm, and Racine 2006;Frost and Epstein 2014;Myers-Smith et al. 2015;Cameron and Lantz 2016;Ackerman et al. 2017) and increased vegetation growth (Elmendorf et al. 2012;Ackerman et al. 2017;Bjorkman et al. 2018) on these dynamic landscapes.
Our analysis also suggests that physiological tolerances in birch and lingonberry will mediate how their ranges will respond to ongoing warming. The relatively high importance of annual mean temperature in these models compared to variability in precipitation variables for these species suggests that birch and lingonberry will be less moisture limited under a warmer climate (Figures 3 and 6). This finding is consistent with previous research showing the greater tolerance of lingonberry to a range of environmental conditions (Taulavuori, Laine, and Taulavuori 2013). Several studies also show that these species can respond rapidly to increased temperature. Lingonberry exhibits increased shoot growth in response to rising temperature (Shevtsova, Haukioja, and Ojala 1997) and dwarf birch responds to experimental warming with earlier germination and higher recruitment (Milbau et al. 2009). Dwarf birch is also capable of rapid secondary growth and reproduction under warming-induced increases in
Figure 5. Ensemble habitat suitability maps (gray box) for bog bilberry (Vaccinium uliginosum) projected under current and future climate conditions using true absence and pseudo-absence models. Banks Island is inset over the mainland portion of the study area for enhanced visualization. Plots (A)-(D) correspond to differences between climate projections and data types along the columns and rows.
It is important to note that the observed shifts in habitat suitability predicted by our SDMs do not consider all ecological factors that can limit dispersal and recruitment such as biotic (i.e., species interactions) or abiotic (i.e., seedbed conditions) vectors (Angers-Blondin, Myers-Smith, and Boudreau 2018) or unreasonably large distances (i.e., across large water bodies). For example, small populations of birch, Labrador tea, bog bilberry, and lingonberry on Banks Island (Aiken et al. 2007) provide seed sources that could facilitate range expansion consistent with our SDMs. Yet, the absence of alder on Banks Island suggests that seed limitation will cause range expansion in this species to lag behind the presence of a suitable climate. Increased density of birch (Ropars and Boudreau 2012) and alder (Travers-Smith and Lantz 2020) across the low Arctic and subarctic without significant range expansion also indicates that dispersal limitation can cause temporal lags between warming and range expansion (Svenning and Sandel 2013).
In addition to dispersal limitations, tundra vegetation dynamics are heavily influenced by disturbance (Lantz et al. 2009;Lantz, Gergel, and Henry 2010;Wang et al. 2020;Chen, Hu, and Lara 2021;Lantz, Zhang, and Kokelj 2022). Wildfire, in particular, can cause significant responses in shrub growth, which can be either positive or negative depending on fire size, frequency, severity, and substrate (Lantz, Gergel, and Henry 2010;Chen 2020;Travers-Smith and Lantz 2020;Chen, Hu, and Lara 2021). Thermokarst processes have also been associated with increased shrub abundance (Lantz et al. 2009;Frost et al. 2013;Wolter et al. 2016;Huebner, Buchwal, and Bret-Harte 2022;Lantz, Zhang, and Kokelj 2022) and are likely to play a role in changing shrub distributions. Advances in landscape and global modeling efforts and our understanding of vegetation ecology and responses to disturbance are crucial for understanding how landscapes are changing and how they will influence larger systems.
Differences in suitability among models parameterized with true absence and pseudo-absence data highlight the sensitivity of SDMs and suggest that model outputs should be compared to expert knowledge and ecological information on processes known to drive dispersal and recruitment. Additionally, combined modeling approaches including mechanistic models, longitudinal studies, and other experimental data sources could improve model design and interpretation. The statistical outputs of SDMs based on observations of presences and absences also do not account for factors including microsite availability and predation that frequently limit recruitment (Soberón and Nakamura 2009). Differences in the projected response of alder, birch, Labrador tea, bog bilberry, and lingonberry also highlight the importance of using species-based assessments of change to parameterize larger scale dynamic vegetation models. Dynamic vegetation models are commonplace in coupled Earth system models and GCMs, and advances in the implementation and accuracy of species-based modeling will benefit climate projections (Quillet, Peng, and Garneau 2010). This is clearly demonstrated by Druel et al. (2019), whose implementation of a dynamic global vegetation model (ORCHIDEE Land Surface Model) did not reproduce the significant shrub expansion that has been observed across the circumpolar north (Tape, Sturm, and Racine 2006;Lantz, Marsh, and Kokelj 2013;Fraser et al. 2014;Moffat et al. 2016;Myers-Smith and Hik 2018). This mismatch is likely the product of local heterogeneity in microclimate or microtopographic conditions (see Ropars and Boudreau 2012;Gamon et al. 2013;Bjorkman et al. 2018;Mekonnen, Riley, Grant et al. 2021;Seider et al. 2022), an idea supported by Druel et al. (2019) and our own results highlighting the importance of species-specific response data.
Parameterizing Earth system models appropriately is important because terrestrial vegetation impacts the climate system by influencing energy fluxes (e.g., surface reflectance, carbon exchange) and surface conditions (e.g., moisture, nutrients, temperature; Mekonnen, Riley, Grant et al. 2021). Coupled Earth system models typically ignore dynamic trait-based vegetation responses to climate in favor of static functional tolerances (Van Bodegom et al. 2012;Wullschleger et al. 2014). Though simplification is necessary in global models, the use of broad vegetation functional types (grass, tree, cropland, etc.) may not accurately characterize vegetation change at high latitudes. Individual-based models (see Kruse et al. 2016) can be used to incorporate ecophysiological responses and species traits to determine responses to climate change. There are also promising advances in joint species distribution modeling, using a combination of traditional single-species SDM methods and ordination techniques to understand the interactions of multiple species from a community ecology perspective (Ovaskainen and Abrego 2020). These models can more directly account for interactions among species and can handle rare species better than traditional SDMs (Ovaskainen and Abrego 2020). With these advances in mind, it is still important to consider the data type used in modeling efforts. Our results show that data type can have a strong influence on models and, as such, decisions regarding which parameterization data to use must be made carefully.

Influence of data type
Our analysis shows that the data type used to parameterize SDMs impacts model performance and predictions and is a critical consideration in model interpretation. Projected species distributions, model performance, and ranked variable importance from models built using pseudo-absence data deviated considerably from models that used true absence data. Differences between true absence and pseudo-absence models suggest that true absence data are needed to reliably define low suitability habitats that are not adequately sampled through the pseudo-absence selection process (Brotons et al. 2004;Soberón and Nakamura 2009).
Differences in projected habitat suitability in our pseudo-absence models were likely caused by the random allocation of pseudo-absence locations across a wider range of climate and terrain conditions compared to true absences. Because pseudo-absences are not definitive locations of absence, they do not accurately represent limiting environmental conditions but provide a random sample of representative background data (Soberón and Nakamura 2009). Further, the possible allocation of pseudo-absence locations within the actual distribution of a species results in lower suitability estimates than true absence models that include unambiguous absence data. MAXENT (Phillips, Anderson, and Schapire 2006), another common SDM algorithm that uses pseudo-absence data (referred to as background data), has also been shown to be highly sensitive to sampling bias in both presence and background data (Elith et al. 2006, 2011). Although true absence models are generally preferred because they provide information on low habitat suitability (Brotons et al. 2004), the accessibility of presence-only data makes the use of pseudo-absences in SDMs very popular (Santini et al. 2021).
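The dilution effect described above can be illustrated with a toy simulation. The sketch below is not the modeling workflow used in this study; it assumes a hypothetical one-dimensional climate index, a hypothetical suitability threshold (`WARM_THRESHOLD`), and a hypothetical occupancy rate (`OCCUPANCY`), and it uses a crude suitability index (the share of warm-habitat records that are presences) in place of a fitted SDM. All function names are illustrative.

```python
import random

WARM_THRESHOLD = 0.5  # hypothetical climate index above which habitat is suitable
OCCUPANCY = 0.9       # hypothetical probability that a suitable site is occupied


def simulate_landscape(n_sites, rng):
    """Return (climate_index, occupied) tuples for a toy landscape."""
    sites = []
    for _ in range(n_sites):
        climate = rng.random()
        occupied = climate > WARM_THRESHOLD and rng.random() < OCCUPANCY
        sites.append((climate, occupied))
    return sites


def warm_suitability(presences, absences):
    """Share of warm-habitat records that are presences (crude suitability index)."""
    warm_pres = sum(1 for climate in presences if climate > WARM_THRESHOLD)
    warm_abs = sum(1 for climate in absences if climate > WARM_THRESHOLD)
    return warm_pres / (warm_pres + warm_abs)


def compare_absence_types(n_sites=5000, n_surveyed=500, seed=1):
    """Return the suitability index using true absences and using pseudo-absences."""
    rng = random.Random(seed)
    landscape = simulate_landscape(n_sites, rng)

    # Field survey: presences and true absences are both observed.
    surveyed = rng.sample(landscape, n_surveyed)
    presences = [climate for climate, occupied in surveyed if occupied]
    true_absences = [climate for climate, occupied in surveyed if not occupied]

    # Pseudo-absences: drawn at random from the background, ignoring occupancy,
    # so some of them fall inside the species' actual distribution.
    background = rng.sample(landscape, len(true_absences))
    pseudo_absences = [climate for climate, _ in background]

    return (warm_suitability(presences, true_absences),
            warm_suitability(presences, pseudo_absences))


if __name__ == "__main__":
    true_idx, pseudo_idx = compare_absence_types()
    print(f"true absences:   {true_idx:.2f}")
    print(f"pseudo-absences: {pseudo_idx:.2f}")
```

Under these assumptions, roughly half of the randomly placed pseudo-absences land in suitable habitat, so the index for warm sites is pulled well below the value obtained with surveyed absences. The size of the gap depends on the assumed occupancy rate and on the number of pseudo-absences drawn, both of which are arbitrary here.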
The widespread use of pseudo-absence models (Santini et al. 2021) is concerning because our results indicate that models built using pseudo-absence data yield different results than those built using true absence data. There is also no clear evidence to support best practices for pseudo-absence selection (Santini et al. 2021), and the optimal number of pseudo-absence points has been found to vary considerably between modeling algorithms (Barbet-Massin et al. 2012). The spatial extent from which pseudo-absences are selected also poses potential problems, because too large or small an area can create models that are not biologically relevant (Vanderwal et al. 2009). To facilitate the development of accurate, reliable, and better performing models, we encourage the use of open data repositories to make true absence data more widely accessible. In addition, we suggest the use of comprehensive, systematic presence/absence data collection as part of botanical inventories (Saarela, Sokoloff, and Bull 2017;Saarela et al. 2020).
Our results have important implications for predicting vegetation changes across Arctic and Subarctic ecosystems in support of conservation and land management decision making (Guisan et al. 2013). Recent work has stressed the importance of including SDM research in International Union for Conservation of Nature Red List developments (Breiner et al. 2017), conservation of endemic and rare species (Marcer et al. 2013;Wang et al. 2015), and monitoring species invasion in protected areas (Pěknicová and Berchová-Bímová 2016;Barbet-Massin et al. 2018). In all of these cases, models built with pseudo-absence data that may not adequately describe species' distributions and responses to change may contribute to ineffective or potentially harmful decisions (Vanderwal et al. 2009).

Acknowledgments
Thank you to all field staff and students from the Arctic Landscape Ecology Lab who, over the last decade, helped collect much of the data used in this analysis. We also acknowledge Nadele Flynn (Yukon Government) and Danica Hogan (Canadian Wildlife Service) for facilitating access to additional data from northern Yukon and Banks Island, NWT.
The authors also thank the two anonymous reviewers for their insightful comments and suggestions.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
Funding for this research was provided by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 06210-2018 to TCL and a Canada Graduate Scholarship Award to JHS), the Northern Scientific Training Program, the Polar Continental Shelf Program, the Aurora Research Institute, and the University of Victoria.

Data availability
The metadata information for this analysis has been uploaded to the Polar Data Catalogue (www.polardata.ca/pdcsearch/PDCSearchDOI.jsp?doi_id=13274). Most of the data used are available by request from the authors, but data obtained from the Yukon Government and Canadian Wildlife Service are subject to sharing restrictions.