Upper Palaeolithic site probability in Lower Austria – a geoarchaeological multi-factor approach

ABSTRACT In archaeology, predictive models play a key role in understanding the interactions between humans and the palaeo-environment. They are also of great value for cultural heritage management and planning purposes. This is particularly true for Palaeolithic sites in the east Austrian loess landscape, which are often deeply embedded in sediment sequences. In this study, we analyse the geospatial behaviour of 23 Upper Palaeolithic sites in Lower Austria. Hereby, we apply a new approach, which combines the advantages of a classical deductive method with the capabilities of machine learning, implemented via the MaxEnt software. The result is a predictive model for an area of 7850 km², exploring the potential for the presence of Upper Palaeolithic sites. The model highlights several spatial dynamics of site probability in the study area. Possible sources of inaccuracies within the source data and the methodology are critically discussed.


Introduction
Archaeological predictive modelling (APM) tries to predict 'the location of archaeological sites or materials in a region, based either on a sample of that region or on fundamental notions concerning human behaviour' (Kohler & Parker, 1986). It is based on the assumption that human spatial behaviour is predictable and can be extrapolated from samples to larger areas . As 90-99% of all archaeological remains are presumably yet undiscovered, APM is based on the 1-10% explored and documented archaeological sites (Verhagen et al., 2010). As such, an APM offers the chance of preserving the 90-99% undiscovered sites from destruction by e.g. construction projects while being based on only 1-10% discovered archaeological sites, raising the question of representability. In addition to cultural heritage management implementations, APM can help us in understanding the interaction between our ancestors and the palaeo-environment (Kamermans & Niccolucci, 2010). For a more in-depth introduction to APM as well as a thorough discussion of its strengths and weaknesses, see Van Leusen et al. (2005) and references therein.
This study is based on 23 Upper Palaeolithic sites in Lower Austria and presents a first attempt at creating an APM for a Late Pleistocene timeframe in this area. Despite the limited number of sites, the model, in its current state, can already be used as a regional geospatial tool for archaeological research. We recommend implementation into cultural heritage management only after additional empirical data is added and the model is thoroughly validated. To create the model, we use a novel approach, combining the explanatory power of a deductive method with the statistical reliability of an inductive approach. The result can be seen as part of the 'Middle Range Theory', as it breaks up the boundaries between deductive and inductive methodologies (Verhagen & Whitley, 2012).

Study area
The study area covers more than 7800 km² and stretches from the Bohemian Massif to the Eastern Alpine forelands. Geologically, it marks the transition between massif metamorphic rocks to molasses, covered by Pleistocene sediments. Hydrologically, the Danube leaves the narrow valleys of the Bohemian Massif and enters the hilly landscape of the Alpine foreland. For the Late Pleistocene palaeoenvironment, this translates to the transition from a turbulent, impassable river into a traversable braided river system, documented by the Late Pleistocene fluvial deposits of the lower terrace (Geologische Bundesanstalt, 2013). This part of Lower Austria is widely considered a loess landscape (Lehmkuhl et al., 2021). Regarding the disputed definition of loess, mentions in this study include windblown aeolian dust as well as results of post-formational activities as proposed by Smalley et al. (2011). Up to 40 m thick loess-palaeosol sequences (LPS) can be found in the study area, several of which enabled in-depth palaeoenvironment reconstructions (e.g. Haesaerts et al., 1996;Sprafke et al., 2014;Sprafke et al., 2020;Terhorst et al., 2011). At the same time, numerous LPS preserved evidence for Palaeolithic occupation. Today's topography, however, is considerably impacted by historic and recent anthropogenic activity, such as largescale terracing for viticulture, loam extraction and constructions for transportation. Many archaeological sites in the study area were discovered during such activities, potentially causing a considerable sampling bias.

Upper Palaeolithic sites
Several dozen Palaeolithic sites have been discovered in the study area in course of ∼150 years of research. A number of these such as Willendorf, Krems-Hundssteig, Krems-Wachtberg, Kammern-Grubgraben and Langmannersdorf are internationally well-known as they provide significant contributions to our understanding of early modern human behaviour and substantial evidence for human-environment interaction (e.g. Einwögerer et al., 2006;Händel et al., 2020;Neugebauer-Maresch, 2008;Nigst et al., 2014;Teschler-Nicola et al., 2020). Not all sites are suitable for consideration in an APM. In some cases, the material evidence allows neither for an unambiguous technocultural placement nor for the production of numerical ages (i.e. lack of organic remains for radiocarbon dating and heat-influenced lithic objects for luminescence measurements). For other sites, the exact geographic position remains unknown, because the archaeological material was either collected from the surface (i.e. devoid of sedimentary context), or the precise spatial context was not documented. Cave sites were also not considered, as they underlie entirely different formation processes. Hence, 23 Upper Palaeolithic open-air sites remained to be included in this study (Table 1). Techno-culturally, these sites range from the Aurignacian to the Magdalenian. Nineteen sites provided numerical ages in the range between 43 and 17 ka cal BP; the other four allow for unambiguous techno-cultural placement.

Input predictors
In APM, environmental variables with implications for site probability are called predictors, as they are used to predict the dependent variables (Wheatley, 1996). When assessing the potential for archaeological sites, not only factors influencing the settlement choice have to be considered. Natural factors influencing the preservation of sites such as sedimentation processes are at least equally important (Schiffer, 1983). In particular, for the often sparse remains of hunter-gatherer societies, a separation of anthropogenic and natural factors is crucial not only for understanding the formation of a single site (Schiffer, 1983) but also for assessing settlement and/or occupational patterns (Binford, 1980). At the same time, the archaeological record can be indicative for natural processes, i.e. palaeo-environmental conditions, leading to the site's preservation (e.g. Händel et al., 2021). For our study we identified 10 geospatial predictors. From a thematic perspective, these can be divided into three fields: , and . Table 2 lists all predictors together with implications for settlement choice and preservation of records.
All predictors were parameterized using ArcGIS 10.7.1. The main source for most terrain-and water predictors is a digital elevation model (DEM) with a spatial resolution of 10 m (available at www.data.gv. at). As anthropogenic landscape modifications (2.) imply necessity for terrain corrections, this dataset was smoothed prior to additional processing. The goal was to create a DEM which represents a natural landscape as closely as possible. The resulting DEM still differs from a Late Pleistocene palaeo-surface, as climatic, geomorphological and hydrological processes have since fundamentally changed; Holocene sediments covered past stream channels, steep hills wore down, rivers migrated, etc. Late Pleistocene landscape elements cannot be easily recreated and the smoothed DEM can therefore only be considered a best guess. For the smoothing algorithm, a focal mean within a 5-cell radius circle kernel was calculated using the 'Focal Statistics' tool.
The first predictor, elevation, is represented by the smoothed DEM. Slope and aspect were determined (tools 'Slope' and 'Aspect'). For the topographic position, we calculated the topographic position index (TPI) as defined by Gallant and Wilson (2000). Within ArcGIS, this was done by calculating the average elevation around each pixel (tool 'Focal Statistics') and subtracting the resulting raster from the original DEM (tool 'Raster Calculator'). As large radii mainly reveal major landscape units (Reu et al., 2013), a focal mean within a 10-cell radius kernel was used for this calculation. The last terrain predictor, annual sunshine duration, was calculated from the DEM (tool 'Area Solar Radiation').
For all predictors connected to water, we used two datasets, representing different scales. Streamlines from the CCM21 European river dataset represent larger rivers (version 2.1; De Jager & Vogt, 2007). As smaller streams also might have been vital for freshwater supply in the Upper Palaeolithic, an additional drainage was derived from the smoothed DEM. This was done by filling the DEM (tool 'Fill') and calculating flow direction (tool 'Flow Direction') and flow accumulation (tool 'Flow Accumulation'). To include smaller streams, we used a low threshold of 5000 cells within the flow accumulation when extracting streamlines. To parameterize the distance to these river datasets, Euclidean distance was calculated (tool 'Euclidean Distance'). Cost distance was not chosen for several reasons. Firstly, cost distance is not associated with a unit, which is disadvantageous for comprehension. Secondly, due to short distances to potential freshwater sources, a cost raster only has minor impact, especially concerning individual mobility on foot. Parameterization of the predictor 'height above water level' was based on points extracted along streamlines in an interval of 50 m (tool 'Generate Points Along Lines'). In a second step, the elevation value was added to streamline points, representing the water level at each respective point (tool 'Extract Values to Points'). For extrapolation of these point values, we used the inverse distance weighting interpolation algorithm with a search radius of 100 points (tool 'IDW'). In a final step, the resulting raster was subtracted from the DEM to gain the effective height above water level in metres (tool 'Raster Calculator').
For the distribution of loess and Late Pleistocene alluvial plains, the open source dataset of Lehmkuhl et al. (2021) was used. To ensure that it meets the required accuracy for this small-scale application, the dataset was compared to a 1:50,000 geological map (Geologische Bundesanstalt, 2013). Minor corrections were made by adjusting the area covered by Late Pleistocene alluvial plains to fit the respective terrace identified in the DEM.

Implementation of MaxEnt
MaxEnt was developed as a software for species distribution and environmental niche modelling (Phillips et al., 2017;Phillips & Dudík, 2008). The main output of the MaxEnt tool is a raster containing the rate of occurrence (ROR), which, in this case, translates to the predictive probability for Upper Palaeolithic sites. This probability is calculated individually for each cell from the RORs assigned to the underlying predictor-values. For an in-depth explanation of the statistics behind the MaxEnt software, see Elith et al. (2011) and Merow et al. (2013). Due to its high predictive accuracy and easy-to-use application, it has been used in many fields, including archaeology. One of the main advantages of MaxEnt over the predominant technique, the logistic regression, is the fact that Max-Ent utilized presence-only data (Wachtel et al., 2018). Within logistical regression approaches, this can lead to difficulties in estimating an adequate sample size and site density (Verhagen, 2008). By now, many APM case studies have successfully implemented the MaxEnt software (Alwi Muttaqin et al., 2019;Galletti et al., 2013;Gillespie et al., 2016;Jones et al., 2019) and some have made direct comparisons, showing that MaxEnt outperforms logistic regression (Wachtel et al., 2018;Yaworsky et al., 2020). Despite all Table 2. Chosen predictors with implications for settlement choice and preservation as well as implementation method.
advantages of MaxEnt, it still is an inductive model at its core. Therefore, an indiscriminate implementation of the model leads to a loss of the explanatory value the same predictors would provide in a deductive model (Ebert, 2004). To address this issue, MaxEnt includes response curves in the output, which show the predictive probability that is associated with the values of each predictor. These can be used to evaluate the thematic plausibility and thereby assess the causality within the model (Merow et al., 2013). As there is no way to alter the workflow of the model based on the plausibility of the response curves, these were the only outputs extracted for further use as a statistical substructure of a deductive approach. This is the only way to preserve causality between input predictors and the dependent variables and thereby uphold the explanatory value of the model. To fittingly integrate loess and Late Pleistocene alluvial plains into MaxEnt, these predictors were transformed into binary format (1 = present/0 = not present). As such, these predictors were integrated as categorical predictors while all other predictors are continuous. To determine fitting input parameters, the practical guide to MaxEnt by Merow et al. (2013) was used. As the goal of this approach was to identify plausible optimal value ranges within the response curve of each predictor separately, cumulative output was selected. This output rescales the ROR from lowest to highest value on a scale between 0 and 100, which allows for unambiguous interpretation and easy comparison (Merow et al., 2013). The loss of raw probability values is acceptable as these are of negligible importance in a deductive approach.

Deductive method
Subsequently, we evaluated the plausibility of the statistical connection between each predictor and the site probability, represented by the response curves. In cases where the response curves fit the deductive expectation, no modification was made before reclassifying the value of the predictor to a predefined score value. In cases where parts of the response curve were deemed implausible, modifications were made accordingly. All response curves and modifications are presented in Section 4. After obtaining plausible response curves for each predictor, these were reclassified into score values to simplify the model and thereby enhance its explanatory value. To achieve this, the values of each predictor were reclassified into three classes according to the cumulative value within the response curve. The optimal value range for each predictor is defined by a cumulative value equal to or larger than 80 (score value = 3). This corresponds to the top 20% of relative probability within the predictor. Cumulative values equal to or larger than 50 define the viable value range of each predictor (score value = 1). Values below this relative probability threshold are classified as unviable (score value = 0). The large difference between the score value of the optimal and viable class (3 vs. 1) was chosen to highlight the large difference in cumulative relative probability (top 20% vs. top 50%). For the binary categorical predictors loess and Late Pleistocene, a different approach was used. For loess, presence was defined as optimal (score value = 3) and absence as unviable (score value = 0). As Late Pleistocene alluvial plains were defined as exclusion criterion, presence was defined as unviable (score value = 0) and absence as viable (score value = 1). Through this simplifying score value method, the value range was reduced from 0-900 to 0-27, reducing the complexity immensely.

Results
Here, we present the results of the MaxEnt implementation and how the response curves were evaluated and integrated into the deductive approach. As the TPI yielded very low model importance values (permutation importance = 0.0098, Jackknife standalone training gain = 0.0235), implying the absence of a statistical connection to the probability of sites, this predictor was excluded. All other predictors were kept as they showed satisfactory importance values (Figure 1).
The response curve of the elevation predictor (a) was evaluated as being partly implausible. The sharp decline in relative probability below the elevation of 220 m can be explained geomorphologically, as this value corresponds to the erosional base level. The steep decline of probability above 220 m, however, was deemed implausible, as no implications for settlement choice or preservation support this. The generally narrow optimal and viable value ranges resulting from the unmodified response curve can be attributed to the small sample size of only 23 sites, 80% of which are found on an elevation between 204 and 277 m. Factors influencing the upper boundary of viability for settlement are temperature from the settlement choice perspective and the upper boundary of loess deposition from a preservation perspective (see Table 2). As such, the viable value range was set to ≤400 m, which represents the upper boundary for loess accumulation according to Lehmkuhl et al. (2021) and the optimal range to ≤250 m based on empirical data. The statistical connection between relative probability and slope, represented by response curve (b) was deemed plausible. The optimal and viable slope values represent hillsides where geomorphological processes favour the preservation of sites.
It is important to note, that the implications for preservation overshadow the implications for settlement choice. The response curve of the aspect predictor (c) was deemed plausible, as the highest relative probabilities are found in northeast/east aspects, i.e. in the lee of the prevailing westerly winds (Sebe et al., 2015). As such, it fulfils all implications for settlement choice and preservation, listed in Table 2. The exponential incline of relative probability with increasing annual sunshine hours in response curve (d) was evaluated as plausible. Only the rapid decline above 4150 annual sunshine hours was deemed implausible, as no implications for settlement choice or preservation support this. This implausibility in the response curve might again be based on the small sampling size, including no sites with more than 4250 yearly sun hours, although close to 4400 h are theoretically possible in these latitudes. As such, the optimal value range was extended upwards, ignoring this possibly biased decline in relative probability. Response curve (e), representing distances to small rivers was evaluated as partly implausible, as a decline of relative probability below 200 m distance is not expected. A possible explanation for this inconsistency might be the aforementioned sampling bias (2.), as areas in direct vicinity of rivers are less likely utilized for viticulture, loam extraction or construction. Therefore, optimal conditions were assumed for distances below 200 m. However, depending on catchment size, large variations of discharge, justifying this lower threshold, cannot be completely ruled out. Although it bears a great resemblance to the partly implausible response curve of small rivers, response curve (f), representing distance to large rivers, was deemed plausible. The decline in probability below 200 m distance can be explained by the large variations of discharge of these rivers, posing a threat of flooding to camp sites and even more importantly providing unfavourable conditions for site preservation. The response curve for the distance to large river junctions (g) was deemed plausible, as it shows the expected steady decline of relative probability with increasing distance. Response curve (h) was also evaluated as plausible, as both incline and decline of the curve can be explained in the context of the palaeo-environment. The incline with increasing distance expresses the safety from flooding on the one hand while the decline at distance values above 45 m shows decreasing accessibility on the other hand. On the basis of this evaluation, all predictors were reclassified into optimal, viable and unviable value ranges.
An additive approach was chosen to combine all reclassified predictors into a predictive model. To acknowledge the absolute exclusive criterion of Late Pleistocene alluvial plains, this predictor was included multiplicatively at the end of the equation. The reason that this geological predictor is treated as an absolute exclusive criterion is empirical evidence that no sites are documented. As nine reclassified predictors with score values between 0 and 3 are included additively, the resulting additive score value can range from 0 to 27. Within this score range, very high probability for a presence of Upper Palaeolithic sites is assumed when more than half of the predictors lay within their optimal value range (additive score value ≥15); high probability when all predictors are at least within their viable value range (≥9); medium probability when more than half of the predictors lay within their viable value range (≥5); and low probability beneath this threshold (<5).
The map itself shows multiple trends in probability. On a larger scale, a difference between the Bohemian Massif in the west and the Eastern Alpine forelands in the east is apparent, showing generally higher probabilities in the east. On a smaller scale, medium steep slopes near rivers and river junctions show highest probabilities. Within this trend, probabilities on eastern and north-eastern aspects are also elevated (Figure 2).
The resulting model was validated with the input Upper Palaeolithic sites, showing that more than 80% of the sites fall within very high and high probabilities (48% very high, 34% high, 18% medium) and are thus represented by the predictive model. A cross-validation with test sites could not be conducted as the sample size is too small to be split into training and validation datasets.

Discussion
A predictive model can only be as accurate and representative as the data it is based on. Therefore, possible sources for inaccuracies and errors require careful consideration and critical discussion. All predictors trying to depict the Late Pleistocene environment can only be an estimate, as geomorphological and especially hydrological processes have changed fundamentally. Smoothing the DEM helps removing small scale anthropogenic features, while some Late Pleistocene landforms cannot be recreated in this way as mentioned above (3.2). Even larger differences are expected regarding hydrology, in particular due to modern canalization of large rivers. Present-day streamlines are thus not representative for Upper Palaeolithic settings. To account for potential river locations, Late Pleistocene alluvial plains were integrated and treated as possible streamline location when creating the distance to river predictors.
Possible inaccuracies can also be expected from the data representing loess and Late Pleistocene alluvial plains, as these cover a European scale (Lehmkuhl et al., 2021). To minimize scale-related inaccuracies, the mentioned DEM and regional geological map were used as reference for visual validation and adjustment. Nevertheless, only half of the sites were found covered by loess shapefiles. This stands in contrast to the sedimentological context of the sites, as all are embedded in loess. It is therefore safe to say, that loess is underrepresented in geological maps of the study area. This may be caused by the exclusion of sediment covers <2 m during geological mapping (Lehmkuhl et al., 2021).
Some possible limitations of the model are also introduced by the archaeological dataset. On one hand, the dataset does not always allow for unambiguous assessment of potential duration of the sites' occupation (e.g. recurring brief or seasonal occupation vs. longer stay), although this is expected to have an impact on the preferred position within the palaeoenvironment. On the other hand, sites with several occupations spanning up to 25,000 years were treated as one, ignoring possible changes in settlement preferences. The only way to address these issues is to enlarge the dataset, so that it can be split into typeand chronology-specific groups without reducing the sample size below a representative minimum.
Possible sources for sampling bias regarding the site locations should also be considered to assess the representability of the dataset. This is especially important in the context of a loess landscape. As potential archaeological sites are often deeply embedded in loess sequences, discoveries are less likely to be random, but instead often connected to construction, viticulture or quarries. This leads to a significant sampling bias, which was addressed by careful selection of only well-documented and representative sites. However, this reduced the sample size to only 23, leading to difficulties in validation. As such, the model was validated with the training sample at the cost of weakening the statistical strength of the validation result.
Despite these limitations, we are able to show that predictive modelling for Upper Palaeolithic sites in the study area provides plausible results.

Conclusion
We introduce a new methodology for APM, which allows deductive models to profit from inductive analysis. As such, it represents a good addition to the 'Middle Range Theory' in archaeology, breaking up boundaries between deductive and inductive approaches. In detail, the statistical connections between 10 environmental predictors and Upper Palaeolithic site probability in Lower Austria were assessed, using the modelling software MaxEnt. The selected predictors have implications for the settlement choice and the site preservation. Instead of letting the MaxEnt 'black box' run its course, losing track of causality in the process, intermediate results (response curves) were evaluated for plausibility from an archaeological perspective before further processing. All predictors were then reclassified to optimal, viable and nonviable value ranges and combined into a predictive model in an additive approach. The resulting map highlights spatial dynamics both on a larger and smaller scale. The small sample size, however, does not allow for complex validation and raises the question of adequate representability. As such, the model, in its current state, is applicable for scientific applications only. When more empirical data is added and the model is thoroughly validated, an implementation into cultural heritage management is possible.

Software
For mapping, processing and statistical analysis of spatial data, we used ESRI ArcGIS 10.7.1. For the inductive parts of the APM, MaxEnt, version 3.4.4 was utilized. For the post-processing of maps and creation of graphics, we used Inkscape, version 1.1.

Open Scholarship
This article has earned the Center for Open Science badges for Open Data, Open Materials and Preregistered. The data and materials are openly accessible at their helpful comments, which greatly improved the quality of the map and the manuscript.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
All investigations were carried out in the framework of the CRC 806 'Our way to Europe', Projektnummer 57444011 -SFB 806, funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation).

Data availability statement
The complete dataset raised in this study is publically available at the Collaborative Research Centre 806 (CRC806) database (DOI: 10.5880/SFB806.71).