Is Non-destructive Provenancing of Pottery Possible With Just a Few Discriminative Trace Elements?

This study outlines how to perform a fully nondestructive compositional analysis of pottery, but with the limited number of discriminative elements quantifiable by pXRF. The spectrometer was calibrated with fired clay samples spiked with Ti, Fe, Ga, Rb, Sr, Y, Zr and Nb. Methodological outliers resulting from surface modifications and paste heterogeneity were identified by multiple spot analysis per fragment and sorted out. From ten late medieval/early modern potting centres located all over Germany 75 to 360 waster sherds were analysed, resembling plain earthenware, proto-stoneware and glazed stoneware. The 90% quantiles of the assemblages could be distinguished by simple biand trivariate scattergrams of the trace elements, although the overall compositional difference was low. As a general trend paste group distinction of similar wares got less pronounced with decreasing geographic distance of the production sites. With the limited total spread of elemental concentrations, the n≤15 dimensional space of discriminative elements is too small to separate hundreds of paste recipes generated over centuries in a densely populated territory. Thus a restriction to the anthropologically reasonable context is required when compositional provenancing of unknown pottery is undertaken, and the successful resolution of production sites needs to be confirmed on a case by case basis. Statement of significance Apart from cultural heritage and art historical conservation aspects, non-destructive XRF analysis allows for high, though non-automatized throughput of even very small ceramic fragments, which strongly improves the significance and validity of location, ware and style specific reference groups. However, pottery also provides considerable methodological constraints to true non-destructive compositional analysis, and the difference in chemical pattern between production centres is generally small. Diligent instrument calibration and a stringent protocol are therefore prerequisites for distinguishing geographically neighbouring pottery produces, and assemblages with adjacent and partially overlapping chemical composition in general.


Introduction
XRF is a surface analytical method which as a precondition requires homogenous sample material of infinite thickness, larger than the penetration depth of the emitted photons of the heaviest element to be quantified in the sample matrix. This is easily achieved with powdered compounds pressed into pellets, but is difficult to fulfil for study objects kept intact for conservational reasons or for future analyses and re-evaluations. In the case of pottery it is the principle heterogeneity of the fabric, clay and nonplastic minerals, as well as the many intentional and unintentional surface modifications which contradict the requirement of sample homogeneity (cf. Forster et al. 2011;Speakman et al. 2011).
An additional consideration about non-destructive XRF in archaeometry is the easy use of portable or handheld XRF instruments with less diligence than in standard laboratory analytics (Shackley 2010;Speakman et al. 2011;Frahm, 2012;Speakman and Shackley 2013). Meanwhile the debate on accuracy, inter-instrument comparability and independent replicability of geochemical pXRF data is focusing on the factory installed quantification software of the instruments provided by the vendors for the convenience of the user. Obviously matrix issues and concentration ranges of elements to be quantified are too divergent as to serve all geochemical applications with just one general approach -fundamental parameter or empirical factory calibrations integrated into the portable spectrometers (Brand and Brand 2014;Piercey and Devine 2014;Hall et al. 2014;Conrey et al. 2014;Hunt and Speakman 2015). By contrast user specific instrument calibrations (i) rely on reference materials which in its geochemical composition come close to the study material ("matrix match", Hunt and Speakman 2015), and average minor matrix differences by using a large number of standards (Stern, 1976;Rowe et al. 2012); (ii) calibrate with a set of welldefined matrix matched secondary standards (Ferguson 2012); or (iii) apply standard addition or "spiking" of the matrix (Stern 1976and Potts et al. 1984for XRF, Miliszkiewicz et al. 2015 for LA-ICP-MS). Regardless of the calibration concept, however, there may still be remaining uncertainty as long as no sample specific correction of matrix effects and interference phenomena is employed (Conrey et al. 2014), particularly when a level of accuracy is intended which exceeds "internal consistency of data" (Bishop et al. 1990;Speakman et al. 2011).
The theoretical and ethnoarchaeologically supported principles of ceramic chemical sourcing are independent of the analytical method and protocol and follow the provenance postulate (Weigand et al. 1977), though adapted to the ceramic fabric as an intentional modification of its natural plastic and nonplastic raw material components (cf. Neff 2000;Mommsen 2004;Hunt 2012). In light of the high spatial, microzonal variability of raw clays, which for reasons of plasticity are the major constituent of pottery, compositional clustering is only realistic under quite ideal conditions, namely with a minimum degree of standardization and scale of operation in clay sourcing and paste preparation (cf. Arnold et al. 1991;Blackman et al. 1993;Tite 1999). Furthermore the local sources themselves, recovered by the potters or household members in the case of nonspecialized production, need to assert a minimum of compositional coherence and size to allow for a continued exploitation over an extended period of time, thus creating site specific trace element patterns in the ceramic remains. Finally compositional provenancing can only be successful if the local raw material patterns and resulting composite fabrics are sufficiently distinct from all other potential provenances of similar wares in a comparative time period. Intentional tempering further diverges the resulting pastes from the chemical composition of the raw clay sources in its geological background (cf. Neff et al. 1988;Sterba et al. 2009), thus further emphasizing the empirical, case by case nature of successful ceramic provenancing.
We have challenged the methodological approach of a fully non-destructive analysis, but with the limited set of discriminative trace elements accessible through pXRF, by establishing compositional reference groups of ten medieval and early modern pottery production centres. The sites and wares have been selected to cover a broad range of geographic distances in modern Germany, and to encompass plain as well as various surface modified wares. A preliminary version of this paper with less elaborate instrument calibration was presented as a poster at the ISA 2014 in Santa Monica, CA.

Instrumentation
A Bruker Tracer III-SD portable XRF spectrometer with rhodium tube and a 10 mm 2 XFlash® SDD silicon drift detector has been employed. The typical resolution of the detector is 145 eV at 100,000 cps. The instrument was installed in a desktop stand with the instrument window in an upside position and the sample placed on top of the window (window film Prolene® 4.0 µm). The effective spot size of the sample surface from which the emitted photons are collected by the silicon drift detector was experimentally determined at about 50 mm 2 (beam collimator width generating a spot cross-section of 8 mm). Spots were selected for visual criteria (as flat as possible areas of the sherds and absence of visually detectable surface modifications). Samples were manually positioned on the instrument window. Large fragments in an unbalanced spot position and samples in unstable broken edge position were experimentally fixed with small counterweights, clips or holders. If not otherwise indicated 2 to 4 spots per sherd were analysed for 120 sec each. For measuring minor and trace elements, K to Nb, the tube power was set at 40 kV and 10.3 µA with a 25 µm Ti/ 300 µm Al beam filter.
Signal acquisition was performed with the Bruker S1PXRF 3.8.30 software operated with PC trigger, and was simultaneously checked by spectral overlay for surface modifications and paste homogeneity. For background stripping and Bayes peak deconvolution of spectra the proprietary Bruker deconvolution software ARTAX 7.4.0.0 was used. Calibration factors for Ti, Fe, Ga, Rb, Sr, Y, Zr and Nb were determined with tenfold replicate measurements of spiked fired clay samples, and corrected with element specific influence coefficients for Fe absorption (Ga, Rb, Sr, Y, Zr, Nb) or enhancement (Ti) and for spectral interference (Y, Nb), as outlined below. All other calibration factors were determined based on the Bruker factory calibration with pressed pellets of the mudrock samples established by Rowe et al., 2012, but [Mongolia Central Geological Laboratory]). All mass fraction data for elements calculated without spiked calibration are indicated as information values (i). For distance correction of curved pottery fragments all net counts were ratioed to the Rh Ka1,2 Rayleigh scatter corrected for Fe absorption (deconvoluted peak area under the region of interest of 19.8-20.5 keV). All concentration data are in w/w % of the elements. Statistical and graphical data evaluation was conducted with Excel, R and ggobi.

pXRF instrument calibration
Attempts to validate the factory installed geochemical trace element calibration revealed considerable discrepancies between the expected and the measured concentrations, e.g. for Zr. Since the reliability of certified reference values for Zr had already been questioned before (Potts et al. 1990) a commercial clay powder (99.8% <63 µm particle size) was spiked with increasing amounts of zirconium IV oxide powder, 5 µm particle size, and fired for 2 hrs at 800-850 o C which improves its handling stability and makes it fully comparable to pottery (cf. Hunt and Speakman, 2015 for mineralogical differences between rocks, sediments, unfired and fired clays). The concentration of the spiked elements in w/w % was calculated from the amount of chemical input per amount of clay powder corrected for loss on ignition (LoI). The numerical baseload concentration of the blank was determined as the intercept of the least squares regression line with the y-axis.
In Fig. 1a and b deconvoluted Zr net counts ratioed to the Compton scatter (18.2-19.8 keV) are plotted vs. the given concentration values of reference materials and the fired clay samples spiked for Zr (4 spiked concentrations of 0.0087-0.0641% elemental Zr, blank concentration of 0.0315% Zr). Compton ratioing is the conventional approach for matrix correction in silicate dominated minerals, where Fe is the major variable element causing matrix effects (absorption for elements with atomic number higher than Fe, and enhancement for elements with lower atomic number; Reynolds 1963). The coefficient of determination of the Compton ratioed Zr net counts of the reference materials of 0.95 is low (Fig. 1a) and even dropped to 0.84 for Y (data not shown), thus explaining why the verification of factory calibrations with single certified reference materials typically fails although its calibration factors might actually be quite correct. Since the measured values scatter around the regression line, the deviations are not just explainable by the different sample preparation modes. Whether the positive or negative deviations from the regression line were due to any remaining matrix effect or to uncertainties in the given values of the reference materials, or both, is still unknown. Although the Zr spikes basically confirm the calibration factor established with the reference materials, its standard error is much lower (± 6 ppm vs. ± 25 ppm), thus considerably improving the intrinsic accuracy of the calibration.
Afterwards fired clay samples spiked with all those additional minor and trace elements were prepared which were inferred to be discriminative for provenancing from a preliminary analysis of the waster assemblages, namely for Ti, Fe, Ga, Rb, Sr, Y and Nb. It is assumed that the clay matrix used for spiking is broadly compatible with most ancient pastes and pottery wares since the X-ray mass absorption coefficients of O, Al and Si, which together amount to ≥90 w/w % of the paste, are almost identicalat least from the perspective of the mid-Z elements, which are most important for provenancing.
The Fe spiking series was used to validate the Compton ratioing for Fe absorption on an element per element basis. Attenuation of the Zr fluorescence in the fired clay matrix by increasing Fe abundances is shown in Fig. 2a. Apparent absorption of Zr photons by Fe follows a polynomial curve which has Figure 1a (left) Zr regression line of Compton ratioed reference materials (green squares, mudrock samples, red squares, certified reference materials), and standard error of the slope. Figure 1b (right) Regression line of the Zr spiked fired clay samples (black squares), reference materials superimposed for gross validation been converted into a straight line parallel to the x-axis with a Lachance-Traill type algorithm (Lachance 1993, equation 41). The resulting empirical influence coefficient, Zr net cts × (1 + 0.8 x10 −6 × Fe net cts ), which is specific for instrument setting and concentration range of the analytes, was applicable to several clays which have been tested, covering a total elemental Fe concentration range of 0.5 to 8%. Fig. 2b illustrates the Fe attenuation of the Rh Compton scatter in the same experiment. The empirical influence coefficient for the Compton scatter, Rh Compton net cts × (1 + 1.0 x10 −6 × Fe net cts ) is in the same order of magnitude, but differs slightly from the numerical value of the Zr term. Whether these small discrepancies, which have also been observed for the other mid-Z trace elements are real or just reflect measurement uncertainties at low counting rates is unknown. By contrast apparent Fe absorption of the Rh Rayleigh scatter is much lower (Fig. 2b, Rh Rayleigh net cts × (1 + 0.26 x10 −6 × Fe net cts ). Thus rationing to the Rayleigh scatter does not properly correct for Fe absorption. The empirical influence coefficient for Fe enhancement correction of Ti was determined as Ti net cts / (1 + 0.23 x10 −6 × Fe net cts ).
Although the ARTAX 7.4.0.0 deconvolution software applies some proprietary correction algorithms for escape and sum peaks as well as for spectral interference, it was recognized from the spiking series of Rb and Y, that the baseload net counts of Y and Nb were strongly impacted by the increasing concentration of the respective spiking element. This is obviously due to an insufficient overlap correction of the Y Ka peak by the Rb Kß peak and of Nb Ka by Y Kß in the specific concentration range of the tested clay samples. In the case of Y the net counts were underreported with increasing Rb concentration (Fig. 3a), whereas the Nb net counts were overreported in the presence of higher Y values (Fig. 3b). Both uncompensated spectral interference effects were corrected with Lachance-Traill type influence coefficients. There is also spectral overlap of the Zr Ka peak by Sr Kß, which, however, due to proper correction by the ARTAX software, was insignificant in the concentration range of Sr which was found in the archaeological inventory. Sample morphology, physical matrix properties and infinite thickness Fluorescence loses its energy with the inverse square of distance from the detector (Lambert-Beer law). Tight, flush positioning of the sample on the instrument window is therefore essential. The calibration samples have a flat surface, whereas most pottery sherds have a curved profile with a convex outer and a concave inner wall side. Likewise broken edges, chips and spallings frequently have an irregular surface morphology. For the mid-Z trace elements this inevitable distance discrepancy can be corrected through normalization to the X-ray source scatter, the Compton or the Rayleigh scatter peak (Potts et al. 1997).
When comparing the performance of Compton and Rayleigh scatter normalization with a fired clay block with flat surface and infinite thickness at 0 to 3.3 mm distance from the instrument window, Rayleigh scatter rationing gave a complete distance correction for all tested elements from Ga to Nb. By contrast distance correction with the Compton scatter was less effective exhibiting a compensation gap of 5% per 1 mm distance (Supplemental Data). We therefore decided to use the element specific Lachance-Traill algorithms for the correction of Fe absorption in the clay matrix, and the Rayleigh scatter for distance correction (using both Compton and Rayleigh scatter ratioing simultaneously would result in an overcorrection of distance variation).
Since it was argued that Compton ratioing is essential for mass correction in non-destructive XRF analysis (correction of "practical" matrix effects, Shackley 2011) the performance of the Compton and the Rayleigh scatter peak for normalizing w/w (LoI), w/v (density, micro-porosity) and micro-crystallinity differences was checked in air dried unfired and in fired clay samples, covering both medium (800-850°C) and high fired pottery (1150°C). This is important if assemblages containing regular earthenware, high fired earthenware and fully sintered stoneware are analysed, and since medieval waster material typically consists of overfired fragments. Though Compton ratioing was slightly superior (Supplemental Data), it is assumed that the difference does not outweigh the advantage of Rayleigh ratioing for proper distance correction in non-destructive paste analysis.
Pottery without added coarse temper often has a wall thickness below 4 mm, in the case of high fired earthenware and stoneware occasionally going down to as thin as 1.5 mm. As was pointed out by Davis et al. (1998) in their evaluation of obsidian minimum sample thickness, infinite thickness is matrix specific and increases with the energy of the incident X-ray by the atomic number of the element, but including the 45 Rh Compton and Rayleigh scatter intensities. Infinite thickness was determined with clay samples spiked with 0.0100% 41 Nb as the heaviest mid-Z trace element analysed in the study and fired at 800 o C, at a sample thickness of 1.5 to 12.0 mm. At a thickness below 3 mm the net counts of the mid-Z elements did no longer fully represent the elemental concentration in the paste, with the Rh Compton and Rayleigh scatter intensities exhibiting the largest negative deviation, 12 and 14%, respectively (Supplemental Data). The lighter elements Ti, Fe and Ga with their significantly lower infinite thickness, however, were still unaffected down to 1.5 mm sample thickness. When infinite thickness is undercut, uniform ratioing to the Compton or the Rayleigh scatter thus overreports the net counts with decreasing atomic number of the elements, whereby the respective behaviour of the mid-Z elements is not fully proportional to its atomic number (Davis et al. 1998).
In the absence of a proper correction algorithm for low thickness samples a practical approach is to place a small pure quartz pebble on top of a thin sherd, substituting the missing background mass for the two source scatters which then stay unaffected and can be properly used for ratioing (cf. Markowicz 2008 and Ferguson 2012 for similar approaches). For obsidian Davis et al. (1998) have determined the minimum sample size at 1.2 mm, thus lower than what was found for the 800°C clay firings. This may be due to the glass matrix of obsidian, and for clay samples sintered at 1200°C the apparent minimum thickness got down to 2 mm.

Surface modifications
In the case of true non-destructive analyses any intentional or post-depositional surface modification which covers the underlying ceramic body, may falsify the compositional analysis of the fabric. Simple Fe paintings and thin slips typically did not impact the peak size of the mid-Z elements, nor did modest soil contaminations with Ca, Fe, Mn and Zn ( Fig. 4a and b). Multiple spot analyses of outside surfaces frequently varied in K, Ca and Mn values, which is probably due to unintentional wood ash/potash deposition during firing. These minor variations, depending on the intensity of deposition and the inorganic composition of the fuel material, result in increased relative standard deviations of the respective light elements in the particular sherd and in the entire reference group, but do not impede the accuracy of the concentration values of the discriminative mid-Z elements. Depending on the soil chemistry, Ca contaminated surface areas, though still insignificant with respect to photon attenuation, exhibited an increased Sr concentration (Fig. 4c). Similarly the intentional dosage of potash during firing as was frequently practiced in the production of proto-stoneware, as well as thin K and Ca containing slip and salt glazes, may exhibit increased photon intensities of their geochemical proxies Rb and Sr as compared to unglazed areas of the same sherd (Fig. 4d). Post-depositional modifications typically differ in layer thickness and thus give different net counts per analysed spot. Spectra with consistently low peaks of soil elements therefore reveal "clean" surface areas. Significant intra-sherd variability of Ti, which as a light element is more sensitive to surface modifications than the mid-Z elements, is an indication of considerable surface sealing, and can be used as an additional criterion for identifying methodological outliers ( Fig. 4b and d). In the unlikely event that both sides of the sherd and all broken edges and chips are equally covered by a thin, invisible deposit, such surface modification, however, would not be detectable without surface abrasion and thus be overlooked. Differences in net counts between surface modified measurement spots and assumed clean paste areas are qualitative, since they are not covered by the actual calibration which is based on homogenous elemental distribution in a fabric of infinite thickness. Moreover Fe surface layers exhibit a lower Fe absorption per net count than of paste distributed Fe (experimental data not shown). Surficial Fe absorption correction, whether by Compton ratioing or with an element specific influence coefficient, would thus overreport the mass fraction of the mid-Z trace elements. Datasets with significantly increased Fe net counts compared to other measurement spots of the same fragment have therefore been excluded as methodological outliers.

Precision and homogeneity test
As a surface analytical method compositional paste analysis with XRF depends on the homogeneity of the sample, which can be checked by multiple spot analysis on both sides of the sherd, and where possible on broken edges. For an exemplary evaluation of paste homogeneity an undecorated, untempered body sherd of a globular Gottsbueren pot of 8 × 5 × 0.4 cm size was selected, and 20 repeated measurements of the same spot on the convex outside as well as 10 different spots on both the convex outer and the concave inner wall of the sherd were acquired ( Table 1). As expected the relative standard deviation in % (%RSD) increases with decreasing peak intensities (cf. Speakman, 2012) with Ga, Y and Nb concentrations close to detection limit. Ratioing to the Rayleigh scatter as part of the conversion of net counts into concentration values compensates for morphological surface irregularities, varying distance from the instrument window/detector and any minor, but unintended variability in the instrumental conditions.
Subsequently the same experiment was repeated with a second TRACER III-SD instrument also calibrated with the spiked clay samples. The %RSD per element over all variable spot measurements with both instruments (n = 40) can be taken as a realistic estimate of the cumulative counting and quantification error in non-destructive pXRF analysis of undecorated, untempered pottery fragments. For the exemplary Gottsbueren sherd #102 this uncertainty amounts to ±2.5 to ±10% depending on the absolute concentration of the respective element, whereby the Y values exhibit the largest relative standard deviation as they suffer from low counting statistics and the additional uncertainty from the Rb Kß peak overlap correction.

Intra-site variability
With a %RSD of 2.5 to 10 per element and sherd which in case of doubt still underestimates the cumulative uncertainties (cf. Stanley and Lawie 2007), the question arises whether the level of intra-sherd variability impedes the purpose of pottery provenancing. Therefore additional 9 sizeable Gottsbueren sherds were selected and 10 different measurements per sherd, from each 5 different spots inside and outside, were acquired and the %RSD per sherd was compared with the entire Gottsbueren assemblage (n = 360, 2 to 4 measurements per sherd, Supplemental Data). Fig. 5a and b display bivariate scattergrams of Zr, Sr and Rb. There is one strong outlier in the Zr plot, #176-ou4, which was already visibly detectable and most probably results from a vitrified kiln wall drop. #102-in5 is a modest Zr outlier probably due to a local Zr anomaly, e.g. a large zircon grain (the arbitrarily selected #102 spots are not identical with those of the previous analysis in Table 1). Except for the two sherds with Zr outliers and the Nb values, which are surprisingly consistent, all %RSD of single sherds are significantly lower than the %RSD of the entire Gottsbueren assemblage, even for the Y mass fraction of many sherds. In the scattergrams the data points of the individual sherds form sub-clusters distributed over the entire cluster area. Thus it can be excluded that the scattering of the cluster is a mere consequence of intra-sherd variability, whether due to compositional heterogeneity or cumulated analytical uncertainties.
The two bivariate scattergrams exhibit a dense core and numerous dispersed data points in the outskirts. These data points, when consistent for all measurement spots of the same fragment, represent true paste outliers resulting (i) from cracked imported vessels discarded in the same pit or dump, which are typically found in waster assemblages with a frequency of 1% or less; or (ii) from occasional paste variants produced at the site. The term paste outlier implicates that the relative amount of these sherds is small and that they do not establish a separate group or subgroup. By contrast a single data point deviating from the other measurement spots of the same sherd represents a methodological outlier. All three kinds of outliers can be screened for and manually excluded. For establishing reference groups with a thousand or more datasets, however, it is more convenient to check the assemblages first for sub-clusters, and provided that the distribution is homogenous, to restrict the group files to the 90% quantiles per element, equivalent to the 90% confidence interval of data with Gaussian distribution. For the sake of clarity, methodological outliers among samples of unknown provenance need to be identified sherd by sherd and are manually deleted in the Excel file before pattern allocation.

Intact vs. powdered sample material
Heterogeneity of fabrics is of general concern in compositional analysis. As was discussed by Henrickson & Blackman (1992) for large hand-built vessels potted from several kg of paste, homogeneity and closely related mass fraction data from multiple samplings taken over the entire size of the vessel can only be expected if the respective batch of raw clay, with or without added temper, was carefully homogenized by levigation, slurrification and/or kneading. Provided there is sufficient mixing of all submicroscopic and larger particulate constituents of the fabric, it depends on the sample size/volume in XRF analysis, but not so much on an additional mechanical micronization, whether an analytical problem arises from such structural heterogeneity. If particles are unevenly distributed or form a gradient, if the particle size strongly differs, or if there are only a few large particles per sherd, the sample size matters even more. Thus µ-XRF spot size (Speakman et al. 2011) and micro-destructive laser ablation sampling is more  susceptible to heterogenous distribution of granular inclusions than pXRF with an effective spot size in the range of 50 mm 2 (collimator for 8 mm spot cross-section). Compared to any destructive methods nondestructive XRF has the advantage that multiple spot analysis is feasible, and that the data can either be averaged, or an anomaly can be screened out. A peculiarity of non-destructive paste analysis is the positional effect of mid-Z anomalies like a single large zircon grain, whether part of the natural nonplastic phase of the clay or introduced with the temper. In a ground sample a high Zr value will be found, but the anomaly would not be detected unless a second sample was taken. In non-destructive XRF, when analysing the same spot from both sides of the sherd, the same elevated Zr concentration will only be obtained if the zircon grain is placed exactly in the centre plain of the sherd. Otherwise, and this is the more likely situation, due to matrix absorption two different elevated Zr values will be found which looks irritating, but is fully in line with X-ray physics.
As shown above such Zr anomalies have been observed in 2 out of 10 large Gottsbueren sherds, but only in 2 out of 100 spots which were analysed. Keeping in mind the random variation in replicate analyses and the complications arising from surface modifications and paste heterogeneity, a "fully quantitative" non-destructive pottery analysis is only possible with extraordinary efforts and diligence (cf. Hunt & Speakman, 2015). This means replicate signal acquisition from multiple spots per sample and subsequent data purging which for most research tasks, however, would go beyond the scope. By contrast the minimum number of measurements per sherd, whether of known or unknown provenance, which qualifies for representative sampling is two coincident spectra.

Compositional analysis
Waster material from nine medieval and early modern German pottery production centres and one adjacent Belgian site was then analysed as a methodological  stress test (Table 2 and Fig. 6). The sites have been selected without immediate archaeological significance, but to get an understanding of the intra-site variability and the total inter-site spread in the study area of those pXRF accessible minor and trace elements, which for reasons of interference or depending on the specific instrumental setting may not have been routinely quantified and/or used by NAA, namely Ti, Ga, Sr, Y, Zr and Nb (e.g. Mommsen et al. 1995and Schifer 2003with Rb and Zr, and Hook 1997 with Rb as the sole NAA elements matching with pXRF). Except for coarse Marienthal proto-stoneware (Fig. 8b) there was no indication that any of the assemblages was intentionally tempered. Typically earthenware was reductively fired with grey to blackish fabric and surface appearance, and frequently overfired thus giving the impression of a proto-stoneware like highfired ware (Fig. 7). Outer walls of proto-stoneware fragments were either unglazed, or showed a thin ash deposit or an intentional potash glaze, respectively. Stoneware was unglazed or exhibited an orange, brownish or grey glaze of uneven thickness (Fig. 8).
Visual distinction between slip and salt glazes is often difficult, and Na as a light element is not detectable by pXRF under normal atmosphere. Thin-walled sherds with a complete inside and outside slip or salt glaze which do not give access to the paste without surface abrasion, have been excluded from the study. All sherds with the exception of the Raeren material, which was relocated from the village centre to a nearby forest road, derive from surface collections in the suspected or proven workshop areas and were accompanied by clear waster indices like misfirings, broken kiln furniture, kiln wall fragments and charcoal. Intra-site concentrations of K, Ca, Mn, Fe, Zn and Pb strongly scattered, reflecting intentional, unintentional and post-depositional surface modifications. By contrast, intra-site %RSD of Ti, Ga, Rb, Sr, Y, Zr and Nb was most often below 20 (Table 3), designating these elements as candidates for inter-site compositional discrimination. Plotting element ratios and thus trying to compensate for nonplastic dilution of the clay matrix did not significantly improve the graphical resolution of the reference groups. There were no explicit sub-clusters within the tested assemblages, and no obvious correlation between either of the discriminative elements. Seven sites could be separated from each other in bivariate plots of the 90% quantiles of Rb, Sr and Zr, Grossalmerode, Kroening, Marienthal, Brueggen-Oebel, Raeren, Siegburg and Gottsbueren ( Fig. 9 and Supplemental Data). Two pairs of sites, Coppengrave and Raeren, as well as Bengerode and Waldenburg overlapped in each of the bi-plots though they were geographically unrelated and stylistically distinct. These assemblages could be distinguished from each other based on its Y and Nb concentrations, respectively ( Fig. 10a and b). Three medieval pottery production centres, which are located on both banks of the Weser River in a distance of 30 and 50 km from each other partially overlapped in several bi-plots: Gottsbueren, Coppengrave and Bengerode. By contrast, the assemblage of the fourth regionally important production site, Grossalmerode, located some 40 km south to Gottsbueren, was clearly separable by its higher Ga and Sr mass fractions (Table 3). In a Principle Component Analysis (PCA) with Ti, Ga, Rb, Sr, Y, Zr and Nb as the variables the three overlapping sites could be almost but not completely separated (Fig. 11). As shown in the abstract graph the PC plot did not yield a better resolution than the trivariate plot of Zr, Y (primary PC1 loading) and Sr (primary PC2 loading). This is in line with the observations of Michelaki & Hancock (2011) that with a low number of discriminative variables PCA is of limited advantage. However PCA turned out to be quite helpful for the pre-structuring of pottery sherds from domestic contexts with multiple potential provenance.
The Gottsbueren assemblage itself (n = 360) consisted of waster material from three different workshop areas (Stephan 1982), each of them in a distance of about 2.5 km, the deserted settlement of Bensdorf at the Fulde creek (n = 159), the deserted site of Thonhausen at the Donne creek (n = 150) and a workshop in the village centre of Gottsbueren (n = 51). The trace element patterns of the waster fragments from the three workshops completely overlapped. The compositional result is thus ambiguous whether the workshops at the three different locations used the same clay source or whether any locally separated clay mines are compositionally indistinguishable with the set of trace elements quantified by pXRF analysis.
The bivariate density areas, as shown above, varied in shape from almost circular to ellipsoid and irregular. As indicated by the Q1 and Q3 tails in boxplots of the 100% quantiles (Supplemental Data) there was frequent overlap in the periphery of the clusters before restricting the group files to the 90% quantiles.

Number of discriminative elements
The number and kind of diagnostic elements is clearly dictated by the analytical method and its sensitivity, since many trace elements in fabrics are Science and Technology of Archaeological Research 2016 VOL 2 NO 2 present in the low or sub ppm range (for useful elements in ceramic provenancing and its potential geological precursor minerals cf. Padilla et al. 2006;Degryse and Braekmans 2014). As shown with the exemplary waster assemblages and known from most other provenancing studies, there is considerable intra-site diversity, but limited inter-site differentiation in the trace element patterns of ceramic fabrics. Seven minor and trace elements have been selected, Ti, Ga, Rb, Sr, Y, Zr and Nb, which were found to be discriminative based on its comparatively low intra-site variability (none of the studied fabrics exhibited a detectable Th concentration). Sometimes Fe can be added to the list, if potters have intentionally selected different clays for wares with different technological properties. For certain wares even Si and Al may be applicable, but otherwise the major elements have little discriminative value, and have not even been measured on a regular basis (determination of Al and Si is semiquantitative, since with pXRF under internal vacuum a distance correction for the low-Z elements is not available and matrix correction is difficult). In other territories with different soil chemistries and different human agency elements like Mn, Co, Cu, Zn and Pb may also exhibit some discriminative potential. It was supposed that when a large number of elements, 20 to 25, are measured with sufficiently high precision, as possible with NAA, the elemental pattern of a clay deposit has a very high probability of being unique in the world (Mommsen 2001). From a statistical probability point of view this is certainly right, as far as clay deposits and pottery produces are geographically and thus geochemically completely unrelated, and provided the spatial variance in the respective deposits is exceptionally low. On the other hand trace element abundances in clays and  pottery fairly well match the average trace element composition of the upper crust and the terrestrial surface (Rudnick 2005;Hartmann et al. 2012), at least indirectly explaining the low spread of concentrations in a certain territory or even worldwide. Hence if destructive sampling is acceptable, NAA is the preferred analytical method as was demonstrated by Speakman et al. (2011) in a direct comparison of the resolution potential of NAA and pXRF with eight well characterized compositional groups of Mimbres pottery from the American Southwest. As shown for the Mimbres assemblage, the pXRF element set including Ti, Mn, Fe, Rb, Sr, Y, Zr, Nb and Th, even after prior surface abrasion for the non-destructive analyses, turned out to be far less appropriate for unambiguous group distinction than the NAA elements Cr, Cs, Eu, Ta and Th, which are well established for Mimbres compositional discrimination. Since there is only a small overlap between the discriminative elements quantifiable by NAA and pXRF the effective compositional distinction of pottery assemblages by either technology is not predictable, but must be empirically demonstrated on a case by case basis. Fig. 12 displays the comparison of two independent assemblages of Waldenburg stoneware waster material, analysed with pXRF (this study, Table 3) and NAA (Schifer 2003). The discriminative elements of the NAA study had been determined by Schifer as Sc, Cr, Rb, Cs, La, Ce, Eu, Lu, Hf, Ta, Th and U. Further elements were quantified, i.a. Zr, but not used for grouping. Thus there were only two trace elements with discriminative potential, Rb and Zr, which were matched by both analytical methods.
In the Zr/Rb scattergram (Fig. 12a) the pXRF data (n = 75) fall into the NAA group 1 (n = 55), thus confirming both the relative accordance of the Zr and Rb quantification by the two methods as well as the diversity of the compositional group. Group 2 (n = 8), which is separated from group 1 in the Zr/ Rb plot, was obviously not present in the pXRF study material. Group 3 (n = 7) and 6 (n = 3) are characterized by higher rare earth element concentrations as shown in the Ce/Eu bi-plot (Fig. 12b), the latter exhibiting considerable correlation in the fabrics. We have performed a PCA analysis of the total NAA datasets (n = 73) with the published mass fractions of the elements listed above, plus Zr (Fig. 12c). Using Discriminant Analysis, Schifer achieved a better separation of group 2 and 3 from the major group and even found three subgroups within group 1, which however fused when Figure 9 Sr/Rb scattergram of waster assemblages from 10 medieval/early modern pottery production centres; 90% quantiles (for colour code see Table 3)  mathematically corrected for temper dilution. Similarly it cannot be excluded that the groups with n = 3 to n = 8 observed by Schifer may coalesce with the major group if more samples would be tested. As a result of the NAA study Schifer could assign 44 foreign sherds, which were found in Saxon towns Leipzig and Dresden, but also as distant as Estonia, to the Waldenburg groups, thereof 2/3 to group 1 (Fig. 12d). The co-plotting of the pXRF datasets illustrates, that in the particular case the allocation of unknown sherds to the Waldenburg group 1 would also have been succeeded with just the Zr and Rb mass fractions, though separation from other potential stoneware production centres in the territory needs additional discriminative elements like Ga and Nb, which exhibit elevated mass fractions in the Waldenburg fabrics (Table 3, Fig. 10b).
Conclusions pXRF analysis of pottery still suffers from poor or hardly to be validated instrument calibration which may be overcome by a careful selection of matrix matched reference materials and with spiked fired clay samples in particular. Proper calibration including Figure 12 Data comparison of Waldenburg waster material -pXRF data (90% quantiles, ), NAA data (Schifer 2003), Waldenburg group 1 ( ), group 2 ( ), group 3 ( ), group 6 ( ) and foreign finds of unknown provenance ( ); (a) Zr/Rb scattergram of waster material, (b) Ce/Eu scattergram of Waldenburg groups 1-6, (c) Principle Component Analysis of NAA data for 13 trace and rare earth elements, correlation matrix; (d) Zr/Rb scattergram of Waldenburg wasters and unknown samples correction of matrix effects and spectral interferences seems necessary to generate accurate compositional data which are verifiable, comparable and applicable for subsequent studies by other scholars. Non-destructive XRF analysis of pottery is an experimental technique: spot selection and sample positioning cannot be automatized, and methodological outliers, resulting from surface modifications and paste heterogeneities, must be screened out. Pottery assemblages are characterized by high intra-site variability, but low inter-site differentiation. Regardless of the analytical method, whether destructive or nondestructive, it cannot be expected that thousands of paste recipes in an extended geographical area can be distinguished with just a handful of trace elements. On the contrary, in a densely populated territory with abundant small, heterogenous clay deposits, eventual use of tempering materials and intensive pottery production over centuries, elemental patterns of ceramic wares from different sites will frequently overlap. Thus NAA with up to 25 potentially discriminative elements is clearly advantageous provided that destructive sampling is acceptable.
Compositional ceramic analysis intends to provenance artefacts from settlement contexts of similar stylistic appearance, but uncertain production origin. With the limitations of an n ≤ 15 dimensional space of discriminative elements accessible by pXRF, it is not very likely to separate more than 10-15 reference groups at once by numerical processing or by graphic display. Rather a preselection of potential reference groups must be employed focussing on the relevant time period, geographical region and ceramic ware. In many archaeological situations waster material is not available, and what is local produce must be inferred from the relative abundance of pottery wares (Bishop et al. 1982), particularly utilitarian equipment which with the exception of transport vessels is thought to be procured by preindustrial residents from the nearest environs. For assigning unknown sherds of domestic context, which is plain earthenware in the case of late medieval and early modern Germany, production centres in a radius of up to 30 km may be considered, typically resulting in less than 10 archaeologically confirmed candidate sites. If established regional reference groups overlap and if the unknown fragments fall into the intersection, its assignment is necessarily ambiguous. Sherds which fall outside of the 90% quantiles of reference groups and do not create a new independent cluster stimulating further archaeological search, may be assigned to an established reference group by Mahalanobis or similar distance analyses, or are simply excluded from any pattern allocation whatsoever. For speciality wares like stoneware, long distance trade in north-western Europe is known and all potential production centres in the extended territory must be considered. For late medieval and early modern stoneware, dated between 1400-1700, these are about 10-15 production centres in central and western Germany, which have been demonstrated to be reasonably distinguishable with the trace element set accessible through pXRF (Rauch et al. 2016).
Ideally a sherd or vessel of unknown provenance can be reliably assigned to an established reference group, but assignment to the location of origin would still be hypothetical as long as there is no complete inventory of all past production sites in the geographical area of interest. For many regions of the world this is an unrealistic task. Therefore the hypothetical nature of any provenance assignment based on elemental pattern recognition must always be emphasized.

Supplemental data
Supplemental data for this article can be accessed at 10. 1080/20548923.2016.1209030