Taxonomic resolution and treatment effects – alone and combined – can mask significant biodiversity reductions

ABSTRACT Taxonomic resolution or uncertainty poses an important problem in biodiversity research. Assessment of biodiversity at the species level is most informative and preferred, but requires effort and expertise. Alternatively, researchers often bin species into higher taxa because they are unable to recognize them, or to save money and time. Here we analyse, by simulation and analytical modelling, the combined effects of dose-dependent mortality and taxonomic binning on biodiversity indices for a fictitious community of organisms. We asked (1) how binning species in a sample into higher taxa significantly affects biodiversity measures, and (2) whether dose-dependent mortality effects, alone or in combination with taxonomic uncertainty, are duly captured by classic biodiversity indices. Our study shows that haphazard binning into various taxonomic levels is legitimate and preferable to orderly binning (all taxa binned at the same level), because it provides the best resolution. We further show that binning will regularly obscure statistical detection of biodiversity differences, if only due to scaling of mean and variance. Also, treatment effects in combination with taxonomic uncertainty can introduce estimation biases of at times complex nonlinear and non-intuitive nature under any taxonomic resolution scenario, potentially including relative increases in the biodiversity index when intuitively decreases would be expected. We recommend being specific about the expected qualitative and quantitative effects of any treatment or natural comparison before formulating a hypothesis regarding biodiversity reductions. Our theoretical study should aid in this endeavour. EDITED BY Isabelle Durance


Introduction
Studying the dominant biodiversity patterns and the drivers giving rise to these patterns is a core goal of whole-organism biology. Biodiversity assessments, reflecting variation at different levels of biological organization from genes to ecosystems at various spatiotemporal scales, are not only central to evolutionary biology, biogeography and (community) ecology (Margalef 1963; Connell & Orias 1964; MacArthur 1965; Pielou 1974), but are also widely used in fields of applied relevance such as conservation biology or (eco-)toxicology (e.g., Balmford et al. 1996a, 1996b; Puniamoorthy et al. 2014). The essential argument is that greater biodiversity is perceived by humans as indicating more pristine and 'natural', often more stable and essentially 'better' habitats (Hooper et al. 2012; Collen & Nicholson 2014), and that disturbance, be it natural in terms of, e.g., climate change or mediated by human activity in terms of habitat destruction or pollution, typically reduces biodiversity because at least some species might go locally or even globally extinct. Numerous estimators of biodiversity exist in the literature and are extensively used, the most prominent being species richness (i.e., simply counting the total number of species present at a site) and the Gini-Simpson or Shannon-Weaver diversity indices (defined below), additionally taking into account the relative abundances of all the species present (reviewed by Gotelli & Chao 2013).
Though common and applied widely, the quantitative assessment of biodiversity is no trivial task and not without problems. First and foremost, the determination of all or at least the majority of organisms present at a particular site to be monitored is time-consuming and therefore costly, requiring expert knowledge due to the huge number of species existing on earth. Varying detection probabilities of species (producing false negatives) and false identifications (false positives) are additional statistical issues recognized in diversity studies (Yoccoz et al. 2001). A second, more subtle difficulty relates to the hierarchical nature of the phylogenetic relationships among organisms and our Linnean taxonomic system. Ideally, we should assess biodiversity at the lowermost, most detailed level possible, as this conforms to the highest degree of ecological and genetic differentiation and thus conveys maximum information. Conventionally, this corresponds to the taxonomic species level, although species are often subdivided further into differentiated populations. However, it is well known that classifying taxa into predefined discrete taxonomic units (species, genus, family, etc.) following Linné's Systema Naturae (1758) is somewhat arbitrary, because the degree of genetic relatedness and/or ecological differentiation of any two sister species of, e.g., flies may vastly differ from that of two sister species of birds. Thus, the degree of quantitative differentiation of one taxon at the genus level may well be similar to that of another taxon at the species or even population level. Some researchers may take this inherent arbitrariness as an argument for basing biodiversity assessments on higher taxonomic levels (e.g., genus, family, etc.) or even mixes. That higher taxonomic levels might actually be sufficient for assessing biodiversity has been realized early on, and discussions of this topic have recurred in the literature regularly in different guises, with different focus, and under different names (taxonomic uncertainty, surrogacy, sufficiency, or resolution; e.g., Wu 1982; Herman & Heip 1988; Andersen 1995; Balmford et al. 1996a, 1996b; Baldi 2003; Bevilacqua et al. 2012; van Rijn et al. 2015). To avoid confusion with other meanings of taxonomic uncertainty (e.g., Yoccoz et al. 2001), we here use the term 'taxonomic resolution'. Besides the simple fact that researchers often cannot identify especially small organisms to the species level, using higher taxonomic levels as surrogates for the 'ideal' species level carries not only advantages but also disadvantages, reflecting a trade-off. Clearly, information is lost, e.g., about important underlying biological functions and causal interactions (Verberk et al. 2013), when merely identifying to higher levels, but time and money are gained. Especially if the degree of taxonomic resolution obtained with much less effort is sufficient for, say, guiding policy decisions based on biodiversity assessments, such an approach should clearly be preferred (Balmford et al. 1996a, 1996b; Bevilacqua et al. 2012). That the discussion keeps recurring in the literature testifies to the trade-off remaining unresolved. This is not surprising because, as in statistics, more samples are always better, and few natural thresholds or cut-offs exist for guiding the practitioner (Ammann et al. 1997; Bevilacqua et al. 2012; Mueller et al. 2013).
CONTACT Wolf U. Blanckenhorn wolf.blanckenhorn@ieu.uzh.ch Present address: Claudio Bozzuto, Wildlife Analysis GmbH, Oetlisbergstrasse 38, CH-8053 Zurich, Switzerland
In the literature, many authors have argued in favour of species-level studies (e.g., Resh & Unzicker 1975; Rosenberg et al. 1986; Lenat & Resh 2001; Schmidt-Kloiber & Nijboer 2004). Others have supported higher-level taxonomic units (genus, family, etc.: e.g., Herman & Heip 1988; Warwick 1988; Olsgard & Somerfield 2000), and some (but not many) have accepted a mix of taxonomic levels and/or taxonomic sufficiency (DePauw & Vanhooren 1983; Ellis 1985; Kingston & Riddle 1989; Ammann et al. 1997; Bailey et al. 2001). Especially when studying less known, typically smaller terrestrial or aquatic invertebrates, the phylogeny and taxonomy of which are often poorly resolved, one is left with identifying to only genus or even family levels, and with the choice of analysing a sample of mixed taxonomic levels vs. binning all taxa into the highest taxonomic units for symmetry reasons.
Typical biodiversity assessments often compare various biogeographic regions or species communities in nature (e.g., Andersen 1995; Jetz et al. 2009; van Rijn et al. 2015), or experimentally manipulate environmental factors (e.g., light, food, pollutants, etc.) that presumably reduce or increase biodiversity (e.g., Ammann et al. 1997; see recent review by Bevilacqua et al. 2012). Such studies usually follow an analysis of variance framework with one or more grouping factors and/or continuous covariates. For example, consider an experimental study producing a reduction in biodiversity by a toxic agent, like a veterinary or human drug ending up in the environment. When comparing the treatment group with a proper control, obviously taxonomic (un)certainty in determining the species affected will be equal for both. Thus, the degree of taxonomic resolution would be of secondary importance as long as it does not mask a potential reduction in biodiversity. Another situation could be a comparison of several sites, with or without any treatment involved, with somewhat different taxonomic resolutions, because they vary naturally in species composition (e.g., southern vs. northern slope habitats in the mountains) or due to varying expertise or resources (time and/or money). In these cases, it is important to know whether and to what extent taxonomic resolution, being, e.g., greater for one data set than another, can bias biodiversity assessments. The combination of these two approaches necessitates understanding how the interplay of taxonomic uncertainty and treatment influences any observed difference in biodiversity.
We here investigate the fictitious but realistic case of dose-dependent mortality effects of a potentially lethal drug on the entire biodiversity of a community of organisms (e.g., Jochmann et al. 2011) by simulation and analytical modelling. We study both above-mentioned effects, alone and in combination, asking (1) whether allowing a mix of taxon levels within a sample, for practical or any other reason, significantly alters or even systematically biases the results, or whether it is a legitimate procedure to be recommended and (2) whether any treatment effects, alone or in combination with taxonomic uncertainty, are duly captured by classic biodiversity indices.
To study the effect of taxonomic resolution in a realistic setting, we based our simulations on a classic macrobenthos data set published and analysed in the context of taxonomic resolution by Wu (1982). Bevilacqua et al. (2012) recently demonstrated the importance in this context of the taxonomic relatedness and degree of species aggregation in the underlying phylogenetic tree, the latter being much more critical than the former. Using Wu's (1982) distribution of species abundances and underlying tree structure (Figure 1(a)), an admittedly arbitrary choice but without loss of generality, we incorporated both these necessary features into our simulations. To augment generality with regard to the underlying taxonomic tree and to verify any pattern found, we additionally analysed a completely symmetrical, perfectly binary taxonomic tree with equally abundant species (Figure 1(b)), which was simple enough to permit analytical mathematical treatment.

Biodiversity indices
We focused on the classic Gini-Simpson and Shannon-Weaver indices. As both indices yielded analogous results in our simulations, we present only the former here and report the latter in the Supplementary material. These biodiversity indices take into account both the number of species (species richness) and their relative abundances (Pielou 1974). Due to known shortcomings discussed, e.g., in Gotelli and Chao (2013), we converted these indices into their corresponding Hill numbers (Hill 1973). Hill numbers can be more intuitively interpreted as denoting how many species, all equally abundant, would have to be in the sample to produce the same diversity value as obtained with the actual sample. Functional diversity indices were not considered because we had no information on functional groupings for Wu's (1982) data set, and phylogenetic diversity indices incorporating phylogenetic information were not used because this might lead to circularities when binning taxa with respect to the particular phylogenies used here (Figure 1; cf. Gotelli & Chao 2013).
The Simpson index, λ, is given by Equation (1), a mere function of the total number of individuals, N, and the abundances, $n_i$, of all species i (Table 1):
$$\lambda = \sum_{i=1}^{S} \left(\frac{n_i}{N}\right)^{2} \qquad (1)$$
The corresponding Gini-Simpson index ($H_{GS}$) is then:
$$H_{GS} = 1 - \lambda \qquad (2)$$
with $0 < \{\lambda, H_{GS}\} < 1$ (Equation (1)) and $H_{GS}$ values towards 1 representing the highest diversity. For presentation of the results, we converted $H_{GS}$ (Equation (2)) to the corresponding Hill number $^{2}D$ (Equation (3)), i.e., $1/\lambda$:
$$^{2}D = \frac{1}{1 - H_{GS}} = \frac{1}{\lambda} \qquad (3)$$
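As a concrete illustration of these indices, the following minimal Python sketch computes the Simpson index, the Gini-Simpson index and the Hill numbers of order 2 and 1 from a list of counts. It assumes the plug-in form λ = Σ(n_i/N)²; the function names and the toy abundances are illustrative and not taken from the study.

```python
import math


def simpson_index(abundances):
    """Simpson index: sum of squared relative abundances (assumed plug-in form)."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("sample contains no individuals")
    return sum((n / total) ** 2 for n in abundances if n > 0)


def gini_simpson(abundances):
    """Gini-Simpson index, 1 - lambda."""
    return 1.0 - simpson_index(abundances)


def hill_order2(abundances):
    """Hill number of order 2: the inverse Simpson index."""
    return 1.0 / simpson_index(abundances)


def hill_order1(abundances):
    """Hill number of order 1: exponential of the Shannon entropy."""
    total = sum(abundances)
    shannon = -sum((n / total) * math.log(n / total) for n in abundances if n > 0)
    return math.exp(shannon)


even = [10, 10, 10, 10]    # four equally abundant taxa (toy data)
skewed = [37, 1, 1, 1]     # same richness, one dominant taxon (toy data)
print(hill_order2(even), hill_order1(even))
print(hill_order2(skewed), hill_order1(skewed))
```

For the even sample both Hill numbers equal the richness (four 'effective' species), whereas the skewed sample yields values close to one, which is the interpretation of Hill numbers given above.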

Simulation experiment
To understand to what extent taxonomic resolution can affect the diversity measures used to compare various treatments, for our simulated experimental setting, we considered a potential agent that induces mortality. We assumed this treatment affects (1) a proportion of all species (species mortality) as well as (2) a proportion of individuals of these affected species (individual mortality). Thus, the effective mortality for any one species is determined by individual mortality given the species is affected at all. We based our simulations on Wu's (1982) classic macrobenthos data set, which assured a realistic taxonomic tree and distribution of species abundances. For the simulations, we drew the abundance of every species, i.e., counts, in the sample from a Poisson distribution with the species means corresponding to the abundances reported by Wu (1982), assuming realistically low species richness in any particular treatment combination caused by the mortality agent. Wu (1982) only reported species names (and sometimes higher taxa), based on which we constructed a taxonomic tree (Figure 1(a)) using the Tree Of Life (http://tolweb.org).
We studied the biodiversity indices above, as well as simple taxon richness (not reported), by means of simulations. We allowed at most two taxonomic levels higher than species to be considered simultaneously (i.e., genus, family). This is probably sufficient for most macrobiological assessments. Because we randomly assigned species abundances, we calculated a mean and a variability measure when analysing the effect of, for instance, including one taxon at genus level while keeping all other taxa at species level.
To keep our study realistic in terms of sample size, we conducted 25 replicate simulations per treatment combination. As one treatment, we simulated four levels of species mortality resulting from the application of a particular drug, with an unaffected control (i.e., 0%, the baseline), and 20%, 50% and 80% of all species present in the sample being affected. This means that, e.g., at 50% species mortality, half of all the species in the sample were negatively affected by the drug (the others being resistant). To set the effective level of species mortality, we additionally introduced four levels (intensities) of individual mortality within an affected species: 25%, 50%, 75% and 100% (i.e., total mortality). When species mortality was present, the affected species were chosen randomly in each simulated replicate as sampled from a uniform distribution. Individual mortality was then introduced as a binomial process. A given level of individual mortality was always the same for all species. Thus, the simulated experiment follows a factorial design with species and individual mortality as fixed factors, each with four levels, plus taxonomic resolution as an additional fixed factor with four levels (species as the baseline, genus, family, mix). For the mixes, we always calculated the mean, minimal and maximal diversity as described next. We additionally performed various post hoc comparisons and varied sample (i.e., replicate) size, thus evaluating the sensitivity of our effects and conclusions to such variation.
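The factorial design described above can be sketched in a few lines of Python. This is a hedged illustration, not the original simulation code: the mean abundances are hypothetical placeholders rather than Wu's (1982) values, the helper `poisson` uses Knuth's multiplication method (adequate only for small means), and rounding the number of affected species to the nearest integer is an assumption.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_replicate(mean_abundances, species_mortality, individual_mortality, rng):
    """Return post-treatment counts for one simulated replicate."""
    counts = [poisson(rng, m) for m in mean_abundances]
    # Affected species are chosen uniformly at random in every replicate.
    n_affected = round(species_mortality * len(counts))
    for i in rng.sample(range(len(counts)), n_affected):
        # Individual mortality as a binomial process: each individual
        # survives independently with probability 1 - individual_mortality.
        counts[i] = sum(rng.random() >= individual_mortality for _ in range(counts[i]))
    return counts


rng = random.Random(1)
means = [40, 25, 12, 8, 5, 5, 3, 2, 1, 1]   # hypothetical species means
design = {
    (sm, im): [simulate_replicate(means, sm, im, rng) for _ in range(25)]
    for sm in (0.0, 0.2, 0.5, 0.8)          # species mortality levels
    for im in (0.25, 0.5, 0.75, 1.0)        # individual mortality levels
}
print(len(design), "treatment combinations with", len(design[(0.5, 0.5)]), "replicates each")
```

Each dictionary entry holds the 25 replicate samples of one treatment combination; taxonomic resolution enters later, when the same counts are binned at different levels before a diversity index is computed.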
To simulate taxonomic resolution, we proceeded as follows. Based on Wu's (1982) taxonomic tree (Figure 1(a)), we first computed all biodiversity indices for the completely resolved species-level taxonomic tree with randomly assigned species abundances (as described above). This reflected the best-case, full-resolution scenario. We then manipulated taxonomic resolution by binning (i.e., combining) all species within a given genus, thus simulating absolute species uncertainty, and calculated the corresponding genus-level biodiversity. All genera within a given family were then further binned to derive analogous family-level diversity indices (thus simulating genus uncertainty). Finally and crucially, in addition to this orderly binning, we considered a 'mixed' binning treatment by computing diversity indices for the mixtures of all possible combinations of taxonomic levels of family, genus and species, with some species across the tree being identified to the species, and others to the genus or family.
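Orderly binning amounts to summing counts over all species that share a genus or family. A minimal sketch follows, assuming a hypothetical six-species toy taxonomy (not Wu's tree) and illustrative function names.

```python
from collections import Counter

# Hypothetical toy taxonomy: species -> (genus, family).
TAXONOMY = {
    "sp1": ("G1", "F1"), "sp2": ("G1", "F1"), "sp3": ("G2", "F1"),
    "sp4": ("G3", "F2"), "sp5": ("G3", "F2"), "sp6": ("G4", "F2"),
}


def bin_counts(counts, taxonomy, level):
    """Sum species counts within each taxon at the requested level."""
    if level not in ("species", "genus", "family"):
        raise ValueError("level must be 'species', 'genus' or 'family'")
    binned = Counter()
    for species, n in counts.items():
        genus, family = taxonomy[species]
        key = {"species": species, "genus": genus, "family": family}[level]
        binned[key] += n
    return dict(binned)


def hill_order2(abundances):
    """Inverse Simpson index computed from raw counts."""
    total = sum(abundances)
    return total ** 2 / sum(n * n for n in abundances)


counts = {species: 10 for species in TAXONOMY}   # equal toy abundances
for level in ("species", "genus", "family"):
    binned = bin_counts(counts, TAXONOMY, level)
    print(level, round(hill_order2(binned.values()), 2))   # 6.0, 3.6, 2.0
```

With equal toy abundances the index drops from 6.0 at species level to 3.6 at genus level and 2.0 at family level, the monotonic loss expected from orderly binning.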
We started the mixed binning by randomly picking one family in the sample tree (Figure 1(a)), disregarding, for binning purposes, any family that contained only one genus that itself contained only one species, because binning in this case would not lead to a different situation. Note that this case can potentially result as a consequence of introducing mortality. Otherwise, given the picked family, we calculated the indices for all possible combinations of the remaining genera and/or species. We then proceeded by picking further families for binning, exploring all possible combinations. Finally, we calculated a mean, the minimum and the maximum of the simulated diversity indices for each mixed experimental combination.
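One way to enumerate such mixes is sketched below, under simplifying assumptions: every family is either binned as a whole or resolved genus by genus (each genus kept at species level or binned), duplicate partitions arising from single-species genera are dropped, and the pure species-, genus- and family-level cases are included in the enumeration. The toy tree and all names are hypothetical; the original procedure may differ in detail.

```python
from itertools import product

# Hypothetical toy tree (not Wu's data): family -> genus -> species.
TREE = {
    "F1": {"G1": ["sp1", "sp2"], "G2": ["sp3"]},
    "F2": {"G3": ["sp4", "sp5"], "G4": ["sp6"]},
}


def family_options(genera):
    """Distinct ways to resolve one family, each a frozenset of species groups."""
    everything = frozenset(s for species in genera.values() for s in species)
    options = {frozenset([everything])}              # whole family binned
    per_genus = [
        (
            frozenset(frozenset([s]) for s in species),  # kept at species level
            frozenset([frozenset(species)]),             # binned to genus level
        )
        for species in genera.values()
    ]
    for choice in product(*per_genus):
        options.add(frozenset().union(*choice))      # the set removes duplicates
    return options


def mixed_partitions(tree):
    """Yield every combination of per-family options across the tree."""
    for combo in product(*(family_options(genera) for genera in tree.values())):
        yield frozenset().union(*combo)


def hill_order2(partition, counts):
    """Inverse Simpson index when each group of species is counted as one taxon."""
    sizes = [sum(counts[s] for s in group) for group in partition]
    total = sum(sizes)
    return total ** 2 / sum(n * n for n in sizes)


counts = {s: 10 for s in ("sp1", "sp2", "sp3", "sp4", "sp5", "sp6")}
values = [hill_order2(p, counts) for p in mixed_partitions(TREE)]
summary = (min(values), sum(values) / len(values), max(values))
print(len(values), summary)
```

For the toy tree this gives nine distinct resolutions, bounded by the family-level value (minimum) and the species-level value (maximum), with the mean in between, which mirrors the mean, minimum and maximum reported for the mixes.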
Diversity indices reflecting the four different taxonomic resolutions were always calculated for each of the 25 × 4 × 4 replicate simulation runs. The 25 replicates per treatment combination of the simulated experiment were the entries for our final analysis of variance, for which we plotted the overall mean, with standard errors, of the simulated indices.

Analytical approach
To grasp the effect of taxonomic resolution more generally and to better understand some potentially unintuitive simulation results (see 'Results' section), we also analytically analysed a perfect binary tree as follows (Figure 1(b)). To keep the resulting equations clear, and in contrast to the simulations above, we considered at most one taxonomic level higher than species (i.e., genus or family, but not both) to be mixed with the species level. At the given taxonomic level (genus or family), one or more species may be binned. In contrast to the simulations, analysing a perfect binary tree with equally abundant species at higher levels (Figure 1(b)) does not lead to variation among taxa chosen at that higher level, simplifying the situation tremendously. For the derivations of analytical expressions, we here focus on the Simpson index only (Equation (1)). Parameter descriptions are summarized in Table 1. When referring to an index in the 'Results' and 'Discussion' sections, instead of transforming the indices into the respective Hill number (Equation (3)), we shall write $\lambda^{-1}$, including all sub-/superscripts.
Table 1. Parameter descriptions.
b: number of taxa up the tree at a chosen taxonomic level u.
$b_\delta$: number of taxa up the tree at a chosen taxonomic level affected by treatment.
(symbol not recoverable): number of taxa up the tree at a chosen taxonomic level with within-genus competition.
δ: proportion of all species (at species level) affected by treatment ('species mortality'); $\delta = x\,2^{u-d}$, $x \in \{0, 1, 2, \ldots, 2^{d-u}\}$ (x is the sum of the involved taxa at level u).
α: proportion of all individuals within a species not affected by treatment.
ψ: proportion of all affected species where both species within a genus are affected by treatment (used for within-genus competition); $0 \le \psi \le 1$ and $\psi \ge 2 - 1/\delta$.

Without treatment
We start the derivations with no treatment involved (subscript nt). The diversity index for this situation, given a perfect binary tree with equally abundant species at all levels, can be derived by simplifying Equation (1). Since we shall compare several results to this basic situation, we refer to it as $\lambda_0$ (Equation (4)), without subscript nt:
$$\lambda_0 = \sum_{i=1}^{S} \left(\frac{n}{Sn}\right)^{2} = \frac{1}{S} \qquad (4)$$
Next, we introduce taxonomic resolution (superscript 1L). This leads to the situation that some species are identified to the species level and the remaining ones to an upper level (u) in the tree for one or more taxa (b):
$$^{1L}_{nt}\lambda = \left(S - b\,2^{u}\right)\left(\frac{n}{Sn}\right)^{2} + b\left(\frac{2^{u} n}{Sn}\right)^{2} \qquad (5)$$
Rearranging and simplifying Equation (5) leads to Equation (6):
$$^{1L}_{nt}\lambda = \frac{S + b\,2^{u}\left(2^{u} - 1\right)}{S^{2}} \qquad (6)$$
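The no-treatment case can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' derivation: the Simpson index is taken as Σ(n_i/N)², the tree has S = 2^d equally abundant species, and a taxon u levels above species is assumed to contain 2^u species (u = 1 for genus, u = 2 for family). Under these assumptions the brute-force index agrees with the closed form (S + b·2^u·(2^u − 1))/S², and neither depends on the abundance n. Function names are hypothetical.

```python
from fractions import Fraction


def simpson(abundances):
    """Exact Simpson index, sum of squared relative abundances."""
    total = sum(abundances)
    return sum(Fraction(a, total) ** 2 for a in abundances)


def lambda_binned_bruteforce(d, u, b, n=1):
    """Index for a perfect binary tree with b taxa binned at level u."""
    S, k = 2 ** d, 2 ** u            # species in the tree, species per binned taxon
    assert b * k <= S, "cannot bin more species than exist"
    taxa = [n] * (S - b * k) + [k * n] * b
    return simpson(taxa)


def lambda_binned_closed(d, u, b):
    """Assumed closed form: (S + b * 2^u * (2^u - 1)) / S^2."""
    S, k = 2 ** d, 2 ** u
    return Fraction(S + b * k * (k - 1), S ** 2)


d = 6                                # 2^6 = 64 species, as in the symmetrical tree
for u, b in ((0, 0), (1, 8), (2, 4), (1, 32), (2, 16)):
    exact = lambda_binned_bruteforce(d, u, b, n=7)
    assert exact == lambda_binned_closed(d, u, b)
    print(f"u={u} b={b:2d} inverse index = {float(1 / exact):.2f}")
```

With b = 0 the inverse index equals S = 64, the baseline used for all subsequent comparisons; binning all 32 genera or all 16 families gives 32 and 16, respectively.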

With treatment
We start the derivations by calculating the diversity index when all species are determined to the species level (superscript 0L), but are partly affected by treatment (subscript t). A proportion δ of all species is affected by treatment, and the species abundance of an affected species is αn. After treatment, and assuming the community has reached equilibrium conditions, the total number of individuals is $N_t = Sn(1 - \delta) + Sn\alpha\delta$. To simplify notation, we will write $N_t = Ng$, where $N = Sn$ and $g = 1 + \delta(\alpha - 1)$, also used for Equation (10). Then the Simpson index $^{0L}_{t}\lambda$ can be calculated according to Equation (7) as:
$$^{0L}_{t}\lambda = S(1 - \delta)\left(\frac{n}{Ng}\right)^{2} + S\delta\left(\frac{\alpha n}{Ng}\right)^{2} \qquad (7)$$
Rearranging and simplifying Equation (7) leads to Equation (8):
$$^{0L}_{t}\lambda = \frac{1 + \delta\left(\alpha^{2} - 1\right)}{S g^{2}} \qquad (8)$$
Next, we include taxonomic resolution. We assume a strong phylogenetic signal of sensitivity (cf. Puniamoorthy et al. 2014), meaning, for instance, that if one taxon at the family level is used for binning, then either all species in that taxon are affected or not affected by treatment. Note that this underlying assumption limits the potential proportion of the affected species (species mortality) as a function of the binning level: if at least one taxon at family level is chosen for binning, then the minimal proportion of affected species is $2^{2}/S$. Equation (9) contains, on the right side, from left to right: (1) unaffected species at species level; (2) unaffected species at the higher level; (3) affected species at species level; and (4) affected species at a higher level:
$$^{1L}_{t}\lambda = \left[S(1 - \delta) - b\,2^{u}\right]\left(\frac{n}{Ng}\right)^{2} + b\left(\frac{2^{u} n}{Ng}\right)^{2} + \left[S\delta - b_{\delta}\,2^{u}\right]\left(\frac{\alpha n}{Ng}\right)^{2} + b_{\delta}\left(\frac{2^{u} \alpha n}{Ng}\right)^{2} \qquad (9)$$
Rearranging and simplifying Equation (9) leads to Equation (10):
$$^{1L}_{t}\lambda = \frac{S\left[1 + \delta\left(\alpha^{2} - 1\right)\right] + 2^{u}\left(2^{u} - 1\right)\left(b + \alpha^{2} b_{\delta}\right)}{S^{2} g^{2}} \qquad (10)$$
If $u = 0$, then $^{1L}_{t}\lambda$ reduces to $^{0L}_{t}\lambda$, and if $\delta = 0$, then $^{1L}_{t}\lambda$ reduces to $^{1L}_{nt}\lambda$. Finally, note that all indices presented, i.e., the rearranged and algebraically simplified equations, do not depend on the species' abundances.
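The treatment case can be verified in the same brute-force manner. The sketch below assumes, as an illustration rather than as the authors' exact formulation, that the Simpson index is Σ(n_i/N)², that a taxon at level u contains 2^u species, that b binned taxa contain only unaffected species and b_δ binned taxa contain only affected species, and that affected species retain a fraction α of their abundance. The closed form coded in `lambda_treated_closed` is the expression these assumptions lead to; it satisfies the two limiting cases stated in the text (u = 0 and δ = 0). All function names are hypothetical.

```python
from fractions import Fraction


def simpson(abundances):
    """Exact Simpson index for (possibly fractional) abundances."""
    total = sum(abundances)
    return sum((a / total) ** 2 for a in abundances)


def lambda_treated_bruteforce(d, u, b, b_delta, delta, alpha, n=1):
    """Build the post-treatment community explicitly and compute the index."""
    S, k = 2 ** d, 2 ** u
    affected = delta * S
    assert affected == int(affected), "delta * S must be a whole number of species"
    affected = int(affected)
    unaffected = S - affected
    assert b * k <= unaffected and b_delta * k <= affected
    n = Fraction(n)
    taxa = (
        [n] * (unaffected - b * k)                  # unaffected, species level
        + [k * n] * b                               # unaffected, higher level
        + [alpha * n] * (affected - b_delta * k)    # affected, species level
        + [k * alpha * n] * b_delta                 # affected, higher level
    )
    return simpson(taxa)


def lambda_treated_closed(d, u, b, b_delta, delta, alpha):
    """Assumed closed form with g = 1 + delta * (alpha - 1)."""
    S, k = 2 ** d, 2 ** u
    g = 1 + delta * (alpha - 1)
    numerator = S * (1 + delta * (alpha ** 2 - 1)) + k * (k - 1) * (b + alpha ** 2 * b_delta)
    return numerator / (S * g) ** 2


delta, alpha = Fraction(1, 4), Fraction(2, 5)
exact = lambda_treated_bruteforce(6, 2, 2, 3, delta, alpha, n=9)
assert exact == lambda_treated_closed(6, 2, 2, 3, delta, alpha)
print(float(1 / exact))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison between the explicit community and the closed form is an equality rather than a floating-point approximation; changing `n` leaves the result unchanged, in line with the remark that the indices do not depend on species abundances.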
We additionally explored the possibility of interspecific competition interacting with taxonomic resolution and treatment effects using a Lotka-Volterra competition framework. Since this is merely one of the many possible ways of theoretically studying competing species, these methods and results are presented in the Supplementary material.

Comparison of the analytical and simulation results
To directly compare the two different approaches, we transformed the experimental simulation results into an interpolated contour plot with species mortality and individual survival (= 1 − individual mortality) as axes, because the simulations were based on only a limited number of points along each axis.

Simulation results based on Wu's (1982) data set
We report only simulation results for the Hill number $^{2}D$ (Figures 2, S1; Table S1); analogous results for $^{1}D = \exp(H_{SW})$ are reported in the Supplementary material (Fig. S2; Table S2). Diversity, $^{2}D$, decreased markedly from the baseline, full-knowledge species-level analysis (Figure 2(a)) when all species were binned (i.e., lumped and thus ignored) within their respective genera (genus-level analysis: Figure 2(b)), and even more so when they were binned at the next higher family level (Figure 2(c); main effect of taxonomic level in corresponding ANOVA, Table S1: P < 0.001). The average reduction for the mix (binning at various levels) was actually the smallest (Figure 2(d)), as it essentially reflects a weighted average of species-, genus- and family-level taxa for the mixes simulated. Increases in both species mortality (x-axis) and individual, within-species mortality (from left to right for each group of dots) due to treatment also reduced the biodiversity at all taxonomic resolution levels (highly significant main effects in Table S1 for species, genus, family and mix). The pentagon to the left of each panel in Figure 2 denotes the (greatest) baseline diversity without any treatment-induced mortality. These overall diversity reductions due to binning (i.e., the taxon-level main effect) necessarily occur but validate our methodology, and the main effects of both mortality treatments are also intuitive (Tables S1 & S2).
Crucially, the significant two- and three-way interactions in Tables S1 and S2 reflect the nonlinear biodiversity loss with increasing species and individual mortality as well as binning intensity (Figures 2, S2). This results in the reduction in biodiversity relative to the baseline, full-knowledge species-level situation actually being lowest at the highest mortality levels simulating strong lethal effects of a drug on many species (to the right in each panel and group of dots), and highest at the low mortality levels simulating benign lethal effects (Fig. S1). Several pairwise post hoc comparisons of treatment combinations do not differ significantly, such that the corresponding biodiversity differences would not be detected especially at lower sample sizes. For example, at low (25%) or intermediate (50%; squares and triangles in Figure 2) individual mortality, biodiversity does not differ across all species' mortality treatments at any taxonomic resolution. Further, when holding species mortality constant, the differences among the four individual mortality treatments always remain significant at the species level, whereas differences become increasingly non-significant as species are binned into ever higher taxa, especially at low sample sizes approaching 10 or less (not shown). Therefore, lower taxonomic resolution leads to decreased power in detecting especially benign mortality effects (Figure 2(b-d, f)), if only because binning leads to naturally lower means with lower variance resulting in less pronounced treatment differences. Nevertheless, while all three binning scenarios resulted in similar overall patterns of diversity reductions, the mix on average (Figure 2(d)) performed best overall, showing least reductions relative to the species-level analysis in Figure 2(a) (which signify information loss or 'error') as well as the smallest error ranges (i.e., spread of points in y-direction; Fig. S1).

Analytical results using the perfect binary tree
We discuss theoretically derived effects of treatment and taxonomic resolution on the biodiversity index (in the following simply index), separately or in combination. Thus, in all subsequent figures, we show the reduction in the diversity index resulting from specific effects relative to the absolute baseline level $\lambda_0$. We present intuitive plots instead of analytically discussing all derived equations. Note that for algebraic reasons the equations feature 'individual survival' (α) instead of 'individual mortality' (1 − α), as in the simulated experiments. Figure 3(a) shows the effect of treatment alone, i.e., $^{0L}_{t}\lambda^{-1} - \lambda_0^{-1}$ (Equations (8) and (4)) as a contour plot with species mortality (δ) and individual survival (α) as axes. In general, the reduction in diversity increases monotonically (from top to bottom in Figure 3) as individual survival decreases (i.e., individual mortality increases). At rather high individual survival rates (towards the top of Figure 3(a)), varying species mortality (in x-direction) practically does not change estimated biodiversity reduction much, resulting in non-significant paired comparisons in the simulation (Figures 2(a), S1). For example, when comparing species mortalities 20% and 80% at fixed individual survival of 75%, little to no biodiversity reduction resulted relative to the baseline level of no species mortality, even though approximately 25% of all individuals have disappeared due to treatment (overlaid labelled contour lines in Figure 3(a)). When reducing individual survival further to, e.g., 40%, in contrast, we clearly see reductions at both 20% and 80% species mortality, which, however, do not differ profoundly, even though approximately three times more individuals have disappeared in total at 80% compared to 20% species mortality (Figure 3(a)). Figure 3(c) shows the corresponding interpolated contour plot of the simulation results.
Since Wu's (1982) sample tree and the perfect binary tree differ in branching regularity, number of nodes and species (Figure 1), the reduction in diversity is qualitatively similar in both situations but overall considerably lower in Figure 3(c) (cf. Figures 2(a), S1; corresponding to Wu's data set) than in Figure 3(a) (corresponding to the symmetrical tree; see colour bar).

Taxonomic resolution
Figure 4 depicts the sole effect of taxonomic resolution, i.e., $^{1L}_{nt}\lambda^{-1} - \lambda_0^{-1}$ (Equations (6) and (4)).
Diversity is overall more strongly reduced as more taxa are binned into higher taxonomic levels (genus vs. family), and the reduction is always more pronounced at the family than the genus level (as in Figures 2, S1), which is intuitive and expected. Note that here the greatest possible reduction has value 64, i.e., $\lambda_0^{-1}$. Figure 5 shows the reduction in diversity when additionally including treatment. The corresponding equation, Equation (10), allows including taxa at one higher taxonomic level that (1) contain unaffected species (parameter b) or (2) contain affected species ($b_\delta$), alone or in combination. For illustrative purposes, we chose particular combinations leading to the three contour plots in each column of Figure 5: in the left column, higher taxa are at the genus level and in the right column at family level. To permit direct comparison between taxonomic levels, for the family level, we always used half of the values used at genus level, as in our symmetrical tree (Figure 1(b)) two genera bin into one family. In Figure 5, we investigated taxonomic uncertainty by varying the proportion of (un)affected taxa given the fixed number of higher taxa (from top to bottom). We investigated various total numbers of taxa at the higher level, leading to qualitatively similar results as the ones shown in Figure 5 (not shown). Figure 5 shows two main responses of our biodiversity measure resulting from the combination of taxonomic resolution and mortality treatment effects. (1) The thick black contour lines in Figure 5 delineate the sole effect of taxonomic resolution with eight taxa at the genus level or four at the family level ($^{1L}_{nt}\lambda^{-1}$, Equation (6)); these reductions can also be seen in Figure 4 (for x = 8 or 4).
From top to bottom in Figure 5, a decrease in unaffected (b) and a corresponding increase in affected taxa ($b_\delta$) at the higher taxonomic level lead to less reduction in biodiversity for an increasing set of combinations of δ and α (towards the upper left). This effect is even more pronounced at the family level compared to the genus level (compare columns in Figure 5). That is, the combination of taxonomic resolution and treatment effects can lead to a higher biodiversity index than taxonomic resolution alone. (2) Augmenting the number of taxa at the higher taxonomic level that includes affected species (a) introduces strong nonlinearities and (b) overall diminishes the reduction in diversity (compare contour plot colour bars with the same scale).
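A small worked example makes this counter-intuitive effect tangible. It is a hedged sketch that assumes the closed form implied by the model description (Simpson index Σ(n_i/N)², 2^u species per taxon at level u, S = 64 species), with hypothetical function names and one arbitrarily chosen parameter combination (δ = 0.25, α = 0.5, eight genus-level taxa).

```python
from fractions import Fraction


def inverse_index(S, u, b, b_delta, delta, alpha):
    """Inverse Simpson index under the assumed closed form (see lead-in)."""
    k = 2 ** u
    g = 1 + delta * (alpha - 1)
    numerator = S * (1 + delta * (alpha ** 2 - 1)) + k * (k - 1) * (b + alpha ** 2 * b_delta)
    return (S * g) ** 2 / numerator


S, u = 64, 1                                   # 64 species, binning at genus level
delta, alpha = Fraction(1, 4), Fraction(1, 2)  # 25% of species affected, 50% survival

baseline = inverse_index(S, 0, 0, 0, Fraction(0), alpha)       # no binning, no treatment
binning_only = inverse_index(S, u, 8, 0, Fraction(0), alpha)   # 8 genera binned
affected_binned = inverse_index(S, u, 0, 8, delta, alpha)      # binned genera all affected
unaffected_binned = inverse_index(S, u, 8, 0, delta, alpha)    # binned genera all unaffected

for label, value in (
    ("baseline", baseline),
    ("binning only", binning_only),
    ("treatment, affected genera binned", affected_binned),
    ("treatment, unaffected genera binned", unaffected_binned),
):
    print(f"{label:36s} {float(value):6.2f}  (change vs baseline {float(value - baseline):+.2f})")
```

Under these assumptions, binning eight genera without treatment lowers the inverse index from 64 to 51.2, whereas adding treatment raises it to 56 when the binned genera are exactly the affected ones: mortality shrinks the lumped (and therefore dominant) taxa back towards the abundance of single species, which evens out the community. When the binned genera are unaffected, the same treatment lowers the index further, to about 46.1.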
Finally, Figure 6 displays interpolated contour plots corresponding to the simulated experimental results assessing the effects of taxonomic resolution (i.e., the 'mixed' situation in Figure 2(d-e)). In contrast to the analytical approach, to gauge the effect of taxonomic resolution, we calculated mean as well as minimal and maximal values of all possible combinations (see 'Methods'). When constructing contour plots analogous to those in the left column of Figure 5, minimal index values (Figure 6(b)) produce the strongest diversity reductions and resemble the subplots in the first row of Figure 5, while maximal index values (Figure 6(c)) produce the weakest reductions and resemble the more complex subplots in the third row in Figure 5. In Figure 6(c), the nonlinear effects produce non-significant biodiversity reductions along both axes, e.g., not only when fixing individual survival at 75% and comparing species mortality of 20% and 80%, but also when fixing species mortality at 25% and comparing individual survival of 25% and 75% (cf. Figure 2).

Discussion
Our study highlights important aspects and problems when assessing biodiversity by way of standard indices in the presence of uncertain taxonomic resolution and treatment effects, alone or in combination. The following discussion is structured accordingly.

Taxonomic resolution
A key positive result is that our study justifies assessing biodiversity of any particular sample of organisms at various, i.e., mixed, taxonomic levels (DePauw & Vanhooren 1983; Ellis 1985; Kingston & Riddle 1989; Ammann et al. 1997; Bailey et al. 2001). It is therefore not necessary to define a priori a particular depth of the identification effort: one can identify as far as experience and means allow and use all data. Our results also support Mueller et al.'s (2013) suggestion to apply group-specific taxonomic levels, because the effectiveness of binning varies substantially among taxonomic and functional groups (see also Bevilacqua et al. 2012). Of course, binning species into higher taxonomic categories such as genera or families as surrogates for species diversity will inevitably reduce any biodiversity index because the number of taxa in the sample will necessarily be lower, a trivial result also confirmed by our study. However, this will occur to a roughly similar extent for all samples and treatments such that meaningful comparisons are still possible; indeed, numerous studies have documented a positive correlation of higher-taxon diversity surrogates with underlying species-level data (e.g., Wu 1982; Herman & Heip 1988; Andersen 1995; Balmford et al. 1996a, 1996b; Baldi 2003; Bevilacqua et al. 2012; Mueller et al. 2013). Our simulations further showed that in such cases the reduction in the biodiversity index relative to the full-knowledge species-level situation, i.e., the information loss or estimation 'error', was actually smallest for the mixed taxonomic data set, which intuitively reflects a weighted average index of species-, genus- and family-level data in proportion to the actual binning performed at the various levels. That is, the mixed sample performed better than, and therefore should be preferred over, the 'pure' genus- or family-level assessments (i.e., orderly binning), which lose more information overall. This is good news, because it gives the practitioner full flexibility in deciding the identification level even post hoc. Given our combined approach investigating simple symmetrical as well as naturally complex taxonomic situations, this conclusion is general and independent of the underlying taxonomic tree structure (Figure 1).

Figure 5 (caption fragment). To permit direct comparison between taxonomic levels, we used eight taxa at the genus and four taxa at the family level, as in our symmetrical tree two genera bin into one family. Parameter combinations for the left column (genus level): (a) all higher taxa unaffected: b = 8, b_δ = 0; (b) four taxa unaffected (b = 4) and four affected (b_δ = 4); (c) all taxa affected: b = 0, b_δ = 8. Solid black lines: contour levels when only considering taxonomic resolution effects. If b_δ is greater than 0, then the proportion of species affected, δ > 0, is a function of these taxa and their taxonomic level (Table 1); therefore, the plotted values in Figure 5(b, c, e, f) start at δ > 0 (x-axis).
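The comparison between mixed (haphazard) and orderly binning discussed in this section can be sketched on a toy sample. The eight species, their genus and family assignments and their abundances below are invented for illustration and are not taken from Wu (1982). The sketch only shows, for one particular mixed assignment, that the inverse Simpson index stays closer to the species-level value than under pure genus- or family-level binning.

```python
from collections import defaultdict

# Hypothetical sample: species -> (genus, family, abundance); all values invented.
SAMPLE = {
    "sp1": ("G1", "F1", 40), "sp2": ("G1", "F1", 25),
    "sp3": ("G2", "F1", 15), "sp4": ("G2", "F1", 10),
    "sp5": ("G3", "F2", 30), "sp6": ("G3", "F2", 20),
    "sp7": ("G4", "F2", 12), "sp8": ("G4", "F2", 8),
}


def inverse_simpson(abundances):
    """Inverse Simpson index: 1 / sum of squared relative abundances."""
    total = sum(abundances)
    return total ** 2 / sum(a ** 2 for a in abundances)


def index_at(levels):
    """`levels` maps each species to the depth at which it was identified:
    'species', 'genus' or 'family'. Individuals sharing a label are pooled."""
    pooled = defaultdict(float)
    for sp, (genus, family, n) in SAMPLE.items():
        label = {"species": sp, "genus": genus, "family": family}[levels[sp]]
        pooled[label] += n
    return inverse_simpson(pooled.values())


species_only = {sp: "species" for sp in SAMPLE}
genus_only = {sp: "genus" for sp in SAMPLE}
family_only = {sp: "family" for sp in SAMPLE}
# Mixed: most species identified fully, two only to genus, two only to family.
mixed = {**species_only, "sp3": "genus", "sp4": "genus",
         "sp7": "family", "sp8": "family"}

for name, levels in [("species", species_only), ("mixed", mixed),
                     ("genus", genus_only), ("family", family_only)]:
    print(f"{name:8s} index = {index_at(levels):.2f}")
```

For this toy sample the index is about 6.31 at species level, 5.63 for the mixed assignment, 3.30 at genus level and 1.97 at family level. A mixed assignment can never fall below the pure family-level value, but whether it exceeds the pure genus-level value depends on how many species are left at the family level.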

Treatment effects
Even though higher taxon-level diversity data (including mixes) can reasonably and effectively be used as surrogates for always more precise species-level data, our theoretical study also showed that the error observed under different binning scenarios will depend on the degree of species loss induced by any treatment or revealed by any site comparison, thus potentially introducing systematic biases mandating some caution, and confirming Wu's (1982) similar conclusion based on empirical data. Treatment alone influenced the observed magnitude of the diversity reduction in the analytical derivations (Figure 3(a)) as well as the simulation experiment (Figures 2, 3(c), S1) by inducing differential mortality. Increasing individual mortality (in y-direction in Figure 3; different symbols from left to right in Figures 2, S1, S2) induced scaled and ever stronger diversity reductions, whereas the corresponding changes for species mortality (in x-direction) were weak, especially at low individual mortalities of 25% and 50%, slightly curvilinear and generally did not differ significantly even at high sample sizes. For instance, 25% and 80% species mortality resulted in similar diversity reductions when only a low 25% of individuals of the affected species die (squares in Figures 2, S1), the reduction actually being greatest at intermediate, 50% species mortality (also seen in Figures 3(a, c)). Thus, in practice, particular treatment combinations can lead to indistinguishable diversity indices in a not necessarily intuitive manner, depending on their precise choice. Although we treated species and individual mortality as distinct (crossed) effects, they are in fact hierarchically related because we defined species mortality as the proportion of all species in the sample potentially affected by treatment, with de facto mortality for any particular species specified by individual mortality. 
Regardless, treatment agents can probably be found that correspond to most of the combinations covered here, although some combinations are admittedly unrealistic in practice. In general, the effect strength of varying individual mortality exceeded that of species mortality in our simulations based on particular phylogenetic relationships (Figures 1(a), 2). It would therefore be instructive to adjust the parameter domain used in our study according to a survey of agents affecting biodiversity in practical applications or of actually expected biodiversity differences among natural habitats (cf. Jochmann et al. 2011; Bevilacqua et al. 2012). We treated only two standard (univariate) diversity indices (Hill numbers) here while disregarding others (as justified in the 'Methods'), and did not consider more complex multivariate community descriptors (which have been reported to be more robust in this context: Mueller et al. 2013). Nevertheless, we see no reason why our results should not qualitatively translate to most if not all diversity measures (Gotelli & Chao 2013). After all, our analytical and simulation parts, based on very different taxonomic trees (Figure 1), yielded similar results even in the relative range (Figure 6), thus supporting each other in reaching rather general conclusions.
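The non-monotonic response to species mortality noted above can be reproduced with a small sketch. It assumes a hypothetical community of 100 equally abundant species (chosen so that 25%, 50% and 80% give whole numbers of species), a fraction δ of species affected, and a fraction α of individuals surviving in each affected species. It is an illustration of the mechanism, not the authors' analytical model or simulation.

```python
def index_after_treatment(n_species, delta, alpha):
    """Inverse Simpson index of `n_species` equally abundant species after a
    fraction `delta` of species is affected and a fraction `alpha` of the
    individuals of each affected species survives."""
    n_affected = round(delta * n_species)
    abundances = [1.0] * (n_species - n_affected) + [alpha] * n_affected
    total = sum(abundances)
    return total ** 2 / sum(a ** 2 for a in abundances)


N_SPECIES = 100  # assumed community size (hypothetical)
for alpha in (0.75, 0.50, 0.25):          # individual survival
    for delta in (0.25, 0.50, 0.80):      # species mortality
        index = index_after_treatment(N_SPECIES, delta, alpha)
        print(f"alpha = {alpha:.2f}, delta = {delta:.2f}: "
              f"reduction = {N_SPECIES - index:.2f}")
```

With 75% individual survival the reductions are small (roughly 1.3, 2.0 and 1.5 for δ = 0.25, 0.5 and 0.8) and largest at the intermediate δ, whereas lowering α produces much larger reductions. When every species is affected equally (δ = 1), evenness is restored and the index returns to its untreated value.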

Combined taxonomic resolution and treatment effects
Combining taxonomic uncertainty and treatment introduced additional non-intuitive effects reducing diversity (Figures 5, S3). Our simulations based on Wu's (1982) sample representing a realistic taxonomic tree structure (Figure 1(a)), as well as our analytical results for the perfectly symmetrical tree (Figure 1(b)), clearly showed that the reduction in diversity under different binning scenarios (genus, family, mixed levels) depends nonlinearly on the mortality schedules induced by any treatment or factorial comparison (Figures 2, S1, S3, 5, 6). As expected, the biodiversity index generally decreases as individual within-species mortality increases, and also decreases on average as species mortality increases (Figures 2, 3). The somewhat stronger nonlinear effect obtained in the analytical treatment (Figure 3(a vs. c)) presumably relates to the more symmetrical tree (Figure 1), in concordance with Bevilacqua et al. (2012), who showed using simulations that the validity of using higher taxa as surrogates depends on the tree structure, less in terms of phylogenetic relatedness (i.e., the node structure) but more in terms of species clumping per node (cf. Figure 1(a)). Our study confirms that the underlying taxonomic tree structure has such effects when higher taxa are used as biodiversity surrogates, and that these effects are likely not easily tractable and ultimately uncontrollable.
Two additional results regarding the binning of species into higher taxa comprising species affected by treatment deserve scrutiny. First, there are treatment parameter value combinations for which the biodiversity index counterintuitively increases, i.e., the biodiversity reduction is attenuated, relative to the situation with only taxonomic uncertainty (upper-left regions with low species and individual mortality in Figure 5, demarcated by the solid black contour line). This occurs because binned species effectively represent a new taxon, so that in most situations binning accentuates an uneven distribution of taxa in a sample. Treatment, on the other hand, particularly when assuming a strong phylogenetic signal in species' sensitivities (Bevilacqua et al. 2012; Puniamoorthy et al. 2014), can at least partially reestablish evenness by decreasing the abundance of especially affected (and binned) species. Second, by combining taxonomic resolution and treatment effects, and thus augmenting the number of taxa containing affected species, nonlinear interactions can become further accentuated; additional opportunities for statistically indistinguishable biodiversity indices can therefore arise, e.g., when fixing species mortality (δ) and varying individual survival (α) (from top to bottom in Figures 5(b, c, e, f)). We are aware of the simplicity of the studied analytical model, which was chosen for the sake of clarity and tractability. Nevertheless, a pattern very similar to this double opportunity for indistinguishable biodiversity indices also resulted in our more realistic simulations (Figures 2, S1, 6). It remains to be seen how important these effects are in practice since mean, minimal, and maximal index values cover a range of distinct effect patterns (Figure 6). 
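The first of these two results, an index that rises when treatment is added to binning, can be illustrated with a deliberately simplified sketch. It is not Equation (10). It assumes 64 equally abundant species, genera of two species each, and treatment that affects only the species inside the b_δ binned taxa, with survival α. The function and parameter names are illustrative.

```python
def inverse_simpson(abundances):
    """Inverse Simpson index: 1 / sum of squared relative abundances."""
    total = sum(abundances)
    return total ** 2 / sum(a ** 2 for a in abundances)


def combined_index(n_species, b, b_delta, species_per_taxon, alpha):
    """`b` unaffected and `b_delta` affected higher taxa are lumped. Simplifying
    assumption: only species inside the `b_delta` taxa suffer mortality, and a
    fraction `alpha` of their individuals survives."""
    n_lumped = (b + b_delta) * species_per_taxon
    abundances = [1.0] * (n_species - n_lumped)               # unbinned, unaffected
    abundances += [float(species_per_taxon)] * b              # binned, unaffected
    abundances += [alpha * species_per_taxon] * b_delta       # binned, affected
    return inverse_simpson(abundances)


N_SPECIES, GENUS_SIZE = 64, 2  # assumed tree (hypothetical)
binning_only = combined_index(N_SPECIES, 0, 8, GENUS_SIZE, alpha=1.0)
print(f"binning only: index = {binning_only:.2f}")
for alpha in (0.75, 0.50, 0.25, 0.10):
    index = combined_index(N_SPECIES, 0, 8, GENUS_SIZE, alpha)
    print(f"alpha = {alpha:.2f}: index = {index:.2f}")
```

In this sketch, binning eight genera alone gives an index of 51.2. Moderate mortality in the binned species shrinks the over-represented lumped taxa back towards the abundance of single species, so the index rises (56.0 at α = 0.5). Only at very low survival (α = 0.1) does it drop below the binning-only value.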
Importantly, our post hoc comparisons and subsampling (sensitivity) analyses indicate that the scaled, nonlinear interaction of all three factors (Tables S1 & S2) frequently resulted in non-significant differences between particular treatment combinations of interest when in fact differences are present, especially at the low sample sizes typical for these kinds of biodiversity assessments and when binning occurs. In this context, we note that for computational ease we limited our simulations to using only the species' mean abundances reported in Wu (1982); how variation in means and variances globally affects our conclusions remains to be explored in detail.
In closing, we want to briefly draw attention to dynamical aspects. We studied treatment effects by analysing how reduced abundances of affected species influence biodiversity indices. However, treatment effects will likely often interact with the dynamics of the sampled populations. Depending on the community, this can include all sorts of direct and indirect interspecific interactions, e.g., competition, predation, mutualism. To start grasping these consequences, we additionally analysed interspecific competition under admittedly rather simplified assumptions. Methods and results can be found in the online supporting material. The main conclusion is that adding interspecific competition to taxonomic resolution and treatment effects in general leads to greater biodiversity reduction, but at the same time prevents strong nonlinear effects (Figs. A2 & A3, Supplementary material). Further, a negative effect of a treatment on one species may remain undetected in practice if species are binned because of compensatory competitive replacement by the closely related unaffected species. Understanding biodiversity dynamics is a very active and much needed research field (e.g., Dornelas et al. 2012), and additionally studying the effect of exogenous perturbations like toxic agents will help researchers and conservation managers to better predict expected biodiversity changes.

Conclusions
Taxonomic surrogacy, uncertainty or resolution, be it inadvertent or deliberate, remains an important issue in biodiversity research that cannot be ignored, as it strongly affects diversity estimates regardless of which index is used (Gotelli & Chao 2013). We here showed that haphazard binning of species into various higher levels is legitimate and even preferable to binning all evaluated taxa at the higher genus or family level, despite systematic estimation biases depending on treatment-induced mortality being likely to occur under any surrogate scenario. We further showed that the combination of taxonomic binning and treatment effects introduces at times complex and non-intuitive nonlinearities that can obscure the statistical detection of biodiversity differences. We believe that our approach, combining simple symmetrical as well as naturally complex taxonomic situations (Figure 1) and using analytical as well as simulation modelling, produced rather general conclusions. We recommend being specific about the expected qualitative and quantitative effects of any treatment or natural comparison before formulating a hypothesis to be tested regarding biodiversity reduction. This is especially true when focusing on simple metrics such as species richness or the Gini-Simpson biodiversity index rather than more complex multivariate measures (cf. Mueller et al. 2013). The results of our theoretical study should aid in this endeavour.