Risk assessments of contaminated sediments from the perspective of weight of evidence strategies – a Swedish case study

Abstract
Several countries currently lack common recommendations specific to Ecological Risk Assessment (ERA) of contaminated sediments and stakeholders report inconsistencies between currently used approaches. The objective of this study was to provide an increased understanding of how ERAs of contaminated sediments are conducted in comparison to established guidelines. For this, we use Sweden as a case study and compare seven ERAs with four internationally established strategies. Our results indicate that contaminant concentrations receive a comparatively high weight, despite a lack of appropriate benchmarks; toxicity measurements are uncommon, while routine in established strategies; and the integration and interpretation of results lack transparency. We identify three areas that may help improve the practice of ERAs: a common approach to benchmarks, recommendations for how to assess toxic effects, and a common approach for integrating and interpreting results.


Introduction
Healthy sediments are a prerequisite for important ecosystem services, such as food production, filtering and storing of excess nutrients as well as recreational values (Troell et al. 2005; USEPA 2005; Schmidt et al. 2011; Severin et al. 2018). Even so, they are often overlooked in environmental management. Waterbodies have historically received little environmental consideration; for a long time, there was no regulation for the release of nutrients and wastewater from industry. Contaminants and their effects in the environment gained global public and regulatory awareness after the 1960s, with for example the publication of Silent Spring (Carson 1962). In the Baltic Sea, chlorinated organic pollutants were detected for the first time in the late 1960s (Jensen et al. 1969; Elmgren 2001). Since then, environmental management of the Baltic Sea has to some extent been a success story and emissions of many contaminants from the surrounding catchment area have been greatly reduced, even though new threats are emerging (HELCOM 2004; Wiberg et al. 2013). However, due to the historical input, high levels of contaminants are still stored in sediments in and around the Baltic Sea and, at least at a local scale, there might be harmful ecological exposure and effects (Jonsson et al. 2000; Sobek et al. 2014; Mustajärvi et al. 2019).
Improving the management of contaminated sediments is of growing concern in Europe. Sediment has not been included sufficiently in the EU Water Framework Directive (WFD) and there are now plans to incorporate guidance for management of contaminated sediment in the Common Implementation Strategy (CIS) for the WFD. SedNet (The European Sediment Network) is currently drafting chapters on such guidance for the coming WFD CIS that will be finalized in 2021 (Brils 2020).
Ecological Risk Assessment (ERA) of contaminated sediment is the process of determining the risks to the environment from historical contamination. A well-conducted ERA is a systematic process that structures scientific information to support environmental decision-making. However, in its early days, the scientific basis of ERA was not well defined and, compared to human health risk assessments, ERA was not a standardized part of decision-making. In the US, the role of ERA changed when the USEPA developed a standardized ERA strategy in the 1990s. The development of a standardized framework provided consistency and a common ground, improving the communication between stakeholders from different fields. Key to giving ERA more weight in decision-making was increasing the scientific quality. This was achieved in part by highlighting the need to characterize and integrate both the exposure to contaminants and the ecological effects in order to describe ecologically relevant risks (Barnthouse 2008). The USEPA framework for ecological risk assessment included potentially affected receptors (i.e., assessment endpoints) and metrics used to analyze the exposure or effect of contaminants to the receptors (i.e., measurement endpoints) (USEPA 1998) (Figure 1).
In order to assess contaminated sediments, several lines of evidence (LOE) have been used. The most common LOE has been to compare contaminant concentrations with general numerical guideline values, or sediment quality guidelines (SQG). However, contaminated sediments are complex to assess and the difficulties in generalizing the potential exposure limit the capability to predict ecological effects with the use of SQG. For example, the interactions between the sediment and the overlying water affect the contaminants' mobility, and the effects of the contaminant might be affected by an unknown mixture of additional contaminants and conditions (Apitz 2005; Bridges et al. 2006). Therefore, it is suggested that SQG can be used to indicate very low or very high risks of toxicity to organisms but not to state whether toxicity is possible or impossible (Wenning et al. 2002; O'Connor 2004). The complexity of assessing contaminated sediments might warrant several LOEs to assess the ecological implications of the contaminants of concern. Most LOEs that are used address either sediment toxicity, the local aquatic community or the contaminant concentrations in the sediment (Chapman and Hollert 2006).
The synthesized results from individual LOEs characterize risk, or the degree of risk and effects, and are often referred to as weight of evidence. The simplest versions of weight of evidence list the results from the individual LOEs or reach a conclusion based on the judgment of the assessor, sometimes referred to as "best professional judgment".
Additional weight of evidence approaches attempt to make the synthesis of risk transparent and reduce the level of subjectivity. Such approaches use the LOEs to evaluate cause and effect-relationships or integrate them to determine the degree of risk or impairment (Linkov et al. 2009).
There are several weight of evidence strategies suggested for contaminated sites in general (e.g., Contaminated Sites Working Group 2020), and also specifically for contaminated sediments. One example is the TRIAD, which at the end of the 1980s was an early approach to combining measurements of the prevalence of contaminants, bioassays to measure toxicity and measurements of in situ effects. It also provided a simple decision support matrix for determining if the combination of LOEs indicated risk (Chapman 1990). Since then, several strategies of various complexity have been described in the scientific literature. To be more time and cost effective, as well as to fit a larger scope of cases, some strategies provide tiered frameworks that start with screening for potential risks and add case specificity (e.g., additional LOEs) as the need for more detailed information increases (Scrimshaw et al. 2007; Chapman et al. 2010; Simpson and Batley 2016).
The implementation of approaches to contaminated sediment differs between countries. In Europe there has been a focus on general chemical thresholds, whereas in North America, for example, there has been a focus on site-specific conditions (Apitz 2008). Some European countries have developed official strategies (in the form of advisory or mandatory frameworks, guidelines and standards) for assessing the risk and status of contaminated sediment (den Besten et al. 2003; Breedveld et al. 2015). In Sweden there is no such official strategy even though the Swedish government has an explicit intention to take the lead in sustainable development and management of the environment (Swedish Government Offices 2018). There are currently no common recommendations for sediment-specific measurements and assessment methods, nor for assessment criteria, such as SQGs intended for risk assessment (Severin et al. 2018). Swedish stakeholders have reported inconsistencies in approaches and knowledge between stakeholder groups, preventing effective communication (Ländell et al. 2014).
In the absence of national guidelines, Swedish assessors have reportedly used guidelines and assessment criteria from Norway, Canada, the Netherlands and the US when assessing risks from contaminated sediments (Ländell et al. 2014). This indicates that there might be large variations and inconsistencies among ERAs and practitioners, and multiple Swedish authorities have stressed the need for a national ERA strategy with a guideline for sites with contaminated sediments (Ländell et al. 2014; Severin et al. 2018).
The objective of this study was to provide an increased understanding of how ERAs of contaminated sediments are conducted in comparison to established strategies. Using Sweden as a case study, we specifically addressed what evidence of environmental risk and impairment has been collected and how the evidence has been used in order to characterize risk. This is especially timely in a Swedish context where the government acknowledges that multiple societal challenges and environmental goals are dependent on healthy sediments and has recently launched a commission with several authorities to improve the knowledge on management of contaminated sediments (Lövin and Sjögren 2019). The results can also serve as an indicator of the situation in the countries surrounding the Baltic Sea and other members of the European Union that also lack strategies for ERA at sites with contaminated sediments.

Method
We conducted this study in two main steps. A) We first established an analytical framework to use as a reference point by reviewing strategy documents from the Netherlands, Norway, US and Canada describing how to perform ERA at sites with contaminated sediments. B) We then investigated which LOEs were assessed and how they were used in order to determine risk by analyzing cases of ERA from Sweden.
Our analysis method for the strategies from Canada, Norway and the US was based on the recommendations for how to perform content analysis by Neuendorf (2002), Krippendorff (2004) and Bryman (2008). Content analysis is a method commonly used to scrutinize large sets of text-based information in order to identify and divide it into themes. It is used both to identify previously undefined themes within a dataset and to connect content with an analytical framework of already defined themes (Julien 2008). In this case, we used content analysis to search for themes in the Swedish ERAs that we first identified in the strategies from the four countries. We conducted the content analysis manually by reading and categorizing the documents' content in the analysis with the software NVIVO 12 (QSR 2018).

Establishing an analytical framework
We included documents describing strategies for ERA at sites with contaminated sediment from Canada, the Netherlands, Norway and the US. We chose these as Swedish assessors reportedly turned to strategies from these countries for advice on how to assess sediments (Ländell et al. 2014). They were also endorsed by the Environmental Protection Agencies (EPAs) in their respective countries. It should be noted that the Dutch strategy for ERA of contaminated sediments is no longer in use. The strategy is included as it has been one of the standard references and an inspiration for Swedish assessors. However, it is no longer part of the Dutch guidelines and is replaced by a framework not specifically focusing on sediment.
We performed the analysis of the strategies by closely reading the material on how to conduct ERA of contaminated sediments and assigning all content to themes. We then compared and categorized the themed content among the guidelines, such as content describing the use of sediment toxicity as a LOE. We labeled the themes, such as the separate LOEs, with the term used by the strategy that used the most inclusive description of the theme. We searched the strategy documents specifically for content stating: 1) which environmental variables or LOEs they suggested assessing and 2) how and in what situation they suggested measuring different variables. Specifically, we searched for which factors would trigger the assessment of a specific LOE. For the Dutch strategy, the information was already translated and readily available in den Besten et al. (2003).

Analysis of ERA cases
After we established the analytical framework, we investigated to what extent ERAs of contaminated sediment sites in Sweden corresponded to the framework. To find the ERAs, we sent a request to regulatory agencies supervising contaminated site management, and contacted companies that conduct ERA. We asked for the documents produced for reporting ERA to regulatory agencies and other stakeholders by the risk assessors.
We included documents that:
- Described the background of the ERA: problem formulation, description of the area, history and contaminants of potential concern.
- Described the sampling and measurements of environmental variables (LOE).
- Characterized risk: described and compared results and assessment criteria (e.g., benchmarks or reference sites).
- Determined risk and recommended a course of action or no action.
We included seven ERAs in the analysis. They were conducted by private assessors, two of them on behalf of a private company and the remaining for municipalities. Several of the ERAs also covered assessments of management options. See Table 1 for an overview of the ERAs.
We compared the content in the ERAs describing their methods, measurements, and which environmental variables they had assessed to the same variables in the frame of reference. We assigned the measurements described in the ERAs to corresponding LOEs described by the strategies.
We also took note of content describing the following:
- Which assessment criteria the ERAs compared to their measurements in order to characterize risk.
- Whether or not the characterization of the results from each LOE indicated risk.
- How the results from the individual LOEs were synthesized into an integrated measure of risk.
- Whether or not the ERAs concluded if the results were indicative of risk.
- Whether or not the ERAs recommended management to mitigate the risk.
- If the ERAs assessed management alternatives, and if so, if they provided preliminary cost predictions.

Table 1. Overview of the ERAs.
ERA A: Harbor sediments were known to be substantially contaminated from historical industrial activities and boat traffic. The contaminants included Ar, Cd, Cu, Hg, Ni, Pb, Zn, PCBs, dioxins and TBT. The overall purpose was to limit environmental effects and remove the outflow of contaminants from the harbor; the aim of the ERA was to establish the extent of the contamination in order to develop remediation alternatives.
ERA B: Lake sediments were known to be contaminated with dioxin from historical industrial activities. The aim of the ERA was to ensure that contaminants did not pose a considerable environmental risk.
ERA C: Lake sediments were suspected to be contaminated by a one-time oil leakage from a neighboring industrial site. The aim of the ERA was to determine what contaminant pressure the area would be able to withstand, in order to determine the need for potential mitigating management.
ERA D: Same site as for ERA C. The lake sediments were now known to be contaminated with PAHs. The purpose was to assess the risk of effects and potential spread of PAHs from the sediment.
ERA E: The sediments in a river bay, close to an estuary, were known to be contaminated with mercury from former pulp industry. The aim of the ERA was to assess the need for mitigation of environmental risks in situ and the spread to surrounding water bodies in order to devise a management plan.
ERA F: Sediments in a stream were suspected to be contaminated, mainly with pyrite ash. The purpose of the ERA was to assess the risks to human health and the environment and provide the basis for a management plan. The aim was to achieve levels of metals in the water column at background levels and the colonization of a gastropod community in the area.
ERA G: Lake sediments in a bay were known to be contaminated with PAHs from the use of creosote in neighboring industrial and harbor activities. The aim of the ERA was to conduct an in-depth assessment, focusing on biological risks. The aim was to provide information to determine the need and scope of mitigating management.

Frame of reference
The strategy documents aim to provide guidelines for how to assess the environmental risks (in situ), or the spread of contaminants, from sites suspected or known to be contaminated, in order to provide information for decision makers on the potential need for management. This is in contrast to ex situ ERA, the practice of evaluating environmental risks of sediments scheduled for management such as dumping (Algar et al. 2014). The strategy from the US covers both ex situ and in situ ERA. The documents all suggest the use of measurements from a combination of LOEs in order to characterize risk. However, they differ in flexibility and in terms of what LOEs, methods and measurements they prescribe. The US document is the most extensive and offers a comparatively flexible guide on the benefits and limitations of a number of suggested LOEs, in order for the assessors to choose what methods would be most suitable for their individual sites. The other three documents are comparatively less flexible and provide specific recommendations for when and what LOEs to assess; the Norwegian and Dutch documents also recommend specific methods for several of the LOEs they address (Figure 2):
Canada-Ontario decision-making framework for assessment of Great Lakes contaminated sediment by Environment Canada and the Ontario Ministry of the Environment (Anderson et al. 2008). This document is a regional framework mainly intended for the Great Lakes' sediment but can be used for other areas. It describes a tiered approach and prescribes specific LOEs for each tier but with some flexibility in what measurements to conduct within the LOEs. It begins with a screening for potential contaminants of concern. If the contaminants are biomagnifiable or exceed conservative SQG, a second tier follows with toxicity tests, modeling of the risk of biomagnification and a survey of the benthic community structure. An additional case-specific LOE can replace the benthic survey if the conditions are unsuitable, for example if stressors other than the contaminants, such as heavy boat traffic, interfere. The example given in the document of an additional LOE is to use biomarkers of toxic effects in situ. A fixed decision matrix is used in order to facilitate integration and interpretation of the results (Table 2). If additional information is necessary, the assessment proceeds to a third tier with a case-specific design.
Guidance document for site-specific effect-based sediment quality assessment (title translated from Dutch) by Institute for Inland Water Management and Waste Water Treatment and the Ministry of Infrastructure and the Environment (Van Elswijk et al. 2001; den Besten et al. 2003). The strategy document focuses on freshwater systems and aims to determine the need for remediation. Using a tiered approach, it starts with a screening for contaminants. If the concentrations of contaminants exceed national maximum criteria for ecological effects and risk to human health, the assessment should proceed. The second tier assesses risk to human health, contaminant transport, bioaccumulation, toxicity and effects on the benthic community. The strategy provides detailed instructions on which measurements to take and what criteria to use to determine risk. If the results from any of the LOEs in the second tier exceed maximum criteria, it indicates that management is urgent. The results from the individual ecological indicators are integrated and interpreted with a decision matrix that can be used to prioritize the urgency of remediation between sites (Table 2).
Risk assessment of contaminated sediments - Guideline (title translated from Norwegian) by the Norwegian Environment Agency (Breedveld et al. 2015). The Norwegian document is a national guideline that addresses coastal sediments and aims at determining the need for remediation. It follows a tiered framework prescribing detailed measurements for each LOE in each tier. The initial screening assesses potential contaminants of concern and the toxicity of porewater extracts to several species. A second tier follows if conservative national SQG are exceeded and/or the toxic effects on growth and mortality exceed set levels for three species (Skeletonema costatum, Tisbe battagliai, Crassostrea gigas). The LOEs in the second tier address human health risks, whole sediment toxicity tests on mortality and behavior on one out of two species, and modeling of contaminant transport. The risk assessed based on the toxicity tests and human health assessments is determined with standardized criteria for effect and exposure. The risk from contaminant fluxes is determined based on case-specific criteria. If any of the criteria are exceeded, it indicates that management should be evaluated. The guideline presents general recommendations for how to design a case-specific third tier if additional information is needed, either to conduct a more informed management evaluation or to further establish risk.
Processes, Assessment and Remediation of Contaminated Sediments by the US Departments of Defense and Energy and U.S. Environmental Protection Agency (Algar et al. 2014). The US document aims at giving a flexible account of how ERA can be conducted, referring to the general USEPA ERA framework (Figure 1) and describing potential approaches and LOEs rather than dictating ERA in detail. The document prescribes five LOEs as general and useful for most ERAs of contaminated sediments; several other LOEs are also described but as more case-specific depending on the scope and aim of the ERA. The document describes that two or more LOEs are needed in order to characterize a known risk and that the weight of biological measurements, such as benthic surveys, is considered higher in comparison to models and measurements of sediment contaminant concentrations. However, the document does not prescribe a specific combination or order in which to assess LOEs. The document proposes that the stakeholders should agree on how to integrate risk and what criteria to use in advance of the assessment rather than providing specific criteria or methods for interpreting risk. The framework is endorsed by the USEPA and available for all, but some American states have adopted their own frameworks.
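The first-tier screening described for the tiered strategies can be illustrated with a minimal Python sketch. It follows the trigger described above for the Canadian framework (a contaminant is biomagnifiable or exceeds a conservative SQG). The class, the function names and all numbers are hypothetical and are not taken from any of the strategy documents; real screening benchmarks would have to come from the relevant guideline.

```python
from dataclasses import dataclass


@dataclass
class Contaminant:
    name: str
    concentration: float   # measured sediment concentration (e.g., mg/kg dry weight)
    screening_sqg: float   # conservative screening benchmark in the same unit
    biomagnifiable: bool = False


def exceeds_screening(c: Contaminant) -> bool:
    """True if the measured concentration is above the screening benchmark."""
    return c.concentration > c.screening_sqg


def proceed_to_tier_two(contaminants: list[Contaminant]) -> bool:
    """Tier 1 trigger: any biomagnifiable contaminant or any benchmark exceedance."""
    return any(c.biomagnifiable or exceeds_screening(c) for c in contaminants)


# Hypothetical site data; the numbers are placeholders, not real guideline values.
site = [
    Contaminant("Cu", concentration=12.0, screening_sqg=20.0),
    Contaminant("Pb", concentration=45.0, screening_sqg=30.0),
]
print(proceed_to_tier_two(site))  # True, because Pb exceeds its placeholder benchmark
```

Writing the trigger as an explicit rule makes it clear which measurement moved the assessment to the next tier, which is the kind of transparency the tiered strategies aim for.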

Analysis of ERA
Our comparison of the performed ERAs to the framework shows that the LOEs most commonly addressed by the ERAs were contaminant concentrations (6/7 ERAs) and contaminant transport (5/7 ERAs) (Table 3). We did not find any content in the ERAs corresponding to the LOEs of in situ toxicity testing, Toxicity Identification Evaluation or Critical Body Residues.

Contaminant concentrations
Apart from ERA D, all the ERAs measured the concentrations of contaminants in the sediment and compared them to various criteria in order to assess risk (Table 5). The ERAs used five different sets of criteria; ERAs B and G used SQG indicative of ecologically adverse effects, as suggested in the strategy documents (Table 3). The other criteria used were sediment background levels and Swedish quality guidelines for soil (Table 5). Several of the ERAs discussed the lack of Swedish SQG in relation to their use of criteria that were not specifically derived for assessing environmental risks from contaminated sediments.

Sediment toxicity
ERA G conducted a Microtox assay, which measures the effect on bacteria exposed to sediment or a sediment extract. No other toxicity tests were conducted by any of the ERAs. The US strategy suggests using tests such as Microtox tests for initial screening of potential risks and the Dutch strategy includes Microtox as one out of several prescribed tests. The strategies from the Netherlands, Norway and Canada suggest the use of multiple species and assessment endpoints to assess toxic effects from the sediment (Table 3). The documents from Canada and the US highlight that the choice of assessment endpoints and organisms should represent the site and be based on the protection endpoints.
to consumption criteria to assess risks to human health and/or to reference areas (Table 5). None of the ERAs modeled or measured the trophic movements of contaminants. In comparison, all four strategies suggested modeling bioaccumulation or biomagnification and, if needed, to supplement the models with field measurements (Table 3).

Benthic community
ERAs E and G assessed the macrobenthic community structure and F assessed the abundance of a benthic gastropod, Radix balthica. The ERAs compared the results to reference areas to determine potential effects (Table 5). Depending on the situation, the strategy documents all suggest surveying the benthic community structure (Table 3). The US and Canadian documents suggest that sound surveys of the benthic community carry the strongest weight in ERA as it is potentially the most relevant LOE to determine ecotoxicological effects. However, the Norwegian document suggests that the results from benthic community assessments are too easily confounded by other factors and should be used with care in a later case-specific third tier after risk has been determined with other LOEs. The US and Canadian documents agree with the Norwegian that other measurements should be considered if factors other than the contaminants might have disturbed the benthic community.

Human health
ERAs A, B, E and F compared contaminant concentrations in fish tissue to consumption guidelines in order to assess the risks to human health (Table 5). ERA A also compared the sediment concentrations to Dutch SQG for human health and E compared water concentrations to Swedish criteria for safe consumption. The Norwegian and Dutch documents suggest assessing human health in a second tier if a first tier indicates potential risk. They prescribe modeling exposure via the food chain and dermal contact but none of the ERAs made such models. If needed, the Norwegian document suggests complementary field measurements to improve the models in a third tier. The Canadian document suggests modeling the bioavailability through the food chain in tier two and, similar to the Norwegian, to improve the models with additional sampling if needed in a third tier. The US document describes a similar approach, but sets the human health assessment separate from the ERA (Table 3).
Table footnotes:
* The tiers are based on whether or not the assessment is based on suspected but not confirmed risks (Tier 1) or if a risk is known but is yet to be quantified (Tier 2).
$ In millions of US dollars at the exchange rate of 2019-04-10.
** ERA B detected contaminant concentrations in lake sediments equal to or lower than average concentrations in Baltic Sea sediments and concluded that there was a negligible risk. However, the concentrations were also compared to, and exceeded, SQGs.
‡ Additional investigations were recommended as well as mitigation.

Contaminant transport
To assess potential transport of contaminants, ERAs A, E and G measured resuspended material and A, E and F measured concentrations in the water of contaminants known to be present in the sediment. ERAs A and E used their measurements to model the potential transport of contaminants to other areas. ERA D assessed the spread of known contaminants from the sediment into the water with passive samplers and compared these to Dutch guidelines for concentrations in the water (Table 5). ERA D described how the guideline values were not intended for assessing results from passive samplers but that the results could still be used as an indication. If a second tier is warranted, the Dutch and Norwegian strategies require that contaminant transport is assessed by modeling the transport from sediment to groundwater or surface water. Meanwhile, the US and Canadian strategies suggest assessing transport if there is a case-specific need to do so (Table 3). To characterize the risk of the contaminant transport, ERAs A and E compared the spread of contaminants from the assessment site to that of other known sources into neighboring water bodies (Table 5). The Dutch strategy provides national criteria for characterizing the risk of the contaminants' spread. Instead of providing criteria, the Norwegian strategy suggests setting case-specific benchmarks based on effects at the receiving site, while the Canadian and US strategies do not suggest any specific approach for how to characterize the risk of contaminant transport.

Biomarkers
ERAs E and G measured morphological changes in midge larvae, Chironomidae, and related these to contaminant exposure. ERA G also assessed morphological changes and EROD activity in fish (Table 5). The US strategy suggests measuring histopathological changes in infauna and the Canadian and Norwegian strategies suggest the use of case-specific biomarkers of in situ effects as an optional case-specific LOE (Table 3).
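The coverage figures reported in this section can be reproduced with a short tally. The sketch below encodes only what the running text states about which ERA addressed which LOE; bioaccumulation is left out because the text does not list the ERAs for that LOE. The dictionary layout and function name are illustrative assumptions, not part of the study's method.

```python
# LOE coverage per ERA as stated in the running text (bioaccumulation omitted).
LOE_COVERAGE = {
    "contaminant concentrations": {"A", "B", "C", "E", "F", "G"},
    "sediment toxicity": {"G"},
    "benthic community": {"E", "F", "G"},
    "human health": {"A", "B", "E", "F"},
    "contaminant transport": {"A", "D", "E", "F", "G"},
    "biomarkers": {"E", "G"},
}
ALL_ERAS = set("ABCDEFG")


def coverage_table(coverage, all_eras):
    """Return (LOE, number of ERAs addressing it, total ERAs), most common first."""
    rows = []
    for loe, eras in coverage.items():
        unknown = eras - all_eras
        if unknown:
            raise ValueError(f"unknown ERA labels for {loe}: {sorted(unknown)}")
        rows.append((loe, len(eras), len(all_eras)))
    # Sort by descending count; ties keep their original order.
    return sorted(rows, key=lambda row: -row[1])


for loe, n, total in coverage_table(LOE_COVERAGE, ALL_ERAS):
    print(f"{loe}: {n}/{total} ERAs")
```

The first two rows correspond to the 6/7 and 5/7 figures given for contaminant concentrations and contaminant transport.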

Characterizing risk
Based on their measurements, five of the ERAs reached the conclusion that there was a higher than negligible risk; four of those recommended further action to manage the risks. ERA C argued that the risk of the contaminants spreading during dredging was too high and recommended that the oil contaminated sediment should be left undisturbed (Table 3). The integration of the individual results into these conclusions appears to be based on listing the results from the individual LOEs and the judgment of the assessors. An example is that the ERAs that used reference areas to characterize risk did not state at what level of difference from the reference there would be a risk or no risk (Table 5). None of the ERAs described or discussed weighing or scaling the LOEs, nor any decision support system for how to integrate the results in order to reach their conclusions. The Canadian, Dutch and Norwegian strategies recommend that a tier one assessment should proceed to a second tier if the results deviate more than a set amount from prescribed assessment criteria (Table 3). The Norwegian strategy prescribes an additional set of assessment criteria for the second tier which, if exceeded, indicates a need for management. The Canadian and Dutch strategies provide matrices for integrating and interpreting the results from the second tier. The matrices offer different courses of action depending on the results (Table 2).
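To illustrate what a transparent integration could look like, the following minimal Python sketch declares, before any data are evaluated, how far a site may deviate from its reference for each LOE and how many LOEs must indicate risk for the overall conclusion. It does not reproduce the Canadian or Dutch decision matrices; the thresholds, the rule of at least two LOEs, the LOE names and the example values are all hypothetical assumptions chosen for illustration.

```python
from typing import NamedTuple


class LoeResult(NamedTuple):
    name: str
    site_value: float
    reference_value: float
    max_relative_difference: float  # criterion declared before sampling (assumption)

    def indicates_risk(self) -> bool:
        """True if the site deviates from the reference by more than the declared limit."""
        if self.reference_value == 0:
            raise ValueError("reference value must be non-zero")
        difference = abs(self.site_value - self.reference_value) / abs(self.reference_value)
        return difference > self.max_relative_difference


def integrate(results, min_loes_indicating_risk=2):
    """Return the conclusion and the names of the LOEs that indicated risk."""
    flagged = [r.name for r in results if r.indicates_risk()]
    if len(flagged) >= min_loes_indicating_risk:
        return "risk indicated", flagged
    return "no risk indicated", flagged


# Hypothetical example values, not taken from any of the ERAs.
results = [
    LoeResult("benthic taxa richness", 8.0, 20.0, 0.5),
    LoeResult("larval deformity frequency", 0.12, 0.10, 0.5),
    LoeResult("toxicity test survival", 0.55, 0.95, 0.25),
]
conclusion, flagged = integrate(results)
print(conclusion, flagged)
```

Because both the per-LOE criteria and the integration rule are fixed in advance, a reader can trace the conclusion back to individual measurements rather than to the assessor's judgment alone.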

Discussion
We identified a lack of consistency between approaches to ERA of contaminated sediments in the analyzed set of ERAs performed in Sweden, confirming concerns from stakeholders on the lack of common guidelines or strategies (Ländell et al. 2014; Severin et al. 2018). It is likely that other countries that lack common guidelines on ERA also face inconsistencies in how ERAs of contaminated sediment are performed. We also identified discrepancies between the set of established ERA strategies used as a reference framework and the performed ERAs. From comparing the ERAs to the reference framework, we identified three particular areas of concern. If addressed, these areas have the potential to improve ERAs of contaminated sediments in countries that lack ERA guidelines:
(1) Inconsistencies in the use of assessment criteria suitable for indicating risk.
(2) Infrequent assessments of site-specific ecotoxicological effects to enable correlation of contaminants and toxic effects.
(3) Unsystematic interpretation and integration of indications of risk into a weight of evidence.
In addition to these areas, we discuss the potential of a more systematic approach to future societal demands and changing environmental conditions.

Inconsistencies in the use of metrics suitable for indicating risk
Even though there will always be a level of subjectivity and uncertainty in ERA (Barnthouse 2008), we argue that there is a lack of standardization among the seven ERAs included in this case study. The approach to what methods to use in order to measure or characterize risk is inconsistent, which leads to unnecessary uncertainty and potentially unwarranted differences in conclusions of risk between cases. The ERAs use metrics for characterizing risks from contaminant concentrations from five different sources, including SQGs, background values and guidelines derived for soil, out of which only the SQGs are intended to indicate risks when compared to samples of sediment contaminants. Additionally, with no assessment criteria as benchmarks to estimate risk, ERAs A and E compare the rate of contaminant transport from the assessment site to adjacent waterbodies to the rate of transport from other sources to those waterbodies (Table 5).
While measurements of transport of contaminants might be used for assessing the risk to the receiving waterbodies, the results are difficult to use without any criteria that indicate the level of risk or effect. Such criteria could be standardized concentration criteria, as was prescribed by the Dutch strategy, or recommendations for measurements or models for assessing the risk of adverse effects from the transported contaminants.
One method to characterize risk is to compare measurements with those from a reference site, as is done by three of the ERAs (E, F and G) in their assessments of the benthic communities. ERA F also used a reference site to assess bioaccumulation, and ERA G to compare biomarkers (Table 5). The Canadian framework suggests using reference sites/samples as an alternative for characterizing risk. The Dutch strategy suggests using a reference site specifically when assessing the benthic community (Table 3), and the approach is also discussed in the American framework. However, the use of reference sites in ERA, where contaminants are at 'background levels' and the ecosystem represents an 'ecological baseline', has been questioned.
Appropriate reference sites may not exist (or at least be difficult to find) in practice, and factors other than contamination may cause the differences observed between sites (Landis and Wiegers 1997; Suter et al. 2000; Landis 2003; Kapustka 2008). Establishing whether contamination risks causing an effect may therefore not be possible, despite being critical to an ERA (Suter et al. 2000; Landis 2003). Instead, a gradient approach, utilizing the decrease in contaminant concentrations away from a local source, has been advocated as a more powerful method to establish cause and effect (Ellis and Schneider 1997; Preston 2002). This design also allows other environmental variables to be measured along the gradient and used in the analysis in order to control for their possible contribution as confounding (co-varying) variables and/or as additional stressors that affect ecological communities (Landis et al. 2011; Clements et al. 2012; Contaminated Sites Working Group 2020).
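The logic of the gradient design can be illustrated with a minimal sketch. It assumes, hypothetically, that a contaminant concentration, one possible confounder (here organic carbon content) and a biological response (here a generic benthic index) have been measured at stations along a transect away from the source. The partial correlation used below is only one simple way to control for a co-varying variable; it is not a method prescribed by any of the strategies or used in the analyzed ERAs, and all variable names and values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical stations ordered from the source outwards (invented values).
concentration = [120.0, 85.0, 60.0, 30.0, 18.0, 9.0]  # contaminant concentration
organic_carbon = [4.1, 3.2, 3.9, 2.5, 3.0, 2.2]        # possible confounder
benthic_index = [1.2, 1.9, 2.1, 3.4, 3.3, 4.0]         # biological response

raw = pearson(concentration, benthic_index)
adjusted = partial_correlation(concentration, benthic_index, organic_carbon)
print(f"raw r = {raw:.2f}, r controlling for organic carbon = {adjusted:.2f}")
```

A relationship that persists after adjusting for the measured covariate is more consistent with a contaminant effect than one that disappears, although a real assessment would need more stations and a model suited to the data.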
By using standardized metrics intended specifically for ERA of sediments, ERAs could become more accurate and their risk estimates easier to communicate between stakeholders. The US strategy argues that failing to define appropriate benchmarks for risk early in an assessment can obstruct the understanding between stakeholders (Lotufo et al. 2014), and communication between stakeholders is already reported to be difficult for sediment management in Sweden (Ländell et al. 2014).
Infrequent assessments of site-specific ecotoxicological effects to enable correlation of contaminants and toxic effects. All of the analyzed ERAs assess environmental exposure to contaminants by measuring chemical concentrations and/or transport of chemicals. Assessments of site-specific ecotoxicological effects are less frequent. The ERAs A, E, F and G investigate the ecological status in situ by surveying specific benthic species or the community structure and only ERA G conducts a toxicity test in order to correlate contaminants and effects (Table 5).
Chemical analysis as conducted by the ERAs has a role to play, especially in an early phase with a screening for contaminants with a low threshold for potential risk, as suggested by the Canadian and Dutch strategies (Figure 2). However, most of the ERAs included in this study knew from previous investigations that there was a potential for risk at the elevated levels of contaminants at the site (Table 1). An example of where additional LOEs assessing ecotoxicological effects could have been valuable is ERA B, in which two different sets of concentration criteria were used and the conclusions drawn from the two sets differed. The measured contaminant concentrations in lake sediments were below background concentrations in the adjacent Baltic Sea sediments, which led the assessors to consider the risk as not elevated. However, the concentrations exceeded the Canadian SQG also used in the ERA.
In order to avoid over- or underestimation of the potential risk, assessing site-specific ecotoxicological effects can provide the correlation of contaminants and toxic effects. All of the frameworks recommend the assessment of LOEs such as bioaccumulation, biomarkers and surveying the benthic community, although the Norwegian framework in particular questions the predictive ability of benthic surveys, as they are sensitive to confounding factors. The use of toxicity tests accounts for the largest disparity between the frameworks and the ERAs. Toxicity tests are well established in recommendations for sediment ERA, not only in all of the frameworks addressed in this study (Table 3) but also in other frameworks, both specifically for sediments and in general for contaminated sites (e.g., Apitz 2011; WSDE 2013; Simpson and Batley 2016; Contaminated Sites Working Group 2020). Nevertheless, only ERA G conducted one.
It is possible that the lack of measurements of site-specific ecotoxicological effects is not based on a conscious choice of scientifically suitable methods for ERA of contaminated sediments but rather due to working within policies and using standards not originally intended for ERA of contaminated sediments. It is argued that experiences from ERA in terrestrial areas seem to have been a precedent in Sweden, setting the standard of practice in determining environmental risk by focusing on contaminant concentrations and exposure and human health (Bruce and Ohlsson 2020). However, when assessing sediments, a practice intended for terrestrial contamination based on general chemical criteria is arguably unsuitable. That is due to the inherent limitations in determining risk based on general chemical criteria, but also since there are currently no equivalents to the soil quality criteria derived for sediment in Sweden. Apitz (2008) argues that, for the EU, policies are putting undue focus on general contaminant concentration criteria in management of contaminated sediments.
We show how metrics not intended for ERA of contaminated sediments are used in estimating risk (Table 5), and standardized SQG, or a standardized system for deriving site-specific SQG, for environments typical for the specific country (in this case Sweden) would be valuable. However, even with SQG developed for ERA, the practice of comparing measured contaminant concentrations to SQG still has multiple limitations due to the difficulty of including all relevant aspects in one generic SQG. Processes such as bioturbation, ingestion and diffusion make the bioavailability difficult to predict (Leppänen and Kukkonen 2006; Leeuwen and Vermeire 2007; Mustajärvi et al. 2019).
To account for unexpected contaminants, the US and Norwegian strategies suggest that toxicity tests should be conducted early in the ERA (Table 3). This might be crucial, as managers need to prioritize what contaminants to address in measurements and assessments. Testing for all known chemicals is not feasible, and even measured contaminants below benchmark values can pose unpredicted ecological effects when combined in a mixture with other contaminants (Hermens et al. 1984; O'Connor 2004; Apitz 2011; Laetz et al. 2015; Sobek et al. 2016). In the Baltic Sea, for example, this risk may be pronounced due to high anthropogenic (diffuse) background levels of various contaminants from historical and ongoing contamination (Verta et al. 2007; Sobek et al. 2014). To assess the effects of contaminants effectively, multiple contaminants need to be taken into account, and for that, SQGs can be complemented with additional tools.
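Why a mixture can matter even when every contaminant is below its own benchmark can be shown with a small worked example. The sketch below assumes simple additivity of concentration-to-benchmark ratios (often called a hazard index), which is a common screening simplification and not necessarily the complementary tool any of the strategies has in mind. The substance names, concentrations and benchmark values are hypothetical.

```python
def hazard_quotients(measured, benchmarks):
    """Ratio of each measured concentration to its benchmark (same units)."""
    return {name: measured[name] / benchmarks[name] for name in measured}


def hazard_index(quotients):
    """Sum of the quotients, i.e. a simple additive mixture screen."""
    return sum(quotients.values())


# Hypothetical sediment concentrations and benchmarks (invented values).
measured = {"substance_A": 0.6, "substance_B": 12.0, "substance_C": 0.09}
benchmarks = {"substance_A": 1.0, "substance_B": 20.0, "substance_C": 0.2}

hq = hazard_quotients(measured, benchmarks)
hi = hazard_index(hq)

for name, value in hq.items():
    print(f"{name}: HQ = {value:.2f}")
print(f"hazard index = {hi:.2f}")
```

In this invented case each quotient is below 1, yet their sum exceeds 1, so a substance-by-substance comparison would not flag the site while an additive screen would. Additivity is itself an assumption, and a screen of this kind cannot account for unmeasured contaminants, which is where toxicity tests add information.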
If new guidelines were to recommend tests of site-specific ecotoxicological effects, it would be important to set minimum standards so as to not risk underestimating environmental effects (Weeks and Comber 2005). Depending on the design, biological tests may not be sensitive enough to display toxic effects that might occur in the field, e.g., due to short exposure times or use of resilient test species, or they may be subject to confounding factors. For example, the Norwegian strategy argues the benthic community structure should only be assessed in specific cases due to difficulties discerning the effect of contaminants in relation to other stressors. However, in some cases it is reasonable not to test for effects even when contaminant concentrations are elevated, for example, when costs for remediation are lower than the costs for further investigations and the negative environmental effects from the remediation would be negligible, as is pointed out in the Canadian strategy (Figure 2).
Unsystematic interpretation and integration of indications of risk into a weight of evidence. While the ERAs clearly describe the collection of the various indications of risk, the subsequent steps are not always as clear. The ERAs rarely discuss the uncertainties of their results, none of them describes their process of risk calculation and it is not always clear that the results meet the, sometimes quite broad, assessment objectives. This is perhaps especially important for the ERAs that feed into the development of management plans (Table 1). Clear assessment objectives are preferably set early on, and the combination of LOEs that are best suited to assess the objectives should be chosen together with a structured and transparent approach to interpret and integrate the LOEs (see e.g., Barnthouse 2008). If done properly, this can aid the assessors in objectively and transparently determining the level of risk and uncertainty as well as aid in stakeholder communication and acceptance (see e.g., Grapentine et al. 2002; Chapman and Smith 2012). There are several standardized approaches available for combining results into a weight of evidence, including quantitatively integrating results based on predetermined ranking of the LOEs, statistical models or structured formal decision analysis and more (Bettinger et al. 1995; Linkov et al. 2009; Hardy et al. 2017; Contaminated Sites Working Group 2020).
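As an illustration of what a quantitative integration based on a predetermined ranking of the LOEs could look like, the following minimal sketch computes a weighted mean of LOE scores. The LOE names, weights and the 0 to 1 scoring scale are hypothetical assumptions made for this example and are not taken from any of the cited approaches.

```python
# Hypothetical predetermined weights for each line of evidence (LOE).
WEIGHTS = {
    "chemistry": 1.0,
    "toxicity_test": 2.0,
    "benthic_survey": 1.5,
    "bioaccumulation": 1.5,
}


def weight_of_evidence(scores, weights=WEIGHTS):
    """Weighted mean of LOE scores (0 = no indication of risk, 1 = strong).

    LOEs that were not assessed (None) are left out, and the share of the
    total weight actually covered is returned as a crude transparency measure.
    """
    assessed = {loe: s for loe, s in scores.items() if s is not None}
    if not assessed:
        raise ValueError("no line of evidence was assessed")
    covered = sum(weights[loe] for loe in assessed)
    score = sum(weights[loe] * s for loe, s in assessed.items()) / covered
    coverage = covered / sum(weights.values())
    return score, coverage


# Hypothetical site where only chemistry and a benthic survey were assessed.
site_scores = {
    "chemistry": 1.0,
    "toxicity_test": None,
    "benthic_survey": 0.5,
    "bioaccumulation": None,
}
score, coverage = weight_of_evidence(site_scores)
print(f"integrated score = {score:.2f}, weight covered = {coverage:.0%}")
```

Reporting the covered weight alongside the score makes it visible when a conclusion rests on only a few LOEs. Fixing the weights before the data are collected is what makes the integration reproducible between assessments.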
Future societal demands and changing environmental conditions. Both the Swedish ERAs and the strategy documents focus on current site conditions when assessing risk, potentially not considering future conditions that could be important for long-term management. For example, ERAs A, D, E and F assessed the rate of sedimentation, suspended sediment and sediment-associated contaminants in the surface water based on the conditions at the time (Table 5), which is also recommended by the Dutch and Norwegian strategies (Table 3). While such measurements might provide important information, that information is sensitive to future changes in environmental conditions, such as acidity, increased rainfall due to climate change or increases in boat traffic. It has been argued that ERAs in general, not just of sediments, are too focused on assessing a narrow set of conditions and only a snapshot in time (Selck et al. 2017).

Conclusion and recommendations
Healthy sediments are a prerequisite in order to meet several societal needs and environmental goals, but sediment risk assessment is a challenging and complex process. We identify four areas that, if addressed, can help improve the practice of ERA of contaminated sediments: (1) Criteria used to indicate or quantify risk should be standardized. (2) A common strategy for how and when to conduct ecotoxicological tests should be established. (3) Transparent and systematic methods for how to integrate and interpret indications of risk should be required. (4) Future societal demands and changing environmental conditions that could be important to account for in the ERA need to be included, to improve the ability to provide a basis for sustainable long-term management.
A guidance document covering the four areas identified in this study would provide a common basis between stakeholders on what is expected from an ERA to assure high quality, and facilitate the often long and rather complicated process from site investigations to sediment remediation.