Development of a predictive model for benthic macroinvertebrates by using environmental variables for the biological assessment of Korean streams

Abstract Predictive models for benthic macroinvertebrates based on changes in environmental variables can assess the biological integrity of streams by comparing observed biotic communities with those expected at reference sites. To develop a predictive model of the abiotic community, we used benthic macroinvertebrates and environmental variables collected from 2,700 sites from 2010 to 2019. First, we selected 357 reference sites by using the 5-day biochemical oxygen demand, ammonia nitrogen, total phosphorus, turbidity, and coarse particle percentage. Then, we used Two-Way Indicator Species Analysis to classify the reference sites into six groups based on benthic macroinvertebrates. Reference sites classified by biological characteristics were linked to environmental variables by multi-discriminant analysis. The relative influences environmental variables on the classified groups were in decreasing order of catchment area, latitude, velocity, water depth, altitude, and longitude. To develop the predictive model, we combined (1) identification level, (2) grouping method, and (3) probability of capture, and then used the normalized root mean square error (NRMSE) to check the fit of each model. The higher the probability of capture was at the family level compared to the species level, the lower was the NRMSE. The grouping method was not as consistent as the identification level and probability of capture because the NRMSE for the number of taxa was low when used as a weighted average. The NRMSE was also low for the Benthic Macroinvertebrates Index and the Benthic Macroinvertebrates Family-level Biotic Index (BMFI) when used for assignment to the group with the highest probability. We selected the predictive model which used family level, weighted average, and BMFI-proposed indicator taxa as the final assessment model due to its sensitivity and fit. This model was the most reasonable choice, but we had to reduce the error of the model and revise it elaborately by securing additional environmental variables.


Introduction
Stream and river ecosystems are rich in biodiversity relative to the amount of the Earth's surface they occupy (Dudgeon et al. 2006;Strayer and Dudgeon 2010), but they are also habitats where many artificial disturbances occur concurrently (Strayer 2006;Vaughn 2010). Increasing human demand for water and the land use leads to disturbances in water quality, hydrology, and riparian vegetation and area, which changes the habitats of aquatic organisms and deteriorates ecological integrity, thereby reducing the biodiversity of stream ecosystems (Nakamura et al. 2006;De Silva et al. 2007;Cuffney et al. 2010). For this reason, information on biological and water quality is essential for comprehensively assessing the environmental conditions of streams. Biological monitoring is currently being done in several countries, and biological assessment methods are being developed based on accumulated data (Cox et al. 1997;Smith et al. 1999;WFD 2000;USEPA 2013a;Arle et al. 2014). Biological assessment is the basis for restoring ecological integrity, as it monitors changes in aquatic communities which are caused by artificial and inferring disturbances in the monitored areas to characterize the current state of stream ecosystems (USEPA 1999;Lee et al. 2011;Jun et al. 2012). There is a diverse array of organisms used for biological assessment of streams, including fish, benthic macroinvertebrates, and periphytons (Karr 1981;USEPA 1999;Klemm et al. 2002;Hill et al. 2003;Roset et al. 2007). Among them, benthic macroinvertebrates have long been selected as indicator organisms for the biological assessment of streams due to their species diversity, long lifespan, bottom-dwelling activity, and sensitivity to habitat disturbance, and they have consequently been used for short-and long-term monitoring of stream environments (Rosenberg et al. 1986;Resh and Jackson 1993;Karr 1999;Smith et al. 1999;Resh 2008;G€ ultekin et al. 2017).
Over the past few decades, researchers studying freshwater ecosystems have developed various biological assessment methods using benthic macroinvertebrates to assess the ecological influence of disturbance and restore integrity to the ecosystem Lenat 1993;Karr 1999;Smith et al. 1999;USEPA 1999;Clarke et al. 2003;Hargett et al. 2007;Arle et al. 2014;G€ ultekin et al. 2019). Biological assessment can be categorized into two main approaches. One is a multimetric approach that uses several biological indices to assess different ecological conditions of streams. This method was initially developed by applying the concept of indicator species to assess changes in specific environmental variables, but because overall ecological integrity cannot be assessed with a single or partial index, indices that can detect changes in various environmental variables have been developed and used in combination (Metcalfe 1989;WFD 2000;AQEM Consortium 2002;Jun et al. 2012;USEPA 2013a;Arle et al. 2014;Fierro et al. 2018). The other biological assessment approach is one which compares predicted and observed fauna by using an empirical model based on the occurrence of organisms in a reference stream with minimal disturbance. These predictive models have been applied to biological assessment of streams in the United Kingdom, Australia, and Mediterranean countries in Europe (Norris and Norris 1995;Smith et al. 1999;Wright et al. 2000;Clarke et al. 2003;Poquet et al. 2009). There have also been attempts to develop predictive models for specific states or streams in the United States, which generally uses multimetric methods nationwide Hargett et al. 2007).
The River Invertebrate Prediction and Classification System (RIVPACS) is the first predictive model which can detect and interpret artificial disturbances of benthic macroinvertebrate communities in streams. Biological conditions can be assessed by comparing the observed fauna at the sampling site (test site) with the fauna expected in the absence of disturbance (Moss et al. 1987;Wright 1994;Wright et al. 2000;Clarke et al. 2003). The prediction depends on the observed fauna and environmental variables in the reference stream. The assessment is carried out using the ratio of expected (E) and observed (O) fauna (O/E) at the sampling site; the lower the O/E value, the more disturbed the stream (Wright et al. 2000;Clarke et al. 2003). Since the O/E ratio predicts the composition of taxa at the sampling site, it provides information on the presence or absence of specific taxa (Wright et al. 2000). If some taxa which are sensitive to disturbance are predicted, this information can lead to the inference and diagnosis of the stressor most likely to affect the site . The O/E ratio not only assesses the number of taxa and individual abundance, but it also assesses a separate biological index which is developed to measure the disturbance of a specific environmental variable, regardless of the outcome of the predictive model (Clarke et al. 2011). Thus, it is possible to try to assess streams by using indices which represent multiple environmental variables. Furthermore, the Australian River Assessment System (AUSRIVAS) is another predictive system which is used to assess the biological health of Australian rivers. It was developed by the Australian government to maintain and protect the ecological quality of Australian rivers, and it has been used as part of their National River Health Program (Schofield and Davies 1996). AUSRIVAS is mostly based on the methods of RIVPACS, but it differs in that it uses only the family level, and microhabitats are individually sampled and applied to the model (Smith et al. 1999).
Over the past 30 years in the Republic of Korea, several indices using benthic macroinvertebrates have been developed to assess stream ecosystems. Early attempts were the Total Biotic Score (Yoon et al. 1992a) and the Group Pollution Index (Yoon et al., 1992b), which improved the methods of Zelinka and Marvan (1961) to make them suitable for stream assessment in the Republic of Korea. Since then, the number of indicator taxa increased based on the accumulated data, and new analysis results were synthesized. The Korean Saprobic Index (Won et al. 2006), Benthic Macroinvertebrates Index (BMI; Kong et al. 2018), and Benthic Macroinvertebrates Family-level Biotic Index (BMFI; Kong et al. 2019) have since been proposed. These biological indices created an opportunity to convert water-quality assessments on the state of streams in the Republic of Korea into biological assessments. However, since these are generally 5-day biochemical oxygen demand (BOD 5 )-based indices, they can only assess the influence of certain environmental variables such as water chemistry, and it may be difficult to use these indices to assess the overall integrity of stream ecosystems. For this reason, Jun et al. (2012) proposed a multimetric method for assessing the integrity of streams in the Republic of Korea, but so far there have not been any efforts to develop and apply predictive models such as RIVPACS and AUSRIVAS. Therefore, in this study, we aimed to develop a predictive model based on data collected over a long period of time in the Republic of Korea and to assess the suitability of the developed model. In this study, we chose reference sites based on certain physiochemical variables, and we then derived predictive models. For the derived models, we decided which criteria created the most suitable model for assessment: (1) the identification level of taxa (species or family), (2) the group allocation method of the test sites, or (3) the level of capture probability for each taxon.

Study area
We conducted this study on the main streams, tributaries, and other independent streams of five major river watersheds (Han, Nakdong, Geum, Yeongsan, and Seomjin rivers) on a national scale in the Republic of Korea. A detailed description of the characteristics of each watershed is given in Hwang et al. (2011) andJun et al. (2016).
The Republic of Korea is located between 37 00 0 N and 127 30 0 E (eastern side of the mid-latitude continent) and is influenced by continents and oceans, resulting in four distinct seasons. The total area of the Republic of Korea is approximately 100,339 km 2 , and mountainous areas with low altitude occupy about 60% of the total area (Kim et al. 2017). High mountains with steep slopes are located mainly in the eastern region, whereas lowland plains with gentle slopes are located in the western region as the elevation decreases Li et al. 2012). According to these geographical features, mountainous streams with high or low altitude occupy a larger area than lowland plain streams (Bae et al. 2003).
In the Republic of Korea, precipitation in the summer monsoon season occurs based on geographical location. It accounts for approximately 60-70% of annual precipitation and can result in frequent flooding depending on the strength and frequency of precipitation. In other seasons, the base streamflow is maintained or is below average. As a result of this, the water width drastically changes, and the coefficient of the river regime is higher than that in other countries Jun et al. 2016).

Data collection
In this study, we used the results of 2,700 sampling sites (15,936 sampling units) by the Nationwide Aquatic Ecological Monitoring Program (NAEMP; Lee et al. 2011) conducted by the Ministry of Environment and the National Institute of Environmental Research from 2010 to 2019 (Figure 1). We recorded environmental variables and sampled benthic macroinvertebrates in accordance with NIER (2019).
We measured environmental variables at each sampling site before sampling the benthic macroinvertebrates. We measured a total of 13 environmental variables, which were classified into four categories: stream size, geographical features, microhabitat, and water chemistry (Table 1). We classified stream order in terms of stream size by using a 1:120,000-scale map by applying the criteria in Strahler (1957), and we calculated the catchment area (km 2 ) by using the Korean Reach File provided by NIER (2015b). We measured stream width (m) as the distance from bank to bank by using a laser range finder (Bushnell Inc., Overland Park, KS, USA). We measured altitude (m), latitude, and longitude of the geographical features of each stream by using GPS (Triton 500, Magellen Inc., San Dimas, CA, USA). All microhabitat variables were measured at three sampling points per sampling site and then averaged. We obtained velocity (cm/s) by using a flow meter (Flow-Mate Model 2000 or 3000-LX, Swoffer Instruments, Inc., Tukwila, WA, USA) or by applying the Craig method (Craig 1987). We calculated water depth (cm) by measuring the vertical distance from the water surface to the streambed. Substrates were classified into five types (boulder, cobble, pebble, gravel, and sand) according to the criteria of Cummins (1962), and the proportion of the area occupied by each substrate type was estimated by naked eye at each sampling point. We regarded substrates larger than sand (greater than 2 mm) as coarse particles (USEPA 2013a) and obtained the fraction of coarse particles (%) at each sampling point. Among the variables corresponding to water chemistry, we measured turbidity (NTU) by using a multi-probe portable meter (YSI 6920, YSI, Inc., Yellow Springs, OH, USA or Horiba U-22XD, Kyoto, Japan), and we and obtained the BOD 5 (mg/L), ammonia nitrogen concentration (NH 3 -N; mg/L), and total phosphorus concentration (TP; mg/L) using standard methods (APHA 2001) with stream water collected at the sampling sites.
We collected benthic macroinvertebrates mainly from riffle habitats within 100 m from each sampling site by using a Surber sampler (30 cm Â 30 cm, 1 mm mesh). In larger streams, benthic macroinvertebrates were collected from run or pool habitats at sites with no riffles or at sites that were difficult to survey. We conducted quantitative sampling by collecting from three replicates (0.27 m 2 of habitat for each sampling unit) in the spring and autumn at each site (some locations were sampled only once, either in spring or in autumn), before and after the monsoon flood season. Samples collected by three replicates were placed in one collection bottle, fixed with 95% ethanol, and then transferred to the laboratory, where the organisms were sorted from detritus and inorganic material and eventually stored in 80% ethanol. All specimens were identified at the species level if  possible, but some taxa with limited taxonomic information (e.g., Oligochaeta, Hemiptera, Coleoptera, and Diptera) were identified at the genus or family level (Yoon 1995;Kwon et al. 2001). All collected individuals for each taxon were counted and the individual abundance was calculated as individual abundance/m 2 .

Data analysis
Predictive models such as RIVPACS and AUSRIVAS use taxa and environmental features from a series of sampling sites (reference sites) of high biological quality to predict organisms at given sites. Although theoretically there is no artificial disturbance at the reference sites, this is often not the case in reality (Stoddard et al. 2006). Therefore, we had to identify the reference sites that were the least disturbed which means that water pollution of the habitat is low and the heterogeneity of the microhabitat is high (Hargett et al. 2007). Considering this, we selected the reference points based on BOD 5 (organic pollution), NH 3 -N (toxicity), TP (nutrient state), turbidity (inorganic pollution), and the proportion of coarse particles (microhabitat heterogeneity). When consistent criteria are applied for the selection of reference points, mountainous streams at high altitudes may have less disturbed sites than large streams and rivers where all disturbance variables are aggregated may have few reference sites (Herlihy et al. 2008). Therefore, as shown in Table 2, we selected reference sites by setting different standards for NH 3 -N, TP, and turbidity in small streams (Types 1-3) and large streams (Types 4-6) based on the stream types of Min et al. (2019), and we confirmed the number of reference sites based on these types. Min et al. (2019) classified Korean streams into six types by using catchment area and altitude: Type 1 is a mountain stream (in order of catchment area and altitude, < 250 km 2 and > 450 m), Type 2 is a highland small stream (< 250 km 2 and 150-450 m), Type 3 is a lowland small stream (< 250 km 2 and < 150 m), Type 4 is a highland large stream (250-4,000 km 2 and ! 150 m), Type 5 is a lowland large stream (250-4,000 km 2 and < 150 m), and Type 6 is a river (> 4,000 km 2 , regardless of altitude).
We did hierarchical clustering analysis using Two-Way Indicator Species Analysis (TWINSPAN) to group the selected reference sites based on the similarity of their benthic macroinvertebrate communities. The results of clustering analysis differed depending on the algorithm used for the analysis and on the frequency of occurrence and individual abundance of the studied taxa. For identification, the species level was used in order to consider taxa with different ecological characteristics for each species in the same family. In addition, the individual abundance of all observed taxa was adjusted using Equation 1, which Min et al. (2019) used to downweigh the effect of dominant taxa for a particular taxon (e.g., Oligochaeta, Chironomidae, or Baetidae; Hwang et al. 2011;Jun et al. 2016 ), and the range of individual abundance was converted equally. In the following equation, n, n min , and n mam represent the individual abundance, minimum individual abundance, and maximum individual abundance for each taxon, respectively. The constant 0.5 is given to distinguish between when a taxon has not occurred and when the value of n is equal to n min : Initially, RIVPACS only considered the presence or absence of taxa, but recently the model has also considered individual abundance (Wright et al. 2000;Clarke et al. 2003). To find out which main taxa influenced the grouping of reference sites, we used the method in Dufrêne and Legendre (1997) to identify which taxa had the highest frequency of occurrence and individual abundance in each group. We used one-way ANOVA to confirm whether the difference in environmental variables for each TWINSPAN group was statistically significant.
We linked TWINSPAN reference groups defined by biological properties to environmental features by using stepwise multi-discriminant analysis (MDA; Klecka 1975), a multivariate statistical method. MDA finds a linear combination that maximizes the variance-covariance matrix between groups and minimizes the variance-covariance matrix within a group, and it measures the relative influence of environmental variables by using the discriminant function to separate the groups. The discriminant function is derived from the linear combination of the discriminant variables (environmental variables), discriminant coefficients, and discriminant constant based on the influence of discriminant variables, which play an important role in discriminating the predefined reference groups. Subsequently, the values of the environmental variables of the test site shall be substituted for the discriminant functions to determine which group it belongs to. At this time, the number of discriminant functions for discriminant analysis is estimated to be smaller than the number of reference groups minus one or the number of discriminant variables. The environmental variables we used in MDA except for the variables used to select reference sites were stream order, catchment area, stream width, altitude, latitude, longitude, water depth, and velocity. The catchment area, stream width, altitude, water depth, and velocity were converted to lnðx þ 1Þ to follow a normal distribution.
We chose the environmental variables (predictive variables) useful for distinguishing each TWINSPAN group and the discriminant functions to be applied to the models in order to obtain an assigned probability for each TWINSPAN group of test sites by using the Wilk's lambda derived from discriminant analysis (significance level: p < 0.05). Using the predictive variables and discriminant functions derived by MDA, we predicted the assigned probability of each TWINSPAN group, the probability of capture for each taxon, and the individual abundance at each test site according to the described in Clarke et al. (2003) and Moss et al. (1987). Here, we compared (1) the methods assigned to each TWINSPAN group of test sites, (2) the taxonomic identification level of observed taxa, and (3) the probability of capture for each taxon in relation to the fit of the models. The test sites were assigned to groups by comparing their weighted averages (applied using RIVPACS or AUSRIVAS) and assigning them to the groups with the highest probabilities (attempted by . The taxonomic level of the used taxa compared the number of species-level taxa and the BMI (a species-level assessment index) with the number of family-level taxa and the BMFI (a family-level assessment index). The probability of capture for each taxon was compared using only the indicator taxa proposed in BMI or BMFI and the following probabilities: P 0 (p > 0), P 25 (p ! 0.25), P 50 (p ! 0.50), and P 75 (p ! 0.75). Using the normalized root mean square error (NRMSE; Zhao et al. 2020) for the expected value and the observed value, we determined the fit of the models; the lower the NRMSE value, the higher was the fit of the model.

Selection of reference sites
We selected 357 reference sites according to the criteria in Table 2 (Table 3). Type 3 had the greatest number of selected reference sites (120 sites), but in terms of the relative ratio of the number of selected reference sites to the total number of sites by stream type, it was the lowest. Type 4 had the highest percentage, although the number of reference sites selected was relatively low (38 sites). Since Type 1 had the fewest sites (45 out of 2,700 sampling sites), the number of selected reference sites was the least (8 sites). For each of the environmental variables for each type of reference site, there were differences in stream size and altitude (Table 3): Types 1-3 were small-scale streams, Types 4-6 were large-scale, Types 1, 2, and 4 were high-altitude, and Types 3, 5, and 6 were low-altitude. Latitude, longitude, velocity, and coarse particle percentage tended to decrease with decreasing altitude, and water depth, NH 3 -N, TP, and turbidity tended to increase with increasing stream size according to the reference condition criteria. Accordingly, physical variables tended to change according to altitude and water-quality variables for each category of stream size, all of which significantly differed based on one-way ANOVA. BOD 5 showed a tendency to increase with decreasing altitude and increasing stream size, but there was no statistically significant difference. Most of the sites excluded from reference site selection did not meet the criteria of NH 3 -N, TP, or turbidity, and most of the sites of Type 5 and 6 did not meet the criteria for coarse particle percentage or water quality; however, most sites met the criteria for BOD 5 , which had a consistent standard compared to the other environmental variables. Among these criteria, the effects of NH 3 -N and TP were greatest. Hence, the overall organic pollution of streams in the Republic of Korea was low, but the trophic state was high (Min and Kong 2020).

Classification of reference sites into benthic macroinvertebrate communities
Based on the adjusted individual abundance of benthic macroinvertebrates, we used TWINSPAN to divide the 357 reference sites (division level one) into two groups at division level two, which were then subdivided into six groups at division level three (Groups 1-4 and 5-6; Figure 2).
When we looked at the sites in each group in terms of the watersheds and stream types, the sites of Group 1 and 2 were mainly located in the Han and Nakdong Rivers. The number of sites of Type 1 was the same in Groups 1 and 2, whereas there were more sites of Type 2 in Group 2. Most sites in Group 3 were located on the Han River, and Types 4 and 5 were dominant; of these, there were slightly more sites of Type 4. The sites    in Group 4 were generally evenly located in the four watersheds except for the Youngsan River, and Type 5 sites were dominant. The sites in Group 5 were mainly located on the Nakdong and Seomjin Rivers, with Type 2 and 3 constituting the most sites. Most of the sites in Group 6 were located on the Nakdonggang River, and site Types 5 and 6 were the most dominant with sites of Type 6 being more common.
For each group, we selected the taxa which had the highest frequency of occurrence and individual abundance: 56 taxa from Group 1, 6 taxa from Group 2, 65 taxa from Group 3, 41 taxa from Group 4, 6 taxa from Group 5, and 58 taxa from Group 6 ( Table 4). We judged that these taxa had a major influence on the inclusion of sites in each group. The main taxa in Group 1 were Ameletidae, Heptageniidae, Ephemerellidae, Plecoptera, Limoniidae, Pediciidae, Simuliidae, Ceratopogonidae, Blephariceridae, Athericidae, Rhyacophilidae, Hydrobiosidae, Glossosomatidae, Philopotamidae, Limnephilidae, Uenoidae, and Odontoceridae, which inhabited upper streams at high altitudes. The main taxa in Group 3 were Baetidae, Isonychiidae, Heptageniidae, Leptophlebiidae, Plecoptera, Corydalidae, Tipulidae, Rhyacophilidae, Hydropsychidae, and Apataniidae, which preferred water with high velocity and inhabited upper and middle streams. The main taxa in Group 4 were Mollusca, Potamanthidae, Gomphidae, Elmidae, Psephenidae, and Hydropsychidae, which preferred habitats with coarse particles, although their preferred velocity was relatively slower than those of Groups 1 and 3. The main taxa in Group 6 were Mollusca, Annelida, Malacostraca, Baetidae, Coenagrionidae, Platycnemididae, Calopterygidae, Libellulidae, Hemiptera, and Hydrophilidae, which preferred water with slow velocity. Groups 2 and 5 had more sites than did other groups, but the number of taxa selected based on their frequency of occurrence and individual abundance was few. Overall, most of the taxa with high  frequency of occurrence and individual abundance in Group 2 were the same as those in Groups 1 and 3, and the taxa with high frequency of occurrence and individual abundance in Group 5 were the same as those in Groups 3, 4, and 6. Accordingly, the sites in Groups 2 and 5 were considered to be in streams that transitioned from Group 1 to Group 3 and from Group 4 to Group 6, respectively. There were statistically significant differences for all environmental variables in each group according to the one-way ANOVA (Figure 3). Based on the mean values of each environmental variable, the stream size in Group 1 (in order of stream order, catchment area (km 2 ), and stream width (m): 2.1, 130.6, 57.0) was the smallest, followed by Groups 5 (2.7, 103.6, 57.0) and 2 (2.8, 142.8, 58.3), Groups 3 (4.1, 798.1, 155.4) and 4 (4.3, 1097.4, 160.5), and Group 6 (4.3, 3194.5, 220.6), which was the largest. The altitude was highest in Group 1 (326.5 m) and decreased in the order of Group 2 (218.4 m), 3 (184.8 m), 5 (151.4 m), 4 (114.1 m), and 6 (84.0 m). Since latitude and longitude varied depending on the location of the sites, most of the sites with the highest geographic coordinates were in Group 3 (in order of latitude and longitude: 37.2733, 128.4237) located in the Han River, fewer were in Groups 1 (36.7145, 128.2784), 2 (36.3351, 128.2023), and 6 (36.1571, 128.0914), and the least were in Groups 4 (36.0782, 127.7993) and 5 (35.8993, 127.8667), where the sites were evenly distributed throughout the watersheds. There was little difference in water depth from Groups 1 to 5 (28.0, 26.1, 27.9, 28.7, and 27.2 cm, respectively), but Group 6 had a greater water depth (40.1 cm) than any of the other groups. Like altitude, velocity tended to decrease in the order of Groups 1 to 6 (54.5, 51.0, 52.9, 37.4, 34.1, and 18.4 cm/s, respectively). The percentage of coarse particles also showed a tendency similar to that of the altitude and velocity, as it tended to decrease in the order of Groups 1 to 6 (92.3, 91.8, 90.1, 85.5, 84.4, and 74.7%, respectively); however, when used as a reference site criterion, the percentage of coarse particles was high in all groups. Among the water-chemistry variables, which were used along with the percentage of coarse particles as the criteria for the selection of reference sites, BOD 5 increased in order of Groups 1 (0.9 mg/L), 3 (0.9 mg/L), 4 (1.0 mg/L), 2 (1.2 mg/L),  5 (1.3 mg/L), and 6 (1.4 mg/L), and it was not related to river size, altitude, latitude, longitude, or organic pollution level. NH 3 -N, TP, and turbidity were lower in Groups 1 (in order of NH 3 -N (mg/L), TP (mg/L), turbidity (NTU): 0.028, 0.014, 2.0), 2 (0.022, 0.013, 1.9), and 5 (0.020, 0.014, 2.5) and higher in Groups 3 (0.043, 0.019, 4.9), 4 (0.048, 0.023, 4.4), and 6 (0.041, 0.020, 4.9). The larger the stream size, the more relaxed were the applied criteria.

Selection of predictive variables and functions by discriminant analysis
Each TWINSPAN group of reference sites classified by biological characteristics was linked to the environmental properties of the group using MDA. The environmental factors used in the discriminant analysis were river order, basin area, river width, latitude, longitude, depth, and flow rate. We used the discriminant variables selected by MDA as the predictive variables and the derived discriminant functions as the predictive equations. The analysis showed that five of the seven environmental variables were highly discriminating for the distinguishment of each group, but stream order and width were not statistically significant and were excluded as predictive variables (Table 5). Among the estimated environmental variables, the predictor with the highest discriminant power was the catchment area, and there was a decrease in Wilks' lambda as latitude, velocity, water depth, altitude, and longitude were added. Since six TWINSPAN groups were used for discriminant analysis, a total of five discriminant functions were estimated. Of these, discriminant functions 1 and 2 had low Wilks' lambda and were statistically significant, whereas discriminant functions 4 and 5 had a higher Wilks' lambda and were not statistically significant. The Wilks' lambda for discriminant function 3 was high but was statistically significant. For this reason, we selected discriminant functions 1-3 as the prediction equations.
In order to understand the environmental properties of the TWINSPAN groups according to the discriminant analysis, we confirmed the distribution of each group by using centroids of canonical discrimination functions 1 and 2, in which Wilks' lambda was the lowest (Figure 4). The canonical discrimination functions 1 and 2 accounted for 58.0% and 31.1% of the variance, respectively, explaining most of the results. Furthermore, we confirmed the relative importance of the environmental variables by using the structure matrix rather than the standard canonical discrimination function coefficient, as it considered the multicollinearity between variables. Canonical discriminant function 1 was influenced by water depth, but groups were mainly classified according to the catchment area. Groups 1, 2, and 5 with small scale were located on the negative side of the catchment area, and Groups 3, 4, and 6 with large scale were located on the positive side. The canonical discrimination function 2 was also influenced by velocity, but it seemed to reflect geographic features by classifying groups according to latitude, altitude, and longitude. Thus, Groups 1, 2, and 5 at relatively high altitudes were located on the positive side, and Groups 3, 4, and 6, which were low, were located on the negative side. Table 5. Selection of the environmental variables that distinguished each TWINSPAN group of reference sites, the relative importance of each variable, and the estimated discriminant functions resulting from stepwise multidiscriminant analysis. Step Variables added at each step Wilks' lambda F p -value

Prediction of reference sites by using discriminant variables, discriminant functions, and O/E values
Based on the predictive variables and functions derived from discriminant analysis, we calculated the expected values for the reference sites and confirmed the fitness of the models by using NRMSE and comparing with the observed values. We derived the NRMSE of the number of taxa and assessment indices (BMI and BMFI) for (1) identification level, (2) grouping method, and (3) probability of capture level (including only the used indicator taxa proposed in BMI and BMFI), and we considered which model was the most reasonable (Table 6). First, when identification level and grouping method were the same, the NRMSE was lower in terms of probability of capture, followed by P 75 , P 50 , P 25 , indicator taxa, and P 0 ; in other words, the more likely a few taxa with high probabilities of capture were to be used, the higher was the fit of the model. BMI showed different results depending on the grouping method or identification level, and the NRMSE of BMFI decreased in the order of P 75 , P 50 , P 25 , indicator taxa, and P 0 , regardless of the grouping method; this was the same as the trend for the number of taxa. When the other conditions were the same and only the difference in the grouping method was considered, the number of taxa had a low NRMSE in the weighted average. When belonging to a specific group, the NRMSE was generally low in BMI and BMFI. Similarly, when only the identification level was considered, the number of taxa had a low NRMSE at the family level, and BMI and BMFI also had a lower NRMSE of BMFI. Taken together, both the number of taxa and the assessment indices had the highest fit for the model, which used a few taxa with a higher probability of capture rather than many taxa with a lower probability of capture, and used the family level rather than the species level. When we used the weighted average, the fitness was high for the number of taxa, and when belonging to a specific group, the fitness of the assessment indices was also high. However, if only a few taxa with a high probability of capture were used, the number of excluded taxa was larger than the number of taxa used to calculate the O/E value; in this case, the test sites could have been underestimated, even if the model's fit was improved. Considering this, it was better to use as many taxa as possible, although this may have decreased the suitability of the model. In this study, there were 169 families, and although the BMFI indicator taxa included 97 families, information from 72 families was unavailable overall. The ratio of the number of indicator taxa to the number of P 0 taxa at each site showed no large differences of 0.85 or higher, up to the 5 th percentile. We excluded 72 families (some Odonata, Hemiptera, Coleoptera, and Diptera), which mainly inhabited pools or riparian zones, from the selected taxa because they could not derive indicator values given their low frequency of occurrence and individual abundance. Their infrequent appearance was probably due to the riffles in the sampled habitat rather than due to individual abundance. Given this, we used the indicator taxa of BMFI as the list of expected taxa, as this group mostly included families with a frequent appearance in Korean streams and had a lower NRMSE than when the probability of capture was P 0 . The NRMSE for the observed values and the expected values based on the BMFI indicator taxa showed trends opposite to those of the number of taxa and the BMFI according to the grouping method. In other words, the NRMSE was lower for the weighted average of the number of taxa, and it was lower in the BMFI when belonging to the group with the highest probability. In this case, since the NRMSE for the BMFI was low ( 0.130) for both methods, we preferred reducing the NRMSE value from 0.400 to 0.368 to increase the fit of the model over reducing the NRMSE value from 0.122 to 0.119. Finally, we selected a model that is best suited for the Republic of Korea at this time by using (1) family level, (2) weighted average, and (3) the expected taxa list based on the BMFI indicator taxa. For the selected model, the distributions for the number of taxa of reference sites, overall sites, and BMFI O/E values showed that the number of taxa was more dispersed than the BMFI, but in both cases, only one mode occurred near 1 ( Figure 5). In addition, the overall sites had more sites with O/E values less than one than did the reference sites, but the number of sites evaluated from 0.75 to 1.25 was more than 60% for the number of taxa and 80% for BMFI, meaning that there was no large difference compared to the reference sites. It seems that the variance of BMFI was lower than that of the number of taxa because of the characteristics of the index. BMFI is calculated by dividing the rank of individual abundance of each taxon at the site into five levels (see Zelinka and Marvan 1961) and then reflecting the sensitivity value assigned to each taxon; values closer to 100 represent lower amounts of organic pollution. Although many sensitive taxa existed at the reference site, if one or two sensitive taxa had a high individual abundance despite being absent at the test site, there would have been a higher index value than in the absence of the taxa; hence, the O/E of the BMFI may be higher than that of the number of taxa.

Criteria for reference sites
The biological assessment of streams is important for understanding the effect of artificial disturbance on river ecosystems and the degree of disturbance. A reference stream can be a standard for the current condition of a water body, and it can provide criteria for assessment and for river restoration and management (Stoddard et al. 2006;S anchez-Montoya et al. 2009). In other words, reference streams play an important role in revealing the effects of human disturbances and can provide objective criteria for biological assessment (Nijboer et al. 2004;Pardo et al. 2012). Therefore, reference streams are helpful for quantitatively understanding the degree of human disturbance or the damage caused by stream use. In addition, reference streams can also contribute to the development of an index that can reflect biological integrity based on information about environmental and ecological characteristics. Since stream environments continue to change at temporal and spatial scales, the United States and Europe have applied a reference condition approach to compare the physical, chemical, and biological characteristics of reference and target streams USEPA 1999;Chaves et al. 2006). For this reason, research and policy applications of reference streams have been actively conducted since the 1970s.
The reference condition is a condition for selecting reference streams and a criterion for assessing the current condition of target streams. Conceptually, it is ideal to establish a reference condition by analyzing ecological properties from reference streams without artificial disturbance, but it is unrealistic to find intact natural streams that are absolutely undamaged. Reference conditions can be defined differently depending on the degree of disturbance in the target region, and this can lead to varied results when they are assessed as differences in biological integrity (USEPA 2006). Thus, it is necessary to select reference streams by considering the current biological conditions, which can be done on a national or regional scale. The United States and Europe have grouped streams with similar environmental conditions mainly by assessing geographical features or ecoregions, and then they have used a regional reference condition that defines conditions with no or slight disturbance depending on the ecoregion (Reynoldson et al. 1997; The EP and the Council of the EU 2000; Soranno et al. 2011;USEPA 2013a). In addition, Bailey et al. (1998) defined reference streams as those exposed to minimal disturbance or as streams in which the biotic community and natural environment are preserved. Stoddard et al. (2006) described reference streams as those with minimal disturbance even if they are affected by humans. Just as the definition of reference streams differs, the environmental variables used as reference conditions also differ in each study. For example, Herlihy et al. (2008) selected the least disturbed sites by using nine physical and chemical variables (e.g., TP, total nitrogen concentration [TN], turbidity, and fine substrate). USEPA (2013a) modified some of the variables applied in Herlihy et al. (2008) to establish reference conditions for nine ecoregions by using geographic information system (GIS)-based data (e.g., urbanization and agricultural effects). Klemm et al. (2002) selected reference streams by setting criteria in three categories (chemical, biological, and habitat). Johnson et al. (2013) applied water-quality variables and GIS data as reference condition criteria, focusing on the biological variables used in the WFD (2000) water-quality assessment.
In the Republic of Korea, Jun et al. (2012) selected reference streams by considering the water environment (water quality, stream bed substratum, and land use type) of Korean streams prior to developing multimetric indices based on benthic macroinvertebrates. NIER (2015a) selected reference streams by adding stream order to the criteria of Jun et al. (2012) and modifying the reference conditions for some variables in order to select and use reference streams. However, these studies dealt with selection of reference streams for specific research purposes, the environmental variables used were applied on a consistent basis, and the selected reference streams were mainly small-scale streams. Thus, we focused on the biological variable of benthic macroinvertebrates and established the reference condition by using five environmental variables. BOD 5 and turbidity can affect a habitat depending on the sensitivity of organisms to organic and inorganic pollution, respectively (Min and Kong 2020). TP represents the trophic state of a stream; when algae thrive in a stream bed substratum because of a high concentration of TP, the habitat of benthic macroinvertebrates is limited (Dudley et al. 1986;Tonkin et al. 2014). TN is often used with TP as a variable for nutrient concentrations, but in the Republic of Korea, the background concentration of nitrogen is relatively higher than that of TP because of the overall eutrophication of water bodies resulting from intensive land use (Min and Kong 2020). For this reason, we only used TP as a variable representing the trophic state, and we used NH 3 -N, which can be toxic to aquatic organisms (USEPA 2013b), rather than TN. The higher the proportion of fine particles, the more severe the sedimentation; the result can be considered a disturbance in the stream bed since the substratum is a direct and important habitat for benthic macroinvertebrates (Jun et al. 2016;Min and Kong 2020). Coarse particles may become fine because of sedimentation, which may limit the habitat available to organisms if it is covered with fine particles (Xuehua et al. 2009); thus, we used coarse particles as a variable for reference streams. In addition, stream size directly reflects the surrounding environment and determines the amount and form of materials flowing into the stream, which affects the condition of the biological habitat such that aquatic organisms must adapt to it and inhabit it in various ways (Shearer et al. 2015). For this reason, when setting the reference condition, it is necessary to consider different criteria for variables that may affect the stream ecosystem depending on size. Therefore, NH 3 -N, TP, and turbidity were applied as relatively relaxed criteria with increasing stream size, which prevented reference streams from being biased toward small-scale streams. BOD 5 is a consistent criterion with overall lower levels than those of other variables (Min and Kong 2020). For the stream bed substratum, some streams in the Nakdong River watershed naturally and predominantly have substrates smaller than pebbles; thus, the ratio of fine particles is higher than that in other watersheds (Jun et al. 2016). The streams in the Nakdong River, where the ratio of fine particles was high when substrate the size of sand or smaller was considered to be fine, also met the reference condition; thus, a consistent criterion was applied for coarse particles. Land-use type in the watershed environment may also affect the biotic community (Wang et al. 2006), but these variables were not considered because of their redundancy in terms of water quality (Bazinet et al. 2010;Kim et al. 2016). The reference sites selected in our study were given reference criteria depending on the stream type (as opposed to the methods of Jun et al. (2012) and NIER (2015a)), thereby mitigating the portion of the reference sites that were biased towards small streams. However, there was a limitation in that the reference sites were treated fragmentarily for the purpose of this study. Therefore, it is necessary to select objective and multifaceted reference streams.

Predictive variables for establishing predictive models
Predicting the expected fauna of a site by using the environmental properties of a new site requires (1) the probability of belonging to each classification group according to the environmental variables of the new site and (2) the frequency of occurrence of the taxa in each classification group (Wright et al. 2000). That is, predictive variables that compute the probability of belonging to a new site are as important as the method of classifying reference sites into biotic communities. The occurrence and abundance of benthic macroinvertebrates by taxa depend on a wide range of geographic and ecological processes, and regionally depend on the availability of appropriate habitats and food resources. However, accurate information on habitats and food availability can be difficult to obtain because of the wide range of spatial scales in streams (Clarke et al. 2003). Therefore, the goal of the statistical prediction model is to use appropriate environmental variables that can be measured in a standardized way for all stream types. This requires use of variables that can be obtained in an easy and consistent manner, specifically those with strong correlations that can be used to predict life at sites; however, these variables do not need to be causal determinants of benthic macroinvertebrate presence (Clarke et al. 2003). Furthermore, it is not appropriate to use already disturbed predictive variables because of the effects of the disturbance being assessed (Smith et al. 1999). The most ideal method is to use predictive variables that are easily accessible and invariant to disturbance and time. However, water depth, velocity, and stream bed substratum, which create microhabitats, have been known as the best predictive variables of benthic macroinvertebrates. Some chemical variables have been used as good predictive variables according to national or regional characteristics; thus, certain variables that change with time or disturbance can be used as predictive variables Wright et al. 2000;Hargett et al. 2007;Clarke et al. 2011). Given this, we used physical variables in the currently available national data as candidates for predictive variables and categorized each variable by their relation to stream size, geographical features, and microhabitats. The substratum type was excluded as a candidate predictive variable because it was used to select the reference sites.
The relative importance of the predictive variables for classifying groups of reference sites was in the order of catchment area, latitude, velocity, water depth, altitude, and longitude, according to discriminant analysis. Stream order and width were removed from the predictor variables. Although stream size is the best predictive variable for finding benthic macroinvertebrate communities in the Republic of Korea, the communities generally change by geographic features and microhabitats (Jun et al. 2016;Min and Kong 2020). Stream size includes changes in biotic communities in terms of the longitudinal continuity from upstream to downstream (Vannote et al. 1980). Stream order, catchment area, and stream width all represent stream size, but their characteristics are slightly different. In the Republic of Korea, most of the country's land is mountainous, so the streams generally have the shape of mountain streams even at non-mountainous altitudes (Bae et al. 2003;Min and Kong 2020). Even for streams of the same size, stream order may differ depending on the number of inflowing streams. In addition, in large streams like rivers, stream order does not increase further even if the stream size increases (in this study, the 7 th order is the maximum). Stream width can represent stream size according to the stream shape better than stream order can, but it is a measurement of a fragmented width of sampling sites, not a measured width of particular sections. Thus, stream width can be constant or can decrease from up to downstream rather than always increase. Not only does the catchment area accurately reflect the size of the stream, but it also increases as the size increases and is less variable with disturbance or time than is the stream order. Thus, the catchment area has a large influence on changes in biotic communities. The fact that geographical variables such as altitude, latitude, and longitude help predict benthic macroinvertebrate communities suggests the importance of water temperature (Jacobsen et al. 1997;Hargett et al. 2007). Groups 1 to 3 had higher altitudes, latitudes, and longitudes than did Groups 4 to 6 (reflecting high east and low west geographical features). Accordingly, the taxa in Groups 1 to 3 had a higher proportion of cold-water taxa. In particular, Group 1, which had the highest altitude, included most of the Plecoptera in this study. Furthermore, the fact that altitude, latitude, and longitude affect the changes in stream benthic macroinvertebrate communities was consistent with the findings of previous studies (Smith et al. 1999;Sandin 2003;Hargett et al. 2007;Jun et al. 2016;Min and Kong 2020). Water depth and velocity were microhabitat variables that directly predicted benthic macroinvertebrates; we mainly considered them in order to understand how biotic communities changed with the change in environmental variables (Sandin 2003;Li et al. 2012;Min and Kong 2020). At the same time, water depth and velocity tended to change like stream size and altitude changed, respectively, so we selected these variables as highly influential predictive variables. Velocity is one of the main variables affecting biotic communities and has been suggested in several studies Malmqvist and Maki 1994;Jun et al. 2016). If velocity is fast, more organic matter and dissolved oxygen are provided in a short time, and coarse substrates affect the diversity and abundance of benthic macroinvertebrates depending on the velocity (de Oliveira and Nessimian 2010; Sandin 2003). As water depth deepens, the velocity slows, and substrates tend to be finer. Therefore, water depth is a variable that also affects the substratum and is just as important as velocity for predicting biotic communities. In Clarke et al. (2011) and, water depth was also selected along with stream size as an important predictive variable.
Predictive models for the expected benthic macroinvertebrate community When models were derived by combining the identification level, probability of capture for taxa, and site assignment methods, and the NRMSE for the number of taxa and assessment index were compared, we concluded that a model which combines family level, indicator taxa proposed from BMFI, and weighted average was the most reasonable.
The overall NRMSE was lower at the family level than at the species level. Thus, family level was a better fit for the model, but it also could mean that the sensitivity was lower . In some studies, the differentiation provided by family level to classify reference sites or loss of sensitivity was not large when compared to that provided by the species level Marchant et al. 1995;Smith et al. 1999). Other studies differed from the species level in clustering analysis or could not detect the difference between reference and test sites Davy-Bowker et al. 2005). These differences arose from the regional and family characteristics used in the studies. There are many families with different ecological characteristics at the species or genus level (mainly Ephemeroptera, Plecoptera, and Trichoptera), while others (mainly Odonata, Hemiptera, Coleoptera, and Diptera) do not differ. In addition, even if various taxa are included within the family, the results of the assessment of the species and family levels may be similar in areas where only some taxa of the genus and species live. In the United Kingdom and Australia, family-level results are reasonable because the specialization of the taxa characteristics is not clear . On the other hand, there are several taxa that require different ecological properties per family, which occurs in some cases in the United States Hargett et al. 2007). Since there were many sites where several taxa occurred within the same family in the present study, we think the family level was less sensitive than the species level. However, using the species level, the NRMSE in some cases was !0.15 higher than the NRMSE at the family level. Therefore, even though the sensitivity of assessment was relatively low, using the family level with a high model fit was considered appropriate. Site assignment based on the probability of sites belonging to the clustering groups was conducted using the weighted averages used in RIVPACS and AUSRIVAS, where the sites were assigned to the group with the highest probability (attempted by . For the number of taxa, both family and species levels had lower NRMSE weighted averages than when the taxa were assigned to specific groups, and the NRMSE also had a lower weighted average of more than 0.10 at P 75 . On the other hand, in BMI or BMFI, NRMSE was generally lower when assigned to a specific group than with the weighted average, but the maximum difference was 0.06. This maximum difference was negligible, and the NRMSE value was also less than that of the number of taxa. Accordingly, we thought it would be better to lower the NRMSE for the number of taxa than for BMI or BMFI. The environmental variables of streams change continuously and gradually rather than discretely; accordingly, the biological community also changes gradually (Vannote et al. 1980). Based on this concept, the RIVPACS model also uses the weighted-average method for the probability of predicting sites (Clarke et al. 2003). Our results represented this well; representatively, Group 2 could be considered as a section in which Group 1 transitioned to Group 3 as the stream size increased and the altitude decreased in order of Groups 1 to 3. Thus, the organisms with a high frequency of occurrence and individual abundance in Group 2 were also those with the highest frequency of occurrence and individual abundance in Groups 1 and 3, and Group 2's indicator taxa were fewer than Groups 1 and 3. RIVPACS uses the ratio of the number of observed taxa to the expected taxa to assess the degree of disturbance. Thus, in theory, the use of all available information, including rare taxa at a high risk of extinction, yields highly sensitive results for biological assessments (Cao et al. 1998). For this reason, since the fit of the model decreases as the probability of capture decreases (Wright 1994), RIVPACS initially proposed the probability of taxa being captured as P 50 or P 75 (Moss et al. 1987), and recently as P 0 . On the other hand, some studies have shown that a greater number of taxa used does not always relate to more sensitive assessment results, and the model of P 50 was applied based on results showing that assessment at P 50 can lead to more suitable results than can assessment at P 0 (Smith et al. 1999;Hargett et al. 2007). Disturbances can negatively affect some taxa and benefit others, meaning that taxa that appear infrequently at reference sites may appear more often at disturbed sites. That is, taxa within the same taxonomic group may appear to complement the reduced taxa at the test sites. In such cases, an assessment method that considers only the number of taxa and ignores the taxa composition may reduce the sensitivity of the assessment, as long as the disturbance does not reduce the overall number of taxa. Consequently, the increase in the appearance of taxa at test sites that appear less often at reference sites would partially compensate for the effect of the loss of taxa on the lowering of the O/E ratio at P 0 , which would be larger at the family level than at the species level. However, at P 50 , although the taxa appearance is low at the reference sites, the taxa that are expected to increase at the disturbed sites can be excluded from the assessment. This can lead to a relatively more sensitive result for the disturbance.  suggested that these results were remarkable in streams in the oligotrophic state. Our study showed that the more frequent the appearance of taxa at the family level, the lower the NRMSE and the higher the fit; however, the sensitivity decreased, which was similar to results obtained in the United Kingdom (Clarke et al. 2003). In this regard, it seems most appropriate to accept the case of RIVPACS and apply the BMFI indicator taxa list, which contains more taxa information than does P 25 and is more suitable than P 0 . This study attempted to develop a predictive model in the Republic of Korea by using currently available data. However, the variance of the number of taxa at the reference sites was large. Such model errors may have been related to (1) inappropriate selection of reference sites, (2) environmental variables that were not invariable, or (3) incorrect measurement of environmental variables and misidentification of taxa (Clarke et al. 2003), or due to differences between habitats such as the presence of riffles, runs, and pools (Parsons and Norris 1996). Accordingly, when RIVPACS was developed in the United Kingdom, the number of reference sites gradually increased as the criteria for reference conditions were considered according to the situation, and a method was sought to replace variables that had a great influence on the prediction of taxa with constant variables (Clarke et al. 2011). In Australia, sampling was conducted by habitat to reduce errors in habitat differences at the sites (Smith et al. 1999;Wright et al. 2000). The Republic of Korea needs to collect additional data, taking into account all current attempts in the United Kingdom and Australia, and develop more sophisticated predictive models. We think that when data from which an appropriate model can be derived are obtained, developing a predictive model for each watershed and reviewing its applicability in consideration of its inherent characteristics ) will serve as an opportunity to establish a more accurate assessment system in the Republic of Korea.