Hydrogeochemical and multivariate statistical appraisal of pollution sources in the groundwater of the lower Bhavani River basin in Tamil Nadu

ABSTRACT Multivariate statistical techniques have emerged as one of the most effective tools for hydrochemical characterization and for identifying pollution sources in groundwater. This study uses hydrogeochemical data from 36 wells in the Lower Bhavani River basin in Tamil Nadu. Hierarchical cluster analysis (HCA) yielded three major clusters: cluster 1 has the highest concentration of ions (n = 14; avg TDS = 1259 mg/L), followed by cluster 2 (n = 13; avg TDS = 775 mg/L) and cluster 3 (n = 8; avg TDS = 357 mg/L). The hydrochemical facies agree with the cluster hydrochemistry, with Na-Cl type (cluster 1), Ca-Mg-Cl (cluster 2), and Ca-HCO3 (cluster 3), showing the influence of both anthropogenic and natural (rock–water interaction related) geochemical patterns. Aqueous speciation modeling suggests that halite (NaCl), gypsum, and anhydrite are undersaturated in all three clusters, indicating the possibility of further dissolution of Na, Ca, SO4, and Cl. As with HCA, principal component analysis (PCA) also delivered three major components, showing the impact of textile industries and agricultural fertilizers, sewage leakage, and the natural interaction of water with fluoride-rich minerals. Both natural and anthropogenic processes control the variations in the hydrochemical parameters, and these variations correlate with land use patterns.


Introduction
Groundwater is the primary and most trusted source of drinking water in many arid and semi-arid regions of the world. The chemical composition of groundwater is an indicator of its suitability as a source of water for human and animal consumption, irrigation, and for industrial and other purposes. The definition of water quality is therefore not objective, but socially defined depending on the desired use of the water (Babiker, Mohamed, & Hiyama, 2007).
Groundwater pollution is common in both the developing and developed world. Contamination of groundwater can result in poor drinking water quality, loss of water supply, high clean-up costs, high costs for alternative water supplies, and/or potential health problems (Balakrishnan, Saleem, & Mallikarjun, 2011). In general, an increase in the ionic constituents of groundwater beyond the permissible level is termed pollution. Both natural and anthropogenic processes regulate the chemistry and quality of groundwater (Merchán, Auqué, Acero, Gimeno, & Causapé, 2015; Sajil Kumar, 2014). The most important subsurface processes that regulate groundwater chemistry are precipitation, recharge and discharge, ion exchange, redox processes, precipitation-dissolution processes, residence time, etc. (Reghunath, Murthy, & Raghavan, 2002; Sajil Kumar & James, 2016; Subramani, Rajmohan, & Elango, 2010). Leaching of fertilizers from agricultural fields, industrial effluents, and accidental spillages can also affect groundwater quality (Abu El Ella, Elnaze, & Salman, 2017).
India has been facing severe water scarcity in several parts of the country, especially in arid and semi-arid regions. Overdependence on groundwater to meet the ever-increasing demands of the domestic, agricultural, and industrial sectors has resulted in the overexploitation of groundwater resources and has deteriorated groundwater quality in many states of India (Machiwal, Jha, & Mal, 2011). Many researchers have conducted studies in different parts of the country to evaluate groundwater quality and its suitability for various purposes, spatial distribution, chemical composition, saline intrusion, groundwater-surface water interactions, and groundwater vulnerability (Elango, Kannan, & Senthil Kumar, 2003; Subramani, 2005; Srinivasamoorthy et al., 2011; Vasanthavigar et al., 2010; Dar et al., 2011; Kumari, Singh, Verma, & Yaduvanshi, 2014; Gulgundi & Shetty, 2018). The factors affecting groundwater quality need to be identified using suitable evaluation methods. Multivariate statistical methods have been used repeatedly in the literature to characterize hydrogeochemistry (Guler, Thyne, McCray, & Turner, 2002; Kazakis, Mattas, Pavlou, Patrikaki, & Voudouris, 2017; Liu, Lin, & Kuo, 2003; Singh et al., 2017; Singh, Shashtri, & Mukherjee, 2010; Yidana & Yidana, 2010). Geographical Information Systems (GIS) are widely used in all fields of science. They are particularly useful in giving an overall idea about a system without much physical fieldwork (El-Fadel, Tomaszkiewicz, Adra, Sadek, & Najm, 2014; Pourtaghi & Pourghasemi, 2012; Venkatesan & Senthil, 2018; Venkatramanan et al., 2015). GIS can also be used as a tool for representing spatial features and properties with better visuals.
In this study, we used multivariate statistical analysis in combination with geostatistical and geochemical modeling to evaluate the sources of the geochemical parameters in the groundwater of the Lower Bhavani River basin in Tamil Nadu, India.

Study area
The river Bhavani originates in the Western Ghats, flows through Silent Valley in Kerala, drains the western parts of Tamil Nadu over a course of 217 km, and finally joins the Cauvery (Figure 1). The whole basin covers 0.62 million hectares (www.rainwaterharvesting.org) and lies between 11°15ʹ N and 11°45ʹ N latitude and 77°00ʹ E and 77°40ʹ E longitude. This part of the country is semi-arid, with an annual average rainfall of 618 mm. Monsoons are the biggest contributors to groundwater recharge, the largest being the NE monsoon in the months of October and November (Sajil Kumar and James, 2016). Temperature rises up to 40°C in the summer season and falls to 13°C in the winter season. The hottest month is May. Potential yearly evapotranspiration (ET) is 1,600 mm in the lower Bhavani Basin. The drainage pattern of the basin is mostly dendritic, which indicates the uniform resistance of the rocks, a typical characteristic of hard rock terrain. Major rock types in the Bhavani basin are fissile hornblende-biotite gneiss, charnockite, granites, and hornblende-biotite gneiss. In addition, there are pyroxene granulite, ferruginous quartzite, tremolite schist, amphibolite, gabbro/anorthosite, pink migmatite, dolerite dykes, and granite intrusions (GSI, 1995; Sajil Kumar, 2017). Groundwater occurrence and dynamics are largely controlled by the geology of the region. Important soil formations are red calcareous soil, red non-calcareous soil, black soil, alluvial and colluvial soil, brown soil, and forest soil. Red calcareous soils, mostly sandy to loamy, are dominant. The geomorphology comprises structural hills, inselbergs, ridges, valley fill, and pediments. Shallow pediments are the major geomorphological unit (CGWB, 2008). Most of the area is used for agriculture, mainly paddy, banana, groundnut, and sugarcane crops.

Groundwater sampling and analysis
Groundwater samples from 36 wells in the Lower Bhavani River basin were collected in 2011 as part of this study. The wells were selected based on availability, spatial distribution, accessibility for further studies, representation of the hydrogeology, etc. The latitude, longitude, and elevation of all the selected wells were recorded using a handheld GPS (Garmin). Before sampling, the wells were pumped until the in-situ parameters stabilized, in order to obtain samples representative of the subsurface hydrology. Samples were collected in pre-cleaned polythene bottles of 1 L capacity and kept at 4°C until analyzed for chemical parameters. Chemical analysis was performed as per the standard methods suggested by APHA (1998). Portable digital meters were used for the in-situ measurement of pH and EC. Ions such as Ca, Mg, Na, K, CO3, HCO3, Cl, and SO4 were analyzed. Carbonate and bicarbonate were determined by acid titration, and Cl by argentometric titration with AgNO3. Sulfate was analyzed using a UV-visible spectrophotometer. Nitrate and fluoride were determined using ion chromatography. Sodium and potassium were analyzed using a flame photometer. Calcium and magnesium were determined by titration with EDTA. The ion balance error was calculated from the anion and cation values and was within the standard limit of ±5%.
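The ion balance check mentioned above can be illustrated with a minimal sketch. It assumes the common charge-balance form, 100 × (Σcations − Σanions) / (Σcations + Σanions), with concentrations converted from mg/L to meq/L; the study does not state the exact formula it used. The equivalent weights are rounded standard values, and the sample concentrations are hypothetical, not taken from Table 1.

```python
# Equivalent weights (g/eq) = molar mass / ionic charge; rounded standard values.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "HCO3": 61.02, "CO3": 30.00, "Cl": 35.45, "SO4": 48.03,
    "NO3": 62.00, "F": 19.00,
}
CATIONS = ("Ca", "Mg", "Na", "K")


def to_meq(sample_mg_l):
    """Convert a {ion: mg/L} mapping to {ion: meq/L}."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def ion_balance_error(sample_mg_l):
    """Charge-balance error in percent (assumed formula, see lead-in)."""
    meq = to_meq(sample_mg_l)
    cations = sum(v for ion, v in meq.items() if ion in CATIONS)
    anions = sum(v for ion, v in meq.items() if ion not in CATIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def within_limit(sample_mg_l, limit=5.0):
    """True if the absolute error is within the +/- 5% limit cited in the text."""
    return abs(ion_balance_error(sample_mg_l)) <= limit


# Hypothetical sample (mg/L), for illustration only.
sample = {"Ca": 52, "Mg": 58, "Na": 140, "K": 42,
          "HCO3": 263, "Cl": 190, "SO4": 150, "NO3": 20}
error_percent = ion_balance_error(sample)
```

Samples that fail `within_limit` would normally be re-analyzed or excluded before any statistical treatment.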

Hierarchical cluster analysis (HCA)
Cluster analysis is a statistical technique that groups samples based on the inherent structure and the underlying similarities within the data set. This method does not need any prior assumptions, and the clustering is done only on the basis of nearness or similarity (Vega et al., 1998). In this method, a squared Euclidean distance is used to identify the multivariate resemblances in the hydrochemical data (Kamble & Vijay, 2011; Ward, 1963). The most commonly used linkage is Ward's method, which uses an analysis of variance (ANOVA) approach to evaluate the distances between clusters (Singh, Malik, Mohan, & Sinha, 2004). Several studies have used this method as an efficient data analysis technique when a large data set with different characteristics has to be analyzed.
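As an illustration of the clustering step, the following is a minimal sketch using SciPy rather than the software used in the study. It assumes z-score standardization of the variables (not stated in the text) and uses SciPy's Ward linkage, which operates on Euclidean distances and minimizes the within-cluster sum of squares. The data are synthetic: three hypothetical groups of 12 samples with 4 variables each.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_clusters(X, n_clusters):
    """Standardize the variables, build a Ward linkage, cut into n_clusters."""
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-scores (assumption)
    link = linkage(z, method="ward", metric="euclidean")
    labels = fcluster(link, t=n_clusters, criterion="maxclust")
    return link, labels


# Synthetic data: three well-separated hypothetical groups, not study data.
rng = np.random.default_rng(0)
centers = np.array([[1200.0, 300.0, 250.0, 60.0],
                    [800.0, 150.0, 120.0, 40.0],
                    [350.0, 40.0, 30.0, 20.0]])
X = np.vstack([c + rng.normal(0.0, 0.05 * c, size=(12, 4)) for c in centers])

link, labels = ward_clusters(X, n_clusters=3)
```

The `link` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a dendrogram of the kind used to choose the number of clusters.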

Factor analysis
Factor analysis is a multivariate statistical technique in which a large data set is reduced to a small number of factors without losing much of its information. In this method, a correlation matrix is computed for the entire data set to identify the interconnections between parameters. The derived factors express the characteristics of the complete data set. The eigenvalues and factor loadings for the correlation matrix were determined and a scree plot was drawn. The factors were extracted based on the variances and covariances of the variables. Factors with eigenvalues greater than one were considered significant. Finally, by the process of rotation, the loading of each variable on one of the extracted factors is maximized and its loadings on all the other factors are minimized. This study considered pH, EC, TDS, TH, Na, K, Ca, Mg, Cl, HCO3, and SO4 as water quality parameters. The software package SPSS 16 was used for the statistical analysis.
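The extraction and rotation steps described above can be sketched in NumPy. This is an illustration under stated assumptions, not a reproduction of the SPSS procedure: it assumes principal component extraction from the correlation matrix, retention of components with eigenvalues greater than one, and varimax rotation (the text does not name the rotation used). The data are synthetic, generated from two hypothetical latent factors.

```python
import numpy as np


def pca_loadings(X):
    """Eigen-decompose the correlation matrix; keep eigenvalues > 1."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # eigenvalue-greater-than-one rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals.sum()    # fraction of total variance
    return eigvals, loadings, explained


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (assumed rotation)."""
    p, k = L.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr**3 - Lr * (Lr**2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return L @ R


# Synthetic data: 200 samples, 6 variables driven by 2 hypothetical factors.
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 2))
W = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)
X = F @ W + 0.3 * rng.normal(size=(200, 6))

eigvals, loadings, explained = pca_loadings(X)
rotated = varimax(loadings)
```

Because the rotation is orthogonal, it redistributes the loadings among the retained factors without changing each variable's communality.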

Geochemical and geostatistical modeling
Aqueous speciation modeling was carried out on the different groundwater groups (from cluster analysis) using the PHREEQC code in Aquachem 4.0 (Parkhurst & Appelo, 1999). The saturation indices of minerals were calculated using the equation SI = log (IAP/Ksp), where SI is the saturation index, IAP is the ion activity product, and Ksp is the solubility product. The spatial variation maps were created using the geostatistical mapping tool in ArcGIS 10.1, with the inverse distance weighting method.
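Inverse distance weighting can be sketched in a few lines of NumPy. This is an illustration only, not the ArcGIS implementation: it assumes a power of 2 and uses all observation points for every estimate, whereas the power and search neighborhood used for the study maps are not reported. The coordinates and values are hypothetical.

```python
import numpy as np


def idw(xy_obs, values, xy_query, power=2.0):
    """Inverse distance weighted estimate at each query point."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances, shape (n_query, n_obs).
    d = np.linalg.norm(xy_query[:, None, :] - xy_obs[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for i, row in enumerate(d):
        hit = row == 0.0
        if hit.any():
            out[i] = values[hit][0]      # query coincides with a well
        else:
            w = 1.0 / row**power         # closer wells get larger weights
            out[i] = np.sum(w * values) / np.sum(w)
    return out


# Hypothetical wells (x, y) with a measured value, e.g. TDS in mg/L.
wells = [[0.0, 0.0], [2.0, 0.0]]
tds = [10.0, 20.0]
estimates = idw(wells, tds, [[0.0, 0.0], [1.0, 0.0]])
```

A query point that coincides with a well returns the measured value, and a point equidistant from two wells returns their mean.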

Chemical characteristics of groundwater
Groundwater samples were analyzed to study the water quality variables, and the results are given in Table 1. Groundwater was generally alkaline in nature, with pH ranging from 8 to 9. Electrical conductivity (EC) showed a wide range of 260-3100 µS/cm. TDS was calculated from EC and was in the range of 125-1786 mg/L. Twelve groundwater samples exceeded the permissible drinking water limit of 1000 mg/L (World Health Organization [WHO], 2011). Calcium and magnesium concentrations were in the range of 16-168 mg/L (avg = 52 mg/L) and 9-117 mg/L (avg = 58 mg/L), respectively. Eight samples exceeded the permissible limit of 75 mg/L (WHO, 2011). Na concentration ranged from 10 to 460 mg/L, with an average of 140 mg/L. Four groundwater samples exceeded the permissible limit of 200 mg/L (WHO, 2011). The concentration of K in the study area was 4-375 mg/L, with an average of 42 mg/L. Cations in the study area were dominant in the order Na>Mg>Ca>K. Among the major anions, chloride dominated the others and was in the range of 14-752 mg/L, with an average of 190 mg/L. A total of 12 samples exceeded the permissible limit of 250 mg/L (WHO, 2011). Bicarbonate concentration ranged between 52 and 526 mg/L (avg = 263 mg/L). NO3 and F ranged from 2 to 60 mg/L and from 0.18 to 1.56 mg/L, respectively; three groundwater samples exceeded the limits for NO3 (50 mg/L) and F (1.5 mg/L). The order of dominance of the anions was HCO3>Cl>SO4>NO3>F.

Hierarchical cluster analysis (HCA) of hydrochemical data
Groundwater quality data from the Lower Bhavani basin were initially subjected to hierarchical cluster analysis to better understand the geochemical characteristics. HCA was performed on 12 hydrochemical variables using Ward's method, with Euclidean distance as the measure of dissimilarity (Helstrup, Jørgensen, & Banoeng-Yakubo, 2007; Teng et al., 2018). The results, presented as a dendrogram (Figure 2), show that the groundwater can be classified into three distinct clusters, namely Clusters 1, 2, and 3. A statistical summary of the physicochemical parameters of the clusters is presented in Table 2.
Cluster 1 includes samples 4, 6, 7, 10, 12, 13, 15, 18, 24, 27, 32, 30, 33, and 36 (n = 14). This group has the highest dissolved solids (avg = 1259 mg/L), exceeding the permissible limit for drinking water of 1000 mg/L (WHO, 2011). Like TDS, other major ions such as Na, K, Ca, Mg, Cl, SO4, HCO3, and NO3 showed their highest concentrations in Cluster 1 compared to the other clusters. In the Piper trilinear diagram (see Figure 3), most of the samples from this cluster plot in the Na-Cl water type. In Cluster 2, samples 9, 11, 14, 16, 17, 20, 21, 22, 23, 29, 31, 34, and 35 (n = 13) were grouped, and the average TDS value was 775 mg/L. Most of the parameters fall within the drinking water guideline values. The representative water type of this group is Ca-Mg-Cl. Cluster 3 samples were mostly unaffected by any kind of contamination (avg TDS = 357 mg/L). The samples grouped in this cluster are 1, 2, 3, 5, 8, 19, 26, and 28 (n = 8). These samples were represented by the hydrochemical facies Ca-HCO3, which also indicates the freshness of these water samples.
The spatial distribution of the groundwater clusters in the study area was plotted and compared with the land use map (Figure 4). Modification of the land for different purposes often changes the natural conditions of soil, water, and environment. The chemical quality of the water improves from Cluster 1 to Cluster 3. It must be noted that the majority of the Cluster 1 samples plot in agricultural areas and in industrial areas with more human settlements. This explains why Na, Cl, NO3, and SO4 in the groundwater reach their highest concentrations in this area (see Table 1). It suggests that the groundwater chemistry of this group may be controlled by irrigation return flow and by anthropogenic activities such as industrial pollution and domestic and urban sewage. Cluster 2 samples represent the transition stage from recharge to discharge areas. The average TDS value in this cluster is 775 mg/L, which is below the permissible limit of 1000 mg/L (WHO, 2011). Additionally, the Ca-Mg-Cl type of water supports this argument. Cluster 3 samples are scattered all over the study region and mostly represent water that is unaffected by anthropogenic activities.

Geochemical modeling of groundwater clusters
Saturation indices (SI) were calculated for the three clusters and are presented in Table 2. When the SI is below zero, the water is undersaturated with respect to the mineral under consideration. An SI of zero means that the water is in equilibrium with the mineral, whereas an SI greater than zero indicates a solution that is supersaturated with respect to that mineral.
Results show that in Cluster 1, groundwater is undersaturated with respect to the calcium sulfate minerals, i.e., anhydrite (SI = −1.89) and gypsum (SI = −1.65). This means that groundwater can dissolve more solutes from these minerals and thus increase the concentrations of their constituent ions. Additionally, high SO4 input from textile waste can increase its concentration in groundwater, and the available Ca in groundwater may not be enough to reach equilibrium. The resultant deficiency is indicated by negative SI values. However, with respect to dolomite (SI = 1.69), aragonite (SI = 0.54), and calcite (SI = 0.69), the groundwater is supersaturated, so any further addition of their constituent ions may result in the precipitation of these minerals.
As in Cluster 1, samples in Clusters 2 and 3 show a similar trend, but anhydrite and gypsum become more undersaturated. Dolomite, aragonite, and calcite show lower SI values but remain saturated, except for aragonite in Cluster 3, which becomes undersaturated (see Table 3). Halite was also undersaturated in all three clusters, becoming progressively more undersaturated from Cluster 1 to 3, with SI values ranging from −5.8 to −7.38. This may be due to the additional input of Na from mineral weathering, other than the NaCl sourced from textile wastes. Another possible explanation is the reduction of the Na content in groundwater relative to Cl due to ion exchange reactions. The conservative nature of Cl in groundwater also supports this argument.
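The interpretation of the saturation indices can be made concrete with a short sketch. It applies the relation SI = log (IAP/Ksp) and the sign convention given above to the Cluster 1 values quoted in the text. The function names and the optional tolerance band around zero are illustrative assumptions; the SI values themselves come from the speciation model, not from this code.

```python
import math


def saturation_index(iap, ksp):
    """SI = log10(IAP / Ksp)."""
    return math.log10(iap / ksp)


def saturation_state(si, tol=0.0):
    """Classify an SI value; tol is an optional equilibrium band (assumption)."""
    if si < -tol:
        return "undersaturated"
    if si > tol:
        return "supersaturated"
    return "equilibrium"


# Cluster 1 saturation indices as quoted in the text.
cluster1_si = {
    "anhydrite": -1.89,
    "gypsum": -1.65,
    "dolomite": 1.69,
    "aragonite": 0.54,
    "calcite": 0.69,
}
states = {mineral: saturation_state(si) for mineral, si in cluster1_si.items()}
```

An IAP one order of magnitude below Ksp gives SI = −1, which is the same order of undersaturation as the sulfate minerals above.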

Source apportionment of the ions controlling the hydrogeochemistry
Geochemical mechanisms occurring in the subsurface are invisible and complex. Thus, understanding these processes is possible only with certain techniques, assumptions, and inferences from already proven results. Factor analysis is one of the most important statistical data analysis techniques, allowing us to differentiate the various processes that control the solute concentrations in groundwater. In the first step, correlation coefficients were calculated for the 11 parameters and are presented in Table 4. Electrical conductivity showed a strong positive correlation with Mg (r² = 0.77), Na (r² = 0.84), Cl (r² = 0.91), SO4 (r² = 0.63), HCO3 (r² = 0.69), and NO3 (r² = 0.71). This shows that the groundwater chemistry of the study area is primarily controlled by the processes related to these ions. Mg also showed a positive correlation with Ca (r² = 0.54), indicating a common origin of these ions in many samples. Cl has a positive correlation with Na (r² = 0.74), showing the effect of salinity (Na-Cl) originating from untreated effluents from the textile industry. SO4 has a positive correlation with K (r² = 0.51) and Cl (r² = 0.56), indicating anthropogenic influence. HCO3 has a positive correlation with Mg (r² = 0.58) and Na (r² = 0.60), indicating the effect of silicate weathering on groundwater chemistry. Nitrate, which is commonly considered an indicator of anthropogenic pollution, has a positive correlation with Mg, Na, and Cl (Ahamad, Madhav, Singh, Pandey, & Khan, 2018; Zahn & Grimm, 1993). Fluoride does not show much correlation with the other parameters. The results of the correlation analysis suggest both natural and anthropogenic influences on the hydrochemistry. The significance of these processes for groundwater chemistry is differentiated by the factor analysis in the following paragraphs.
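A correlation screening of this kind can be sketched as follows. The sketch computes the Pearson correlation coefficient r between every pair of parameters and lists the pairs at or above a threshold; the threshold of 0.5 and the small data set are hypothetical. It reports r itself, so values would need to be squared before comparison with coefficients quoted as r².

```python
import numpy as np


def strong_pairs(X, names, threshold=0.5):
    """Return the correlation matrix and the pairs with |r| >= threshold."""
    R = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(R[i, j]), 2)))
    return R, pairs


# Hypothetical measurements: EC (uS/cm), Cl (mg/L), F (mg/L).
names = ["EC", "Cl", "F"]
X = np.array([[500.0, 60.0, 0.9],
              [1000.0, 130.0, 0.4],
              [1500.0, 190.0, 1.1],
              [2000.0, 260.0, 0.3],
              [2500.0, 320.0, 0.8]])

R, pairs = strong_pairs(X, names)
```

In this toy data set only the EC-Cl pair passes the threshold, mirroring the pattern in which fluoride is largely uncorrelated with the other parameters.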
Principal component analysis was carried out on the entire data set and factor loadings were generated (Table 5). Based on the eigenvalues (>1), three factors were extracted, which together explain 80% of the total variance in the data set.
Factor 1 is responsible for more than 46% of the variance, with very high factor loadings for EC, TDS, Ca, Mg, Na, Cl, HCO3, and NO3. The Bhavani basin is well known for its textile industries. The processing of textiles requires enormous amounts of salts and other coloring agents. After the treatment, effluents rich in common salt (NaCl), MgCl2·6H2O, and FeSO4·7H2O are discharged. Many companies in the study area were found to have no proper treatment plants; thus, these contaminants can reach the surface water and the groundwater. Very high factor loadings for ions such as Na, Mg, and Cl, and also for TDS, indicate that Factor 1 represents the anthropogenic contaminants. In addition to these ions, NO3 was also found in Factor 1. A natural origin (lithogenic and atmospheric deposition) is not considered a major source of nitrate in groundwater. Since the majority of the study area is used for agriculture, the heavy use of fertilizers is likely to be the primary source of nitrate in the groundwater. Moreover, very high values of Cl in Cluster 1 suggest other pollution sources such as industrial effluents and fertilizers from agricultural fields.
Ionic cross plots were drawn to confirm the results. Figure 5(a,b) shows the relation of EC with Na and Cl. These ions vary positively with increasing conductivity values. Na and Cl can be assumed to share a common origin if the concentration of one varies proportionally with the other. Figure 6 shows that in most of the samples Na and Cl vary proportionally, suggesting an origin from highly saline textile wastewater. Additionally, the anthropogenic influences are further supported by the plot between Cl and NO3 (Figure 6). Neither of these ions occurs in high concentrations in groundwater naturally, so their higher concentrations indicate manmade origins. In this study, Cl concentration varies positively with NO3. These arguments support the anthropogenic origins of these ions explained in the earlier sections (Figure 7).
Factor 2 is responsible for 21.23% of the total variance in the data set. Very high factor loadings were observed for pH, K, and SO4, and weak but positive loadings for TDS. High concentrations of potassium and sulfate can result from the weathering of minerals such as potassium feldspars and calcium sulfate minerals such as gypsum and anhydrite (Subramani et al., 2010). However, the contribution of natural sources to these ions cannot exceed certain limits. Very high concentrations of these ions point to anthropogenic activities like refuse dumping. A positive correlation of potassium with sulfate and a weak correlation with HCO3 agree with the observation of anthropogenic influences such as sewage leakage into the groundwater. In addition, if fertilizers were the major reason for the high potassium content, a much stronger correlation with nitrate than r² = 0.35 would be expected.
Factor 3 has a major factor loading for F⁻ and is responsible for 12.45% of the total variance in the data. Fluoride is geogenic in origin, derived from the weathering of carbonate rocks and minerals such as aragonite and of fluoride-rich minerals such as fluorite, apatite, amphiboles, and micas. The concentration of F⁻ in groundwater increases with alkaline pH and with increasing concentrations of Na and HCO3, and decreases with increasing Ca2+ concentration. This negative relation of Ca2+ with F⁻ is due to the precipitation of calcium as CaCO3, while the mobilization of fluoride ions from fluorite minerals takes place under alkaline conditions. Most of the higher F3 scores were in the NW regions, where almost no anthropogenic activities are practiced. Thus, F3 is a clear representation of the natural groundwater. These ionic relationships were plotted in bivariate plots to demonstrate the results. The relations of fluoride with pH (Figure 8) and with HCO3 (Figure 9) show that these parameters have a positive influence on the fluoride concentration in the groundwater. The majority of the samples follow the positive trend for these ions.
The factor scores of the individual water samples were calculated and plotted as spatial variation maps for all three factors (see Figure 10). Spatial mapping is a very useful tool for understanding and visualizing the data, and it also makes it possible to compare the results with other factors such as land use and geology. Comparison of the F1 map with the land use map shows that the higher factor scores occur in the agricultural and industrial areas. Higher values are observed in the NW, eastern, and central parts of the study area. This also supports the results of the cluster analysis. Similarly, the F2 score map shows higher scores in the northern and SW parts and along the edges of the SW boundary of the study area. These regions are mainly forest lands and human settlements.
F3 scores are mostly higher in the NW regions and are found in patches all over the study area. The important variable in this factor is F⁻, which is a completely natural component in the groundwater. Dissolution of fluoride minerals under favorable conditions increases the mobility of this ion. Overall, the cross-comparison of the different maps provides refined results.

Conclusions
Hydrogeochemical investigations in the Lower Bhavani River basin were conducted and the results were analyzed using geochemical modeling and statistical techniques. The whole data set was grouped into three different clusters using hierarchical cluster analysis (HCA), and these groups were further analyzed geochemically. The orders of dominance of the major cations and anions were Na>Mg>Ca>K and HCO3>Cl>SO4>NO3>F, respectively. A Piper diagram revealed three different hydrochemical facies, ranging from Ca-HCO3 (fresh) to Ca-Mg-Cl (mixed) and finally to Na-Cl (saline) types. Geochemical modeling of the clusters showed that halite, anhydrite, and gypsum are undersaturated in the groundwater. This indicates a possible increase in the concentration of ions such as Na, Ca, Cl, and SO4. To identify the sources of the contaminants, principal component analysis (PCA) was performed on the entire data set. Three major components were derived, which explained more than 80% of the variation in the entire data set. Factor 1 was dominant, with high positive factor loadings for TDS, Mg, Na, Cl, HCO3, and NO3. This suggests the influence of highly saline textile effluents (Na, Cl) and natural geogenic sources (Mg, HCO3), while the presence of NO3 in this group indicates agricultural sources. Factor 1 thus shows the mixed influence of natural and anthropogenic sources. Factor 2 was dominated by SO4 and K, which were affected by the leakage of sewage. In Factor 3, higher factor loadings were found for Ca and F. Both these ions represent geogenic contamination, and fluoride dynamics in groundwater are largely controlled by the co-occurrence of calcium. The comparative analysis of the spatial variation maps of each factor with the land use map gives supporting evidence for the origin of contamination from the sources suggested by the principal components. Overall, groundwater quality has been influenced by both natural and anthropogenic sources.
For future groundwater management, the ions that are elevated in Cluster 1 need more attention.