Estimation of cyanobacteria pigments in the main rivers of South Korea using spatial attention convolutional neural network with hyperspectral imagery

ABSTRACT Although remote sensing techniques have been used to monitor toxic cyanobacteria with hyperspectral data in inland water, it is difficult to optimize conventional bio-optical algorithms for individual water bodies because of the complex optical properties of various water components. Therefore, this study adopted a spatial attention convolutional neural network (spatial attention CNN) to estimate the chlorophyll-a (Chl-a) and phycocyanin (PC) concentrations in the Geum, Nakdong, and Yeongsan rivers in South Korea in order to evaluate cyanobacteria using remote sensing reflectance data. The CNN model utilized a spatial attention module to analyze the importance of the bands in the reflectance data. The spatial attention CNN model was then compared with different bio-optical algorithms for each study area. Model performance was evaluated by the correlation coefficient (R) and root mean squared error (RMSE) between the observed and estimated concentrations of the algal pigments. The spatial attention CNN model, which was generalized to estimate the pigment concentrations in all target rivers, had R values above 0.87 and 0.88 for Chl-a and PC, respectively, whereas the optimized band ratio algorithms for Chl-a and PC had R values above 0.83 and 0.70, respectively. Hence, the spatial attention CNN model showed better performance than the conventional bio-optical algorithms. The spatial attention module provided attention weights for visualizing important features in the reflectance data. Specifically, the 600 nm, 650 nm, and near-infrared regions had high attention weights for estimating the concentrations of Chl-a and PC. Based on these findings, this study demonstrated that the spatial attention CNN model has high potential for application to various water bodies.


Introduction
Harmful algal blooms (HABs) caused by toxic cyanobacteria are a critical global issue as they impact aquatic organisms as well as human health (Ali et al. 2016). In addition, they result in water quality problems related to water supply for domestic and agricultural uses (Carmichael and Boyer 2016). Extreme outbreaks of HABs have been caused by global warming via changes in water temperature and precipitation (Jiang et al. 2021; Gobler 2020; Manning and Nobles 2017). The increase in water temperature is a major factor affecting the growth rate and distribution of algal blooms in inland waters (Ralston et al. 2014; Wang et al. 2021). Furthermore, frequent precipitation supplies abundant nutrient flow to the water bodies, thereby contributing to intense outbreaks of toxic cyanobacterial blooms (Gerten and Adrian 2002; Moore et al. 2009). To prevent the degradation of freshwater resources by global warming, the rapid identification and quantification of the spatiotemporal distributions of HABs is important (Jupp, Kirk, and Harris 1994; Tomlinson et al. 2016). However, traditional in-situ water quality monitoring and analysis have certain limitations, such as insufficient labor and analysis time, when considering various water bodies (Jin et al. 2017). In addition, it is difficult to obtain the spatial distributions of water quality because in-situ sampling is only implemented at specific sampling points (Thamaga and Dube 2019).
Remote sensing using aircraft and satellites has been utilized to acquire the spatial and temporal features of HABs with near real-time monitoring data in expansive water body areas (Bosse et al. 2019; Paul et al. 2015). Numerous spaceborne and airborne remote sensing methods obtain reflectance, which can identify the spatiotemporal distribution of cyanobacteria outbreaks. To estimate cyanobacterial concentrations, chlorophyll-a (Chl-a) and phycocyanin (PC) have also been assessed using remote sensing reflectance (Dev et al. 2022; Shi et al. 2019). Chl-a is widely used to monitor phytoplankton biomass, whereas PC represents a specific biomass indicator for cyanobacteria (Pei et al. 2015). Chl-a and PC concentrations can be estimated using bio-optical algorithms that utilize optical properties, including remote sensing reflectance and absorption coefficient (Simis, Peters, and Gons 2005; Matthews, Bernard, and Winter 2010). Duan, Ronghua, and Chuanmin (2012) employed satellite remote sensing data to evaluate the performance of semiempirical and analytical algorithms using the absorption coefficient. D'Sa (2014) applied multi-satellite data to indicate the variability of chlorophyll, and Mishra, Schaeffer, and Keith (2014) utilized hyperspectral data with the normalized difference chlorophyll index to estimate the concentration of Chl-a. Kwon et al. (2020) applied drone-borne imagery to estimate the vertical distribution and cumulative concentration of Chl-a and PC using reflectance band ratio algorithms. Ammenberg et al. (2002) examined airborne hyperspectral data using a semi-analytical algorithm with an absorption coefficient. However, conventional bio-optical algorithms that use specific bands exhibit differences in performance across various water bodies owing to the different optical characteristics of each water body (Mobley 1995).
Thus, previous studies for estimating Chl-a and PC had to optimize the bio-optical algorithm in addition to the optical properties (Woźniak et al. 2016). Therefore, it is necessary to identify a general algorithm that can be used for different water bodies.
Deep learning models have been utilized to extract and simulate the features of big data with remarkable accuracy (Ahishali et al. 2021). Moreover, these models have also been adopted to detect or quantify HAB outbreaks caused by various environmental factors (Yadav et al. 2020; Rostam et al. 2021). Peterson, Sagan, and Sloan (2020) utilized a progressively decreasing deep neural network with multispectral band data to estimate inland water quality variables, including blue-green algae and Chl-a. However, such neural networks have difficulty considering and extracting the spatial features of input data. Convolutional neural networks (CNNs), which constitute a deep learning technique, have been used for classification and regression and have shown remarkable performance with multidimensional imagery (Gatys, Ecker, and Bethge 2016; Liu et al. 2018). Specifically, the kernel matrix in a CNN model can extract meaningful features from multidimensional inputs (Sothe et al. 2020). Fangling et al. (2019) performed water quality classification using a one-dimensional (1D) CNN model with Landsat 8 multispectral data and achieved remarkable water quality assessment performance. Riese and Keller (2019) applied a 1D CNN model to hyperspectral data to classify soil texture and compared its performance with that of other models. In a previous study, a CNN model with airborne hyperspectral images was applied to estimate the spatial distribution maps of cyanobacteria at the water surface (Pyo et al. 2019). However, although CNNs possess advantages in processing image data, deep learning techniques are black-box models, which makes it difficult to explain the model results (Lee et al. 2021; Koh and Liang 2017). To address this limitation, various explainable techniques based on deep learning models have been developed. In particular, the attention module visualizes the features and regions of interest of the input images used to train deep learning models.
This could improve the inexplicable training process of the black-box model (Kelvin et al. 2015). Chen et al. (2017) proposed spatial, channel-wise, and multi-layer attention mechanisms in cooperation with long short-term memory models. Woo et al. (2018) focused on the visualization of intermediate features of input images using both spatial and channel attention modules in a CNN model. Spatial attention can be used to visualize a feature map along the wavelength of the 1D hyperspectral data. However, while CNNs have been employed for classification using hyperspectral data (Ahishali et al. 2021; Fauvel et al. 2012; Mei et al. 2019), few studies have applied CNN regression models with spatial attention modules to estimate the concentration of cyanobacteria in various water bodies. The development of a generalized model might expand the possibility of estimating cyanobacteria in various water bodies in different study areas.
Therefore, this study aimed to develop a general deep learning model that can estimate the concentrations of Chl-a and PC in representative rivers in South Korea. The objectives of the study were to: 1) construct bio-optical algorithms and a spatial attention CNN model using observed Chl-a, PC, and hyperspectral imagery; 2) compare the performance of the bio-optical algorithms with that of the spatial attention CNN model in the main rivers of South Korea; and 3) visualize the attention weights in order to analyze the importance of each wavelength.

Study area
This study selected the Geum (GE), Nakdong (ND), and Yeongsan (YS) rivers, the major rivers of South Korea, as study sites. These rivers are major water supply sources, and their water is utilized for agricultural, industrial, and domestic purposes over widespread areas (Kim et al. 2017; An et al. 2019). The geographical information of the study area is presented in Table 1. The GE River is located in mid-western South Korea (Figure 1(a)). The main river is 398 km long, and its water surface area is 9,915 km². The Daecheong (DC) reservoir and Baekje (BJ) weir have been studied as critical cyanobacterial bloom regions in the GE River. In the DC reservoir (36.35°-36.52°N, 127.48°-127.60°E), the watershed area and storage volume are 4,134 km² and 1,490 × 10⁶ m³, respectively. It has a water surface of 72.8 km², a length of 86 km, and an average water depth of 20 m. Water is supplied to Daejeon Metropolitan City through water intake stations, including the Chudong intake tower (Xin-Chao et al. 2015). The BJ weir (36.24°-36.42°N, 126.88°-127.04°E) region is located downstream of the DC reservoir and has a watershed area of 9,912 km² and a stream length of 23 km (Pyo et al. 2019). The BJ movable weir mainly supplies agricultural water and electricity (Kim, Lee, and Kwang-Guk 2019). The ND River is located in southeast Korea and is the longest river in South Korea (Figure 1(b)). The water surface area of the ND is 23,817 km², with a length of 525 km (Park and Seok Lee 2002). The monitoring sites in this river were focused on the Changnyeong Haman weir section (36.37°-36.40°N, 128.45°-128.54°E), which supplies domestic, agricultural, and industrial water to Busan Metropolitan City, Gimhae, and Yangsan. The YS River is located in southwestern South Korea (Figure 1(b)). The drainage area and discharge volume of the YS are 3,371 km² and 1.5 × 10⁸ m³, respectively.
The monitoring area is the Yeongsan dike (34.76°-34.82°N, 126.45°-126.55°E), which is located in the YS River estuary and supplies agricultural and industrial water. In addition, the discharge from the YS River has a large impact on the coastal ecosystems. Severe cyanobacterial bloom outbreaks have occurred in the study area because of abundant nutrient loading and a long water retention time (Shim, Yong Yoon, and Hyung Lee 2015; Kim, Lee, and Kwang-Guk 2019; Gwak and Kim 2016; Cho and Cho 2017).

Data acquisition
The overall research scheme was divided into five sections (Figure 2): (A) data acquisition from water sample field monitoring, field reflectance measurements, and airborne monitoring for hyperspectral images; (B) experimental analysis for extracting Chl-a and PC pigments; (C) hyperspectral image correction preprocessing (radiometric, atmospheric, and geometric correction) via a conversion from raw to reflectance data; (D) construction and training of the spatial attention CNN model; and (E) building bio-optical models, which include semi-empirical and semi-analytical algorithms.

Field monitoring and water sampling with experimental analysis
A FieldSpec HandHeld 2 spectroradiometer (ASD Inc., USA) was used to measure the irradiance and the radiance from the sky and the water surface with a viewing angle of 45° from 9 am to 11 am. A wavelength range of 350 to 800 nm was selected to account for the data noise of the measurement equipment. Reflectance data were calculated using the method introduced by Mobley (1999), as given below:

$$R_{rs} = \frac{L_w - 0.025\,L_{sky}}{E_d} \qquad (1)$$

where R_rs represents the remote sensing reflectance [sr⁻¹], L_w is the radiance from the water [W·m⁻²·sr⁻¹·nm⁻¹], L_sky is the radiance from the sky [W·m⁻²·sr⁻¹·nm⁻¹], 0.025 is the reflectance factor (calculated using the wind speed, solar zenith angle, and viewing angle), and E_d is the downwelling irradiance from the sun [W·m⁻²·nm⁻¹]. The water samples were collected in two sterilized bottles of different sizes to analyze Chl-a and PC concentrations. In the Chl-a extraction process, 250 mL samples were filtered using glass microfiber filters of 0.7 μm pore size (Whatman Inc., USA). The filters were soaked in an acetone and methanol (9:1) solution for 24 h based on solvent extraction (APHA, American Water Works Association, Water Pollution Control Federation, and Water Environment Federation 1912). The PC water samples were concentrated using a 125 mL plankton net and were homogenized using an ultrasonicator (Sonictopia Inc., South Korea). Then, 20 mL of the homogenized samples were extracted and centrifuged at 4000 rpm for 15 min. Next, the liquid was removed, and the remaining solid sample was mixed with 5 mL of phosphate buffer solution (Sigma-Aldrich, USA). PC extraction was then performed using the freezing and thawing method (Bennett and Bogorad 1973). After extracting Chl-a and PC, the absorption spectra from 350 nm to 800 nm were measured using a Cary-5000 UV-Vis-NIR (ultraviolet-visible-near-infrared) spectrophotometer (Agilent Inc., USA).
The absorption spectra data were used to estimate the concentrations of Chl-a and PC, following the method reported by Pyo et al. (2016).
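The reflectance calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the processing code used in the study: it assumes that the surface-reflected sky radiance (0.025 × L_sky) is subtracted from the water radiance before dividing by the downwelling irradiance, as the symbol definitions suggest. The function name, argument names, and sample values are hypothetical.

```python
def remote_sensing_reflectance(l_w, l_sky, e_d, rho=0.025):
    """Return R_rs [sr^-1] per band.

    l_w   : radiance from the water per band
    l_sky : radiance from the sky per band
    e_d   : downwelling irradiance per band
    rho   : reflectance factor (0.025 in the text)
    """
    if not (len(l_w) == len(l_sky) == len(e_d)):
        raise ValueError("all spectra must have the same number of bands")
    # Remove the sky radiance reflected at the surface, then normalize by E_d.
    return [(lw - rho * ls) / ed for lw, ls, ed in zip(l_w, l_sky, e_d)]


# Hypothetical three-band example (illustrative values, not measurements).
l_w = [0.010, 0.012, 0.008]
l_sky = [0.040, 0.040, 0.040]
e_d = [1.0, 1.0, 0.5]
r_rs = remote_sensing_reflectance(l_w, l_sky, e_d)
```

Keeping the reflectance factor as a parameter makes it easy to test other values when wind speed or viewing geometry differs from the conditions assumed here.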

Drone- and airborne hyperspectral images
Nano-Hyperspec (Headwall Photonics Inc., USA) and AISA Eagle (SPECIM Inc., Oulu, Finland) hyperspectral imaging sensors obtained the optical signal data, and field monitoring was conducted in the DC reservoir and BJ weir, respectively. The drone used was a MATRICE M600 Pro hexacopter (DJI Inc., Shenzhen, China), and an aircraft with the AISA Eagle sensor was operated by ASIA Aero Survey Co., Ltd. The hyperspectral images were measured when the wind speed was less than 6.0 m/s in order to stabilize the aircraft operation. The drone and aircraft were operated at flight heights of 150 m and 3,000 m, respectively. The drone-borne hyperspectral imaging sensor measured a spectral range of 350-800 nm with a spatial resolution of 20 cm pixel⁻¹ and a spectral resolution of 2.2 nm. The airborne hyperspectral images had a wavelength range of 400 to 970 nm with a spatial resolution of 2 m pixel⁻¹ and a spectral resolution of 4.2 nm. Drone operation was suited to obtaining reflectance data over relatively small areas, such as branches of the main river. In contrast, the airborne hyperspectral images were suitable for covering wide areas. The calibration and data processing procedures of the drone-borne hyperspectral imagery were implemented using SpectralView (Headwall Photonics Inc., USA) based on Kwon et al. (2020). Meanwhile, the airborne hyperspectral images were subjected to image processing using MODTRAN 6 software (Pyo et al. 2018). To minimize the signal noise of the calibrated images, a Savitzky-Golay filter was utilized with the MATLAB (MathWorks Inc., USA) image-processing toolbox (Chen et al. 2004). The Savitzky-Golay filter used a second-order polynomial with a frame length of 5. The deep learning model utilized the pretreated hyperspectral images to visualize the distribution maps of algal pigments. The model swept the image pixel by pixel to estimate the concentrations of Chl-a and PC at each pixel.
The estimated output data were reconstructed with the same shape as the input hyperspectral images.
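The smoothing step can be illustrated with a short Python sketch of a second-order Savitzky-Golay filter with a frame length of 5. The study used MATLAB, so this is a stand-in and not the original implementation. The coefficients (-3, 12, 17, 12, -3)/35 are the standard smoothing weights for a quadratic fit over five points. As a simplifying assumption, the two bands at each end of the spectrum are left unchanged, which differs from how MATLAB treats the edges. Function names and the image layout are hypothetical.

```python
# Smoothing weights for a quadratic (second-order) fit over a 5-point window.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol_5pt_quadratic(spectrum):
    """Smooth one spectrum; the two bands at each edge are left unchanged."""
    smoothed = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return smoothed


def smooth_image(image):
    """Apply the filter pixel by pixel.

    image[row][col] is assumed to be a list of band values, so the output
    keeps the same rows x cols x bands shape as the input.
    """
    return [[savgol_5pt_quadratic(pixel) for pixel in row] for row in image]
```

Because the filter fits a quadratic locally, any spectrum that is already a quadratic function of band index passes through unchanged in the interior, which is a convenient sanity check.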

Bio-optical algorithm approach
Semi-empirical and semi-analytical algorithms were adopted to estimate the pigment concentration using apparent optical properties (AOPs) and inherent optical properties (IOPs) (Mishra, Ogashawara, and Abraham Gitelson 2017;Mishra, Schaeffer, and Keith 2014).

Semi-empirical algorithms
The representative semi-empirical algorithm utilized was the band ratio algorithm, which uses two or three bands. This study utilized two- and three-band ratio algorithms to estimate the Chl-a and PC concentrations, respectively. The two-band ratio algorithms are given below (Moses et al. 2009; Mishra 2012): where [Chl-a]_2B is the concentration of Chl-a calculated by the two-band ratio algorithm, [PC]_2B is the PC concentration calculated by the two-band ratio algorithm, and R_rs(600), R_rs(665), and R_rs(708) are the remote sensing reflectance [sr⁻¹] at 600 nm, 665 nm, and 708 nm, respectively. The three-band ratio algorithms are as follows (Hunter et al. 2008; Moses et al. 2009): To optimize the band ratio algorithms for each water body, optimal bands were selected by comparing the accuracy with the observed pigment concentration. The bands were empirically tuned to ensure the best performance of the optical band ratio algorithm using MATLAB. The equations of the optimized two- and three-band ratio algorithms are as follows: where [Chl-a]_Optim-2B is the concentration of Chl-a calculated by the optimized two-band ratio algorithm, [PC]_Optim-2B is the PC concentration calculated by the optimized two-band ratio algorithm, [Chl-a]_Optim-3B is the concentration of Chl-a calculated by the optimized three-band ratio algorithm, [PC]_Optim-3B is the PC concentration calculated by the optimized three-band ratio algorithm, and λ_1, λ_2, and λ_3 are the optimized wavelengths. Note that the optimized band ratio algorithms adopted different bands for each water body. The optimized band ratio algorithms were utilized to evaluate the performance of a spatial attention CNN model that could be generalized for different water bodies. Three different optimized band ratio algorithms were constructed for each study area, whereas a spatial attention CNN model was trained on the overall datasets of the three study areas.
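The band optimization described above can be sketched as an exhaustive search over band pairs. The following Python sketch is only an illustration under stated assumptions: it treats the two-band index as a plain ratio of reflectances at two wavelengths, scores each pair by the Pearson correlation with the observed pigment concentration, and keeps the pair with the highest R. The study performed its tuning in MATLAB and its exact criterion may differ. All names are hypothetical, and reflectance values are assumed to be positive so that the ratio is defined.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / sqrt(sxx * syy)


def best_two_band_ratio(spectra, wavelengths, observed):
    """Search all ordered band pairs.

    spectra     : list of samples, each a list of R_rs values per band
    wavelengths : band centers [nm], same order as the spectra columns
    observed    : observed pigment concentration per sample
    Returns (r, numerator_nm, denominator_nm) for the best pair.
    """
    best = (-2.0, None, None)
    for i, j in permutations(range(len(wavelengths)), 2):
        ratios = [s[i] / s[j] for s in spectra]
        r = pearson_r(ratios, observed)
        if r > best[0]:
            best = (r, wavelengths[i], wavelengths[j])
    return best
```

A three-band search would follow the same pattern with one more loop level. With 75 bands the two-band search evaluates 5,550 ordered pairs, which remains cheap.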

Semi-analytical algorithms
AOPs (e.g. radiance, irradiance, and remote sensing reflectance) directly measure the optical properties of a medium. Conversely, IOPs (e.g. absorption and backscattering) consider the inherent properties of the water body and are independent of ambient light. Thus, semi-analytical algorithms have been utilized for IOPs, whereas semi-empirical algorithms are used for AOPs. Gons, Rijkeboer, and Ruddick (2005) and Duan, Ronghua, and Chuanmin (2012) presented the following equations to calculate the concentration of Chl-a: where p is 1.062 [-], a_chla(λ) is the absorption coefficient of Chl-a at a specific wavelength, a_w is the absorption coefficient dominated by water, b_b is the backscattering coefficient, and a*_chla(665) is the specific absorption coefficient of Chl-a at 665 nm, which is defined as 0.0161 m²/mg. Note that in Eq. (10), a_w(709) and a_w(665) are 0.70 m⁻¹ and 0.40 m⁻¹, respectively.

Input and output composition
The input dataset consisted of the measured field reflectance data in the wavelength range of 450 to 800 nm, which was selected to minimize data noise. However, the drone-borne and airborne hyperspectral images had different wavelength intervals and ranges, as they had 90 and 86 bands, respectively. Thus, the wavelength bands of the drone-borne and airborne images were matched to 75 common bands. The concentrations of Chl-a and PC were normalized using min-max normalization and were then used as output datasets. The total sizes of the input and output datasets were N × 1 × 75 and N × 2, respectively, where N is the number of samples. The datasets were randomly separated into training datasets (70%) and validation datasets (30%).
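The output normalization and the random 70/30 separation can be sketched as follows. This is a minimal illustration, assuming that min-max normalization scales each pigment to the range [0, 1] and that the split is a simple shuffle of sample indices. The function names, the seed, and the rounding rule for the training count are assumptions rather than details from the study.

```python
import random


def min_max_normalize(values):
    """Scale values to [0, 1].

    Also returns (min, max) so that model estimates can be rescaled back
    to concentration units afterwards.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def train_validation_split(n_samples, train_fraction=0.7, seed=0):
    """Return shuffled index lists for the training and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]
```

In practice the minimum and maximum would be taken from the training subset only and reused for the validation subset, so that no information from the validation data leaks into the scaling.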

CNN
In this study, a CNN was applied to derive the features of the remote sensing reflectance data in order to estimate the concentrations of Chl-a and PC. The convolutional layer extracts various features of the data using numerous convolution kernels (also called filters) (LeCun et al. 1989). The kernels compute the feature data according to their weights and biases as they move along the input data. The size of the feature data is kept the same as that of the input data by using padding and stride (Dumoulin and Visin 2016). In this study, a stride of 1 and a padding of (k − 1)/2 were used, where k is the kernel size, which is an odd number. The feature data of the convolutional layer are passed to a batch normalization layer for regularization, which can prevent overfitting (Luo et al. 2018). The batch normalization layer normalizes the convolution feature data and is applied to reduce the internal covariate shift (Ioffe and Szegedy 2015). The pooling layer has the advantage of decreasing the size of the feature data and the number of parameters (Aloysius and Geetha 2017). In addition, the pooling layer significantly improves the speed of model training. The common pooling methods are max pooling and average pooling (Gholamalinezhad and Khosravi 2020): the max pooling method keeps and highlights the maximum values in the pooling regions, and the average pooling method calculates and keeps the average values in the pooling regions. The dropout layer is utilized to prevent overfitting by deactivating random nodes of the feature data or the fully connected (FC) layer (Srivastava et al. 2014). Deactivating nodes leads to the selection of various combinations of feature data and helps the model learn nonlinear features (Choe and Shim 2019).
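The effect of the stride and padding choice can be checked with a small Python sketch. It is a simplified, single-channel illustration and not the model code: with a stride of 1 and zero padding of (k − 1)/2 on each side, the output of the convolution has the same length as the input. As in most CNN libraries, the kernel is slid over the input without flipping. The function names are hypothetical.

```python
def conv1d_same(signal, kernel, bias=0.0):
    """Stride-1 convolution with zero padding of (k - 1) / 2 on each side."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    # One output value per input position, so the length is preserved.
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(len(signal))
    ]


def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]
```

For a 75-band input and a kernel size of 7, the padding is 3 and the output again has 75 values. Only the pooling layer shortens the feature data.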
In this study, the 1D CNN model employed three convolutional layers, three batch normalization layers, and a max-pooling layer. The pooling layer and dropout layers were located in front of the two FC layers. Before starting the model training process, a random search was conducted for hyperparameter optimization (Bergstra and Bengio 2012). Through this optimization process, the 1D CNN model determined the number of epochs, learning rate, mini-batch size, kernel size of the convolutional layer, kernel size of the max pooling layer, number of nodes in the FC layer, and type of activation function. The results of the hyperparameter optimization are summarized in Table 2.
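A random search over hyperparameters can be sketched as repeated sampling from a search space followed by keeping the configuration with the lowest validation loss. The sketch below is generic: the `objective` callable stands in for a full train-and-validate run, and the search space values are placeholders chosen for illustration. They are not the values reported in Table 2.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample configurations uniformly from `space`; return the lowest-loss one.

    objective : callable taking a config dict and returning a loss
    space     : dict mapping a hyperparameter name to its candidate values
    """
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(options) for name, options in space.items()}
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


# Hypothetical search space; the values are placeholders, not those in Table 2.
SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "mini_batch_size": [8, 16, 32],
    "conv_kernel_size": [3, 5, 7],
    "activation": ["relu", "tanh"],
}
```

Random search does not guarantee that the best combination is visited, but it covers each individual hyperparameter more evenly than a coarse grid for the same number of trials.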

Spatial attention network
The CNN model has difficulty exposing how each input variable contributes to the output because it is a black-box model (Koh and Liang 2017). Therefore, CNN models can be combined with explainable techniques, such as class activation maps (Zhou et al. 2016) or attention networks. Woo et al. (2018) introduced a convolutional block attention module (CBAM) consisting of a channel attention module and a spatial attention module to explore the model training results using two-dimensional imagery. The channel attention and spatial attention modules focused on analyzing the channel and spatial relationships, respectively. In this study, the input data had a 1D structure without channels, allowing the 1D CNN model to be combined with the spatial attention module of the CBAM. The spatial attention module highlighted the informative sections of the input data. First, the average and max pooling methods were applied to the input data to obtain two different results. These results were concatenated to emphasize important features. Then, the convolution layer and sigmoid function were applied to the concatenated result. In the convolutional layer, the kernel size was set to seven. The spatial attention module was calculated using

$$M_s(F) = \sigma\left(C^{7}\left(\left[\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)\right]\right)\right)$$

where M_s is the spatial attention module, F is the feature data, σ is the sigmoid function, C^7 is a convolution layer with a kernel size of 7, AvgPool is the average pooling method, and MaxPool is the max pooling method. The result of the spatial attention module was multiplied with the feature data of the CNN model and added to the residual data (Figure 3).
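The spatial attention steps described above (pooling, concatenation, a convolution with a kernel size of 7, and a sigmoid) can be sketched for 1D feature data as follows. This is a simplified illustration and not the trained model: the pooling is assumed to be taken across the feature channels at each band position, the residual is assumed to be the input feature data itself, and the kernel weights are passed in by the caller because the trained values are not given in the text. All names are hypothetical.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def spatial_attention_1d(features, kernel_avg, kernel_max, bias=0.0):
    """Apply a CBAM-style spatial attention step to 1D feature data.

    features   : list of channels, each a list of values per band position
    kernel_avg : convolution weights applied to the average-pooled map
    kernel_max : convolution weights applied to the max-pooled map
    Returns (attention weight per band, refined features with residual).
    """
    n_channels = len(features)
    n_bands = len(features[0])
    # Pool across channels to get two 1D descriptors of each band position.
    avg_map = [sum(ch[i] for ch in features) / n_channels for i in range(n_bands)]
    max_map = [max(ch[i] for ch in features) for i in range(n_bands)]
    k = len(kernel_avg)  # 7 in the text
    pad = (k - 1) // 2
    avg_p = [0.0] * pad + avg_map + [0.0] * pad
    max_p = [0.0] * pad + max_map + [0.0] * pad
    weights = []
    for i in range(n_bands):
        # Convolution over the two concatenated maps, then the sigmoid.
        z = bias
        for j in range(k):
            z += kernel_avg[j] * avg_p[i + j] + kernel_max[j] * max_p[i + j]
        weights.append(sigmoid(z))
    # Multiply the features by the attention weights and add the residual.
    refined = [[v * w + v for v, w in zip(ch, weights)] for ch in features]
    return weights, refined
```

The returned weights have one value per band position, so they can be plotted directly against wavelength to show which spectral regions the model emphasizes.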

Training and validation step
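The spatial attention CNN model used the hyperparameters selected during hyperparameter optimization via numerous model training processes. The model was trained to minimize the loss value, which was calculated as the mean squared error (MSE) between the observed and estimated data and can be expressed as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

where N is the number of samples, y_i is the observed value, and ŷ_i is the estimated value of sample i.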
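The training loss and the two evaluation metrics used throughout the paper (R and RMSE) follow their standard definitions, which can be written compactly in Python. The sketch below assumes R is the Pearson correlation coefficient. The function names are hypothetical.

```python
from math import sqrt


def mse(observed, estimated):
    """Mean squared error between observed and estimated values."""
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean squared error, in the same units as the concentrations."""
    return sqrt(mse(observed, estimated))


def correlation(observed, estimated):
    """Pearson correlation coefficient (R) between observed and estimated values."""
    n = len(observed)
    mo, me = sum(observed) / n, sum(estimated) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov / sqrt(var_o * var_e)
```

Note that R measures only how well the estimates track the observations linearly, whereas RMSE also penalizes a constant offset or scale error, which is why the two are reported together.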

Observation analysis of cyanobacteria
The observed pigment information of cyanobacteria in the different study sites is shown in Figure 4. Overall, the concentration of pigments in the GE River was higher than those in the ND and YS rivers.
In the GE River, the highest concentrations of Chl-a and PC were 355.56 mg/m³ and 537.34 mg/m³, respectively. The standard deviations of the GE River were relatively higher than those of the other rivers. Shin, Kang, and Hwang (2016) reported that Chl-a concentrations above 100 mg/m³ have a high probability of health damage caused by cyanobacteria outbreaks in South Korea. The field R_rs spectra for each study area are shown in Figure 5. The troughs of the R_rs spectra near 620 nm and 670 nm may be due to substantial absorption by water constituents, such as algal pigments. The 670 nm (near 665 nm) wavelength was utilized for the two-band ratio algorithm in Equation 2. In addition, 620 nm was adopted to calculate the absorption coefficient of water constituents in Equations 14 and 15. Meanwhile, relatively high peaks were observed at 560 nm, 650 nm, and 700 nm, indicating low absorption by the pigments. The reflectance at 700 nm is related to extreme algal blooms as the red and NIR regions denote the enhanced backscattering effect of algae (Gons 1999). Specifically, at 560 nm, the average reflectance values of the GE and YS rivers were higher than those of the ND River. The R_rs values of the GE River showed a wide range from 0.003 sr⁻¹ to 0.029 sr⁻¹, whereas the YS samples had relatively high R_rs values above 0.013 sr⁻¹. In contrast, the R_rs values in the ND River were restricted to a narrow range of 0.002 sr⁻¹ to 0.020 sr⁻¹. This may be due to different conditions of water constituents, such as different concentrations of cyanobacterial blooms and suspended sediments (Aurin et al. 2010; Chang et al. 2006). Therefore, the ND River had the smallest changes in the R_rs values, even though the differences between the maximum and minimum pigment concentrations were relatively higher in the ND River than in the YS River. The water constituents showed variations in the field reflectance data for the different study areas.

Performance of bio-optical algorithms
In this study, bio-optical algorithms were used to estimate the pigment concentrations of water samples. As shown in Figure 6(a-b), [Chl-a]_3B had the highest R values in the GE and ND rivers (0.84 and 0.82, respectively). Meanwhile, in the YS River, the band ratio algorithms showed low performance, with R values below 0.39. Similarly, the semi-analytical algorithm results were in good agreement with the observed pigments in the GE and ND rivers and revealed low R values (< 0.40) in the YS River (Figure 6(c)). According to the PC concentration results, [PC]_2B showed high performance, with R values reaching 0.77 and 0.67 in the GE and ND rivers, respectively (Figure 7(a-b)). However, in the YS River, the R value of [PC]_3B was 0.64, which is higher than that of the other band ratio algorithms. The semi-analytical algorithm, [PC]_Simis, exhibited performance similar to that of the semi-empirical algorithm, revealing R values of 0.84 and 0.67 for the GE and ND rivers, respectively, and an R value of −0.18 for the YS River (Figure 7(c)). Overall, most of the pigment concentrations estimated using bio-optical algorithms were in good agreement with the observed concentrations in the GE and ND rivers. Conversely, the results for the YS River had relatively low performance because the number of samples was insufficient compared with the GE and ND rivers (Oyedare and Jerry Park 2019). The low performance may also reflect the distinct optical properties of each water body, which depend on its water components (Mobley 1995). Previous studies have utilized various bio-optical algorithms that use optimized bands to enhance the accuracy of Chl-a and PC concentrations in different study sites (Duan et al. 2010; Hunter et al. 2010; Mishra and Mishra 2012; Schalles and Yacobi 2000).
On the basis of R values, specific bands were selected in the present study to show the best algorithm performance for each study area (Table 3). The selected wavelengths of the optimized band ratio algorithms were different from those of the previous algorithms because of the distinct optical properties of different water components in each water body. In particular, the optimized two-band ratio algorithm for PC adopted the disparate wavelengths of 740 nm and 792 nm in the YS River. In contrast, the conventional two-band ratio algorithm utilizes 600 nm and 708 nm to estimate PC concentration (Mishra 2012). The optimized band ratio algorithms estimated Chl-a with an R value above 0.83, which was remarkably higher than that of the semi-empirical and semi-analytical algorithms (Figure 6(d-e)). Moreover, the optimized three-band ratio algorithm of Chl-a showed an R value above 0.85 and an RMSE value below 16.49 mg/m³. As shown in Figure 7(d-e), the performances of the optimized band ratio algorithms for PC were remarkably higher than those of the other algorithms. Specifically, the R values of [PC]_Optim-3B in the GE, ND, and YS rivers were over 0.90, 0.70, and 0.72, respectively. Additionally, the RMSE was below 40.90 mg/m³ for the overall study area. The performances of the overall bio-optical algorithms for each study site are shown in Figures 6 and 7. To identify the training and validation performances of the optimized band ratio algorithms, the scatter plots and model performance were obtained (Figure 8). The R values of [Chl-a]_Optim-2B and [PC]_Optim-2B were above 0.78 and 0.69, respectively. For the optimized three-band ratio algorithms applied to the validation datasets, the R values of Chl-a and PC were both above 0.75. The overall performances of the optimized band ratio algorithms in the GE River were relatively higher than those in the other two rivers.
In general, the optimized three-band ratio algorithms of Chl-a and PC exhibited considerably higher performance than the other algorithms. Thus, band selection optimization is necessary to obtain the best performance of the method used to estimate cyanobacteria in different water bodies. Furthermore, the complex optical properties of inland water have led to the construction of various bio-optical algorithms that use optimized bands owing to the different bio-optical characteristics of the water bodies (Morel et al. 2007; Moore et al. 2014; Stambler 2005). Even though the optimized band ratio algorithms showed remarkable performance, it is difficult to construct a generalized model for the three different study areas considered in the present study.

Estimating the spatial attention CNN model
Chl-a and PC concentrations were estimated using a spatial attention CNN model. The correlations for the entire study area are shown in Figure 9. The R and RMSE values of the Chl-a were above 0.87 and below 12.09 mg/m 3 , respectively. Meanwhile, for the PC, the R value of the validation was 0.88, and the RMSE was 25.01 mg/m 3 . The estimated pigment concentrations were in good agreement with the observed pigment concentrations for the three study areas. The spatial attention CNN model was generalized by training the overall data without separating the datasets. In addition, the overall performance of the spatial attention CNN model was remarkably higher than those of the bio-optical algorithms. The correlation and model performance of each study area are shown in Figures 6(f) and 7(f). The R values of Chl-a and PC were above 0.91 and the RMSE values were below 18.81 mg/m 3 in the GE and ND rivers. Compared with the bio-optical algorithms, the spatial attention CNN model showed the highest accuracy in the GE and ND rivers Figure 10. Thus, the samples in the GE and ND rivers were acceptable for training the various spatiotemporal variations of the pigments using the CNN model and showed good performance. Meanwhile, the [Chl-a] Spatial-CNN model for the YS River had a similar R value (0.85) as the [Chl-a] Optim-3B model but had high RMSE value of 10.15 mg/m 3 compared to the model (8.27 mg/m 3 ). In addition, the [PC] Spatial-CNN of the YS River had a significantly low correlation (0.16) with the observed data compared with the optimized band ratio algorithms. The high performance of the spatial attention  CNN model in the GE and ND rivers can be attributed to the high number of samples. According to Zhu et al. (2016), the performance of a data-driven model can be improved by increasing the training data, which is known as effectiveness of big data. 
In contrast, the number of samples in the YS River was only 24, and the different trends observed in the reflectance data might have caused the low accuracy when the spatial attention CNN model was used to estimate the pigment concentrations. In addition, the field reflectance data in the YS River revealed higher Rrs values than those in the GE and ND rivers (Figure 5(c)). The irregular reflectance patterns in the samples from the YS River may have resulted in the poor estimation of the PC concentration because a deep learning model learns the features that are representative of most of the data (Das, Datta, and Chaudhuri 2018). The performance of the spatial attention CNN model may be further improved by obtaining sufficient datasets to train the model for various water bodies.
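The two evaluation metrics used above, R and RMSE, can be computed overall and per river with a minimal sketch such as the following. The function names and the per-river grouping helper are hypothetical and are not taken from the study's code; any values passed in would be the observed and estimated pigment concentrations.

```python
import math
from collections import defaultdict


def correlation_coefficient(observed, estimated):
    """Pearson correlation coefficient (R) between observed and estimated values."""
    n = len(observed)
    mo, me = sum(observed) / n, sum(estimated) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)


def rmse(observed, estimated):
    """Root mean squared error, in the same unit as the inputs (e.g. mg/m^3)."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_by_group(observed, estimated, groups):
    """Return {group: (R, RMSE)} so that each river is scored separately."""
    buckets = defaultdict(lambda: ([], []))
    for o, e, g in zip(observed, estimated, groups):
        buckets[g][0].append(o)
        buckets[g][1].append(e)
    return {
        g: (correlation_coefficient(obs, est), rmse(obs, est))
        for g, (obs, est) in buckets.items()
    }
```

Scoring each river separately matters here because a high overall R can mask a poorly estimated subset, as with PC in the YS River.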

Sensitivity analysis for identifying representative optical features
The spatial attention module indicated the important Rrs bands in the CNN model by means of quantitative attention weights (Figure 11). Overall, similar patterns of attention weight were observed, with slight differences in some samples. Specifically, the wavelength range between 693 nm and 800 nm had considerably higher attention weights than the other bands in the study area. This range lies in the red and near-infrared (NIR) regions. According to Richardson (1996) and Gons (1999), the red and NIR reflectance is high when a critical algal bloom occurs, owing to the dominant backscattering effect of algae. In particular, 708 nm was utilized in the band ratio algorithms to estimate both Chl-a and PC concentrations. Han and Rundquist (1997) reported that the band ratio algorithm using 705 nm showed a remarkable correlation with the concentration of Chl-a. Gitelson (1992) employed a two-band ratio using 700 nm to consider the combined absorption of algal pigments. The wavelengths near 600 nm and 650 nm also had relatively high attention weights, consistent with the band ratio algorithms that use these wavelengths to calculate PC and Chl-a, respectively. Mishra, Mishra, and Schluchter (2009) reported that the band near 600 nm showed the maximum absorption related to the PC concentration without the effect of Chl-a. Similarly, the absorption of Chl-a and PC increased in the band near 650 nm (Schalles and Yacobi 2000). Bio-optical algorithms, including semi-empirical and semi-analytical algorithms, examine reflectance using these band regions (Mishra, Ogashawara, and Abraham Gitelson 2017). Thus, the spatial attention CNN model utilized the bands at 600 nm, 650 nm, and the NIR region as the important factors in the training process for estimating the concentrations of Chl-a and PC.
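How an attention module can yield one weight per spectral position may be sketched as follows. This is a generic CBAM-style spatial attention written for one-dimensional feature maps in NumPy; it is an assumption made for illustration, not the architecture of this study. The function name, the kernel shape, and the edge padding are all hypothetical choices.

```python
import numpy as np


def spatial_attention_1d(features, kernel):
    """Generic spatial attention over a 1-D feature map (illustrative only).

    features: array of shape (channels, positions), e.g. CNN feature maps
              along the wavelength axis.
    kernel:   array of shape (2, k) with odd k; in a trained network these
              weights would be learned.
    Returns (weights, reweighted_features), with weights in (0, 1).
    """
    # Summarize the channels at each position by their mean and maximum.
    pooled = np.stack([features.mean(axis=0), features.max(axis=0)])

    # Slide the kernel along the position axis (cross-correlation, as in
    # deep learning "convolution" layers), padding edges to keep the length.
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad)), mode="edge")
    logits = np.array([
        np.sum(padded[:, i:i + k] * kernel)
        for i in range(features.shape[1])
    ])

    # A sigmoid turns the logits into attention weights, one per position.
    weights = 1.0 / (1.0 + np.exp(-logits))

    # Broadcasting applies each position's weight to every channel.
    return weights, features * weights
```

Plotting `weights` against wavelength for each sample would give a visualization of the kind described for Figure 11, where positions with weights near 1 mark the bands the model relies on most.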
Although the model performance in the YS River was relatively poor, the features of the visualized data had a pattern similar to that of the entire study area. However, the CNN model prioritized the features of the GE and ND rivers because the datasets for the GE and ND rivers were considerably larger than that for the YS River. Although the pigments in the YS River were relatively low in concentration, the spatial attention CNN model still prioritized the red and NIR regions, which are related to a high concentration of algal blooms. In addition, the red and NIR regions can be utilized to estimate high concentrations of Chl-a and PC, such as in the samples of the GE and ND rivers. The relatively low model performance might be the result of the different features of pigment variations in the YS River.
Figure 11. Visualization of the attention weights in each study area. Significant wavelengths are indicated in red (relatively high weight value), and insignificant wavelengths are denoted by blue (relatively low weight value).
The spatial attention CNN model swept the images pixel by pixel and used the spectrum of each pixel as input data. After the estimation of pigment concentrations, the output data were reconstructed to the same size as the initial hyperspectral images. The spatial distribution maps of algal pigments were visualized using the spatial attention CNN model and the drone-borne and airborne hyperspectral images of the GE River (Figure 12). As shown in Figure 12(a), the hyperspectral image was obtained by drone-borne remote sensing, and the distribution map showed salt-and-pepper noise. This noise might be caused by the low signal of the hyperspectral sensor and by the way the model was applied to produce the distribution maps. The spatial attention CNN model estimated that the lakeside regions had remarkably high pigment concentrations, reaching above 300 mg/m³. Moreover, as these regions have relatively slow water velocity, this may aggravate cyanobacteria outbreaks (Susana et al. 2013). The model results showed relatively low concentrations of Chl-a and PC at the center of the reservoir. In contrast, as shown in Figure 12(b), abundant algal blooms lie in the middle of the river. As the water discharge near the weir spillway increases the water velocity, algal blooms gather and flow into the spillway. These results suggest that the spatial attention CNN model can be used to estimate the spatial distributions of cyanobacteria blooms over a wide target area using hyperspectral imagery. The utilization of the drone-borne and airborne hyperspectral images could improve the generalization ability of the spatial attention CNN model. Furthermore, the spatial attention CNN model could potentially be extended to extensive areas, including different water bodies, by using satellite remote sensing imagery.
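The pixel-by-pixel sweep and the reconstruction of the output to the image size can be sketched as below. The sketch assumes the hyperspectral image is held as a NumPy array of shape (rows, cols, bands) and that the trained model is available as a callable mapping a batch of spectra to one concentration per spectrum. The function name `map_pigment` and the `predict` argument are hypothetical stand-ins, not the study's implementation.

```python
import numpy as np


def map_pigment(cube, predict):
    """Apply a per-pixel estimator to a hyperspectral cube.

    cube:    array of shape (rows, cols, bands) holding reflectance spectra.
    predict: callable taking an array of shape (n_pixels, bands) and
             returning n_pixels concentration estimates (assumed interface).
    Returns a (rows, cols) map of estimated concentrations.
    """
    rows, cols, bands = cube.shape

    # Flatten the two spatial axes so that each row is one pixel spectrum.
    pixels = cube.reshape(rows * cols, bands)

    # Estimate one concentration per pixel spectrum.
    estimates = np.asarray(predict(pixels), dtype=float)

    # Restore the spatial layout of the original image.
    return estimates.reshape(rows, cols)
```

Because every pixel is estimated independently of its neighbours, isolated noisy spectra translate directly into isolated outliers in the map, which is consistent with the salt-and-pepper appearance noted for the drone-borne image.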

Conclusions
This study aimed to estimate the concentrations of harmful algal pigments in various water bodies using a spatial attention CNN model and to subsequently compare its performance with that of conventional bio-optical algorithms. Furthermore, the spatial attention module was applied to calculate the attention weights and identify the importance of the reflectance bands in the model training process. As a result, the bio-optical algorithms showed different performances according to the utilized bands and target area. In the GE and ND rivers, the conventional bio-optical algorithms had R values above 0.82 and 0.67 for Chl-a and PC, respectively. The selection of optimized bands and the performance of the bio-optical algorithms were influenced by the specific optical characteristics of individual water bodies. The spatial attention CNN model showed good performance, with R values above 0.91 in the GE and ND rivers. However, the results in the YS River showed a poor correlation for PC (R = 0.16) because of the different reflectance input features and a lack of training data. The spatial attention module identified and visualized the importance of input wavelengths for training the CNN model. Specifically, the spatial attention CNN model utilized the NIR region, which represents the backscattering effect of water and the absorption of pigments. In addition, the bands at 600 nm and 650 nm were used to consider the absorption of Chl-a and PC.
This study demonstrated the generalization capability of the spatial attention CNN model for estimating cyanobacteria outbreaks in various water bodies. In future research, the uncertainties of the deep learning model can be further reduced by acquiring sufficient datasets. The model could be applied to all streams by utilizing Sentinel-2 or Landsat 8 satellite imagery to overcome the limitations of drone-borne and airborne remote sensing methods. By utilizing multispectral satellite data, the spatial attention CNN model could estimate the distribution maps of Chl-a and PC for entire streams in South Korea and could be extended to other countries with similar water conditions.

Program" funded by the Korea Ministry of Environment (MOE) [No. 2020003050001].

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data that support the findings of this study are available from the corresponding author, J.C. Pyo, upon reasonable request.