Spatiotemporal variation of cyanobacterial harmful algal blooms in China based on literature and media information

ABSTRACT Cyanobacterial harmful algal blooms (CyanoHABs) in inland waters are now among the most pressing environmental issues worldwide, especially in China. Satellite remote sensing is of limited use for monitoring CyanoHABs in small water bodies because of its spatial and temporal resolution. Literature and news media have the potential to supplement satellite remote sensing in monitoring CyanoHABs, but they have not yet received sufficient attention. In this study, we combined information on the distributions of CyanoHABs from literature and media for the first time to comprehensively assess the spatiotemporal variation in CyanoHABs in China. We collected, cleaned, validated, and organized data from literature and media on CyanoHABs in China, and established a comprehensive database on CyanoHABs in China's inland waters (ChinaCyanoDB) covering 198 water bodies and 525 records for 1950–2021. The majority of water bodies with CyanoHABs (CyanoWaters) are located in eastern China, mainly concentrated in the middle and lower Yangtze region, with a clear upward trend in their number over the last four decades. The ChinaCyanoDB and analytical results can provide valuable data support for monitoring and managing CyanoHABs in China, and the database construction method may also be applied to other countries and regions.


Introduction
The definition of cyanobacterial harmful algal blooms (CyanoHABs) has been a subject of considerable debate, which typically revolves around three key aspects: biomass, appearance, and harm (Brooks et al. 2016; Beaulieu, Pick, and Gregory-Eaves 2013). In this study, CyanoHABs refer to cyanobacteria that proliferate, gather, and float on the water surface under suitable meteorological conditions and adequate nitrogen and phosphorus. We chose this definition for two reasons. The main reason is that this definition ensures the uniformity of records in the database. Presently, two main methods are used to monitor CyanoHABs: on-site visual interpretation and satellite remote sensing interpretation. Both methods can only detect CyanoHABs that form visible scums on the water surface. Some studies have used chlorophyll-a concentration or algal density thresholds to define cyanobacterial blooms (Brooks et al. 2016). However, in this study, to keep the definition of CyanoHABs uniform across the database, we standardized the definition, focusing on the aggregation of algae and the visible floating layer on the water surface, while disregarding on-site measurements of the chlorophyll-a concentration and algal density. The second reason for choosing this definition is that CyanoHABs that aggregate and form a floating layer on the water surface tend to produce more toxins, posing a greater threat to the ecological landscape. Thus, such CyanoHABs require heightened attention. Studies have reported that the most harmful algal blooms in inland water bodies are CyanoHABs (Wu et al. 2007). CyanoHABs endanger the ecological environment of water bodies, which can be harmful to aquatic organisms and humans (Wang, Chen, and Wang 2021). Owing to anthropogenic activities and continuous global warming, CyanoHABs have become one of the most serious environmental problems in water bodies worldwide (Brooks et al. 2016; Paerl and Barnard 2020). Therefore, tracking the spatiotemporal variation in CyanoHABs occurrence is important for the mitigation of CyanoHABs and the protection of the water environment (Jiang and Huang 2004; Li et al. 2022; Paerl et al. 2016; Scott and McCarthy 2010).
Cyanobacterial cells have gas vacuoles, which enable them to float to the water surface and gather to form a green, oil-film-like cyanobacterial bloom under suitable conditions of air temperature, wind speed, and sunlight (Walsby et al. 1997). In contrast, other types of algal blooms are usually suspended in the upper layer of the water column and do not float on the water surface (Walsby et al. 1997; Yan et al. 2022; Coffey et al. 2019; Wells et al. 2015). Moreover, dinoflagellate and diatom blooms usually make the water appear reddish brown, rather than green (Brooks et al. 2016). Therefore, CyanoHABs differ significantly in color and texture from ordinary water bodies and other harmful algal blooms (Song et al. 2022), such that they can be easily identified by visual inspection in the field. Additionally, the near-infrared band reflectance of CyanoHABs is generally significantly higher than that of ordinary water bodies (Hu 2009); spectral indices can be constructed based on this feature for the satellite remote sensing monitoring of CyanoHABs, including NDVI (Normalized Difference Vegetation Index) and FAI (Floating Algae Index) (Chen and Liu 2014; Fang et al. 2018; Hu 2009; Ma et al. 2021; Qi et al. 2018; Ananias and Negri 2021).
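As an illustration of how such indices exploit the elevated near-infrared reflectance, the following minimal sketch computes NDVI and FAI from band reflectances. The FAI baseline form follows the commonly cited definition (near-infrared reflectance minus a linear red-to-shortwave-infrared baseline); the default wavelengths (645, 859, and 1240 nm, typical of MODIS) and the function names are assumptions made for this example, not values taken from this study.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    return (nir - red) / (nir + red)


def fai(red, nir, swir, wl_red=645.0, wl_nir=859.0, wl_swir=1240.0):
    """Floating Algae Index: NIR reflectance minus a linear baseline drawn
    between the red and SWIR bands (wavelengths in nm are assumed defaults)."""
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


# Illustrative reflectances only: a floating scum raises NIR relative to red.
print(ndvi(red=0.05, nir=0.25))
print(fai(red=0.05, nir=0.25, swir=0.02))
```

Both indices increase when the near-infrared reflectance rises above the red band, which is why a threshold on either one can flag floating scums; the threshold itself is sensor and scene dependent and is not specified here.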
With the development of satellite remote sensing technology, remote sensing methods have gradually become an important means of monitoring CyanoHABs, with the advantages of extensive coverage, dynamic monitoring, and low cost (Yan et al. 2022; Yang et al. 2022). Many studies have focused on the monitoring and spatiotemporal distribution analysis of CyanoHABs based on satellite remote sensing, gradually expanding the scope of monitoring from individual water bodies to regional, national, and even global scales (Fang et al. 2022; Hou et al. 2022; Song et al. 2021; Zhang et al. 2021; Zhao et al. 2018; Zhu et al. 2018). Researchers have also developed CyanoHABs databases utilizing remote sensing. Examples include a CyanoHABs database for 48 lakes and reservoirs larger than 1 km² from 1983 to 2017 based on Landsat satellite data (Song et al. 2021) and a database for global inland water bodies larger than 0.1 km² based on Landsat data from 1982 to 2019 (Hou et al. 2022). However, these databases only used Landsat data and did not include other satellite remote sensing data, which could underestimate the extent of CyanoHABs in China. In addition, the database publicly released by Hou et al. (2022) was organized with a node every 10 years and did not include specific dates for each occurrence of CyanoHABs. Owing to spatial resolution limitations, monitoring small water bodies is often difficult. Moreover, the revisit period and cloud coverage limit the spatiotemporal coverage of remote sensing. Therefore, satellite remote sensing cannot cover all CyanoHABs in inland waters (Fang et al. 2022; Song et al. 2021).
On the other hand, CyanoHABs can be clearly identified on-site, such that visual interpretation is another important means of bloom monitoring.With the increasing public awareness of the dangers of CyanoHABs, blooms are receiving increasing public attention; many CyanoHAB records have been reported in the news and on social media.As CyanoHABs can easily be identified visually, the CyanoHABs reported by the news based on visual interpretation with on-site photos are also credible.Moreover, we can verify the reliability of the news reports based on the accompanying photos.News and social media can include water bodies that are very small and cannot be monitored by satellite remote sensing; field observations are also unaffected by clouds and have more flexible timing.In addition to media information, extensive literature has reported CyanoHABs based on visual interpretation (Huang 2011;Tao 2021;Wan, Xu, and Zhang 2017).
A more comprehensive database is required to obtain more complete records of CyanoHABs in China's inland waters and analyze their spatiotemporal variation. Therefore, in this study, we aimed to provide a comprehensive database and decision support for monitoring and managing CyanoHABs in China. To the best of our knowledge, this study is the first to use literature and media information to obtain spatially broad and temporally frequent monitoring data on CyanoHABs and subsequently establish a CyanoHABs database for China's inland waters (ChinaCyanoDB). Using the database, the spatiotemporal variation patterns of CyanoHABs in China were analyzed while demonstrating the importance of on-site visual monitoring of CyanoHABs for the spatiotemporal distribution of CyanoHABs in China.

Study area
The study area included all inland water bodies in China. China spans climatic zones ranging from the northern temperate zone to the tropics. The topography has a step-wise distribution that is high-elevation in the west and low-elevation in the east, with the main topography including mountains, hills, plains, and low-lying areas. Owing to its geographical location, climatic conditions, geomorphological features, and various anthropogenic influences, the distribution of inland water bodies in China has certain characteristics that have led to the establishment of five lake regions (Ma, Yang, and Duan 2011): the Qinghai-Tibetan Plateau Lake Region (QTR), Eastern Plain Lake Region (EPR), Mengxin Plateau Lake Region (MXR), Yungui Plateau Lake Region (YGR), and Northeast Lake Region (NER). According to the results of satellite remote sensing monitoring from 2004 to 2008, there are 2,921 lakes and 2,596 reservoirs (area > 1 km²), with a total surface area of 91,019.6 km² (Ma, Yang, and Duan 2011).

Processing of CyanoHABs records
The data processing was divided into two parts (Figure 1): data collection, cleaning and validation and the establishment of the CyanoHABs database.First, the CyanoHABs records were obtained from media and literature information through keywords search.The data cleaning removed records irrelevant to CyanoHABs in China by manual reading, and then verified the obtained records through three validation methods (on-site photo validation, remote sensing image validation, and cross-validation) to delete error records.Second, the result records were used to build the CyanoHABs database, primarily including the records and occurrences of CyanoHABs.

Collection of CyanoHABs records
In this study, CyanoHABs records were collected from literature and media information from 1950 to 2021.For literature information, the Chinese literature database 'CNKI' (https://www.cnki.net)was used, which covers most Chinese journals, and the English literature databases of 'Web of Science' (https://www.webofscience.com) and 'Engineering Index (EI)' (https://www.cnki.net)were used, which cover most English journals.Among the media information platforms, the Chinese media platform 'Baidu Information,' which contains most news information, and 'Weibo,' containing most social media information, were used.As there are few English media reports on CyanoHABs in China's inland waters, English news and social media information were not considered in this study.Different search formulas were set according to the English and Chinese language to comprehensively collect relevant data.For the title keywords search in the English literature, 'cyanobacteria,' and 'algal bloom,' were firstly used, followed by three more frequent abbreviations 'HABs (harmful algal blooms),' 'CyanoHABs (cyanobacteria harmful algal blooms),' and 'cyanobloom (cyanobacteria bloom).'For the Chinese literature and media information platforms, the same title keywords and media news keywords were used, as well as 'water bloom.' The number of reports collected using the above search method is listed in Table 1.In the media information platforms, 770 related records were collected, including 345 on Baidu and 425 on  Weibo.In the literature information, 22,781 related records were collected, including 4,923 from CNKI, 14,766 from Web of Science, and 3,092 from EI.

Cleaning of CyanoHABs records
The search terms used for CyanoHABs records in this study were relatively broad, to collect as many reports as possible. For example, a report matching 'cyanobacteria' may be a study of algal species and their living environment rather than of a cyanobacterial bloom, and a report matching 'algal bloom' may describe a dinoflagellate bloom, so not all of the collected reports concerned CyanoHABs. Therefore, the records had to be further cleaned to filter out reports unrelated to the occurrences of CyanoHABs. The collected reports were cleaned based on the text to eliminate reports unrelated to CyanoHABs in China's inland waters. Manual reading was performed for media information, retaining media information with specific descriptions that identified occurrences of CyanoHABs in China. The cleaning methods used for Chinese and English literature differed slightly. The search using CNKI was fixed within the scope of Chinese literature without filtering the research area. However, English literature required a filter on the research area, retaining research for the Chinese region or global region in the literature. The literature was then manually read, retaining records with information specifically describing CyanoHABs occurrences.
After data cleaning, 547 records from 1950 to 2021 were retained, including 158 media and 389 literature records. To explore the origin of the records, we calculated the proportion of retained records after cleaning (the ratio of the number of results after cleaning to the number of the originally collected results) for each data platform. The proportions of data retained from the media information platforms were as follows: Baidu Information = 43.48% and Weibo social media = 1.88%. Baidu Information is the platform for official news sources, which usually includes audited, more professional, and reliable information. In contrast, Weibo content comes from public users without professional experience, with no professional review of the information. Therefore, the proportion of retained data from Weibo was low, indicating that the general public's awareness of CyanoHABs is insufficient. In the literature databases, Web of Science and EI had low retained data proportions of 0.87% and 0.94%, respectively, whereas CNKI had a higher retained data proportion of 4.69%. This is because the Chinese literature focuses more on CyanoHABs in China.
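The retained proportion defined above is a simple ratio. The sketch below applies it to the aggregate media and literature totals stated in the text (158 of 770 and 389 of 22,781); the per-platform retained counts are not listed in this section, so only the aggregates are used, and the function name is an assumption for illustration.

```python
def retained_proportion(n_retained, n_collected):
    """Percentage of originally collected results that survive cleaning."""
    if n_collected <= 0:
        raise ValueError("n_collected must be positive")
    return 100.0 * n_retained / n_collected


# Aggregate counts taken from the text.
media_pct = retained_proportion(158, 770)
literature_pct = retained_proportion(389, 22781)
print(f"media: {media_pct:.2f}%, literature: {literature_pct:.2f}%")
```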

Validation of CyanoHABs records
The monitoring methods for CyanoHABs in China's inland waters are satellite remote sensing and visual interpretation, both of which have some incorrect judgment.First, for remote sensing monitoring, as CyanoHABs and aquatic vegetation have similar spectral characteristics on remote sensing images, some remote sensing monitoring studies misclassify aquatic vegetation as CyanoHABs.Second, for visual interpretation monitoring, most of these records will contain some on-site photos or professional presence with a high credibility.However, there are still some records where there are no on-site photos and professionals present.Thus, there may be problems with misjudging CyanoHABs.
To further ensure that the collected CyanoHABs records were not misjudged as described above, the cleaned records were then subjected to data validation. The validation included three main methods: on-site photo validation, remote sensing image validation, and cross-validation. On-site photo validation refers to a visual interpretation based on on-site photos. Remote sensing image validation was performed using multi-source satellite remote sensing image data on the Google Earth Engine platform, such as Sentinel-2 MSI, Landsat TM/OLI, and MODIS. Some studies have shown that spectral indices such as PSI, PBL, and MSI (Li et al. 2009; Zhu et al. 2016) can be used to distinguish between aquatic vegetation and CyanoHABs. This is mainly because CyanoHABs have a reflectance valley around 620 nm caused by a cyanobacterial absorption peak, whereas aquatic vegetation does not. However, the multi-spectral remote sensing data used in this study, including Sentinel-2 MSI, Landsat TM/OLI, and MODIS, do not have a 620 nm band, so the phenological characteristics of aquatic vegetation were used instead to distinguish between aquatic vegetation and CyanoHABs (Song et al. 2021). As CyanoHABs float on the water surface, they drift easily, allowing them to quickly spread and cover new water areas within a short period, whereas the coverage range of aquatic vegetation generally remains stable over the same timeframe. Therefore, if the remote sensing images show no change in the location of the red feature for several consecutive days (5-7 days), the feature is likely aquatic vegetation rather than a CyanoHAB (Song et al. 2021; Zhao et al. 2018). Cross-validation refers to the case in which two or more records report CyanoHABs in the same water body at similar times; such records are considered reliable. In the process of data validation, the data needed for these validation methods sometimes cannot be obtained. In this case, we further checked whether the textual description of the CyanoHABs in the record was detailed and reliable. Then, we analyzed whether CyanoHABs had previously occurred in the water body to reach a comprehensive judgment on whether the record was reliable, and finally deleted the unreliable records. We note that when conducting remote sensing image validation for news reports, the standard procedure is to select images from the same day for verification. If data for the same day were unavailable, the verification period was extended to the preceding and following 10 days, and the water body's status over this 20-day window was examined to detect any CyanoHABs. If CyanoHABs are observed in remote sensing images taken within this 10-day interval, it can be considered a reliable indication of CyanoHAB occurrence. However, if no CyanoHABs are observed in the remote sensing images, it does not necessarily indicate that the record is incorrect. In that case, cross-validation or on-site photo validation is used for further examination. If none of the above methods can validate a record, we refer to it as an unvalidated record. After a thorough statistical analysis, we found only 17 unvalidated records in our database. To uphold the credibility of the database, we removed the unvalidated records. This ensures the integrity and reliability of the information used in our research. The detailed validation process of the records is illustrated with examples in Appendix A.
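The decision logic described above can be sketched as a small function. This is a simplified illustration, not the procedure actually run for the database: the `Evidence` fields, the function name, and the 10-day tolerance used for cross-validation ("similar times" is not quantified in the text) are assumptions, and the manual fallback of judging the textual description is not modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Evidence:
    photo_confirms: bool = False   # an on-site photo shows a surface scum
    image_bloom_dates: tuple = ()  # dates of satellite images showing a bloom
    other_record_dates: tuple = ()  # dates of other records, same water body


def validate(report_date, ev, image_window_days=10, cross_window_days=10):
    """Return the first validation method that supports the record."""
    def near(dates, window):
        return any(abs((d - report_date).days) <= window for d in dates)

    # Same-day image preferred; otherwise accept images within +/- 10 days.
    if near(ev.image_bloom_dates, image_window_days):
        return "remote sensing image"
    if ev.photo_confirms:
        return "on-site photo"
    # Cross-validation window is a hypothetical choice for this sketch.
    if near(ev.other_record_dates, cross_window_days):
        return "cross-validation"
    return "unvalidated"


print(validate(date(2015, 7, 10),
               Evidence(image_bloom_dates=(date(2015, 7, 18),))))
```

Note that a missing bloom in the imagery does not reject a record by itself; the function only falls through to the next method, mirroring the text.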
After data validation and deletion of unreliable records, a total of 525 validated records were retained, which included 158 media information records and 367 literature records.Calculating the proportion of retained records after validation of the data platforms (ratio of the number of validated results to the number of results after cleaning) indicated that all of the error records were from the literature.There are two main error types.The first error type is based on the remote sensing monitoring of CyanoHABs, in which aquatic plants were misjudged as CyanoHABs, such as in Dongting Lake (Xue et al. 2015), Caohai Lake (Zong et al. 2019), and Poyang Lake (Qian et al. 2016).The second error type is based on visual interpretation monitoring methods.Some news recorded the occurrence of CyanoHABs, but no CyanoHABs were found on the remote sensing images by image verification.Examples include the Xionghe Reservoir (Zhang, Li, and Song 2009) and Ge Lake (Guan et al. 2020).

Establishment of ChinaCyanoDB
After data processing, 525 CyanoHABs records were obtained, which covered 198 inland water bodies in China. These data were used to establish the ChinaCyanoDB. As a single record may contain CyanoHABs from multiple water bodies or CyanoHABs from one water body over many years, we recorded the report information and the CyanoHABs information separately to avoid duplication. Therefore, this database includes two tables: the metadata table and the CyanoHABs information table. The metadata table has the primary key 'Record ID,' which serves as the foreign key of the CyanoHABs information table. The metadata table contains basic information for the obtained literature or media reports, including the title and date of the report, name of the literature or media, data source, type of report, and the CyanoHABs judgment method. The CyanoHABs information table contains information on the occurrences of CyanoHABs in China's inland waters, including the name of the water body, the date of CyanoHABs occurrence, and the area of the CyanoHABs. Information on CyanoHABs is diverse and complicated. For instance, some reports only recorded one CyanoHAB event in a water body, usually with the date of the occurrence and the area of the CyanoHABs, whereas others recorded multiple CyanoHABs in a water body (mostly in articles based on satellite remote sensing). To standardize these complex data in the database and facilitate subsequent analysis and application, we applied the following procedure. First, the year and area of the CyanoHABs were recorded. If multiple years of CyanoHABs were recorded in a report, each year corresponds to one record in the table. For multiple CyanoHABs occurrences in one year, the database only records the date with the largest bloom area. Based on the above analysis, the structure of the two tables in the database is provided in Tables 2 and 3; the contents of the two tables are shared on Zenodo (https://doi.org/10.5281/zenodo.8245690). We note that the data sources for the ChinaCyanoDB are the news media and literature; changes in public attention to CyanoHABs therefore affected the amount of data in the database. However, we attempted to collect, clean, and validate the data to ensure the comprehensiveness and reliability of the ChinaCyanoDB.
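A minimal sketch of the two-table layout and the one-record-per-year rule is given below, using an in-memory SQLite database. The table and column names are hypothetical (the actual fields are those listed in Tables 2 and 3), and the sample occurrences are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE metadata (
    record_id INTEGER PRIMARY KEY,   -- 'Record ID' in the text
    title TEXT, report_date TEXT, source_name TEXT,
    data_source TEXT, report_type TEXT, judgment_method TEXT
);
CREATE TABLE cyanohab_info (
    record_id INTEGER NOT NULL REFERENCES metadata(record_id),
    water_body TEXT NOT NULL, occurrence_date TEXT, area_km2 REAL
);
""")


def largest_per_year(occurrences):
    """Keep, per water body and year, only the occurrence with the largest area.

    occurrences: iterable of (water_body, iso_date, area_km2) tuples.
    """
    best = {}
    for water_body, iso_date, area in occurrences:
        key = (water_body, iso_date[:4])  # year is the first 4 characters
        if key not in best or area > best[key][2]:
            best[key] = (water_body, iso_date, area)
    return sorted(best.values())


# Invented sample data: two blooms in 2010 collapse to the larger one.
sample = [("Lake A", "2010-06-01", 5.0),
          ("Lake A", "2010-08-15", 12.0),
          ("Lake A", "2011-07-01", 3.0)]
print(largest_per_year(sample))
```

Separating report-level fields from occurrence-level fields means a report covering several water bodies or years is stored once, with multiple occurrence rows pointing back to it.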

Data analyses
The ChinaCyanoDB was used to analyze the results of CyanoHABs using different methods and investigate the spatiotemporal variation in water bodies with CyanoHABs (CyanoWaters) in China.The frequency of CyanoHABs was defined as the sum of the years in which CyanoHABs occurred in a water body.These analyses were conducted using ArcGIS 10.6 and Origin 2021.
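The frequency definition above amounts to counting the distinct years with at least one recorded CyanoHAB per water body. The sketch below shows this under the assumption that occurrences are available as (water body, year) pairs; the function name and sample names are hypothetical.

```python
from collections import defaultdict


def bloom_frequency(occurrences):
    """Number of distinct years with CyanoHABs for each water body.

    occurrences: iterable of (water_body, year) pairs; repeated years
    for the same water body are counted once.
    """
    years = defaultdict(set)
    for water_body, year in occurrences:
        years[water_body].add(year)
    return {water_body: len(ys) for water_body, ys in years.items()}


# Invented sample: Lake A has blooms in two distinct years, Lake B in one.
freq = bloom_frequency([("Lake A", 2007), ("Lake A", 2007),
                        ("Lake A", 2010), ("Lake B", 2015)])
print(freq)
```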

Results of CyanoHABs by different methods
The CyanoHAB records (525) were stored in ChinaCyanoDB, which included 158 media records and 367 literature records. Of the 158 media records, 156 were determined using on-site visual interpretation; the other two records reported the results of remote sensing monitoring of CyanoHABs. Of the 367 literature records, 241 were determined by on-site visual interpretation; the other 126 records were determined by remote-sensing image analysis. In some cases, for a specific water body, two methods were used to monitor CyanoHABs. Therefore, the methods for monitoring CyanoHABs can be classified into three categories as follows: remote sensing only, visual interpretation only, and both remote sensing and visual interpretation. A statistical analysis of all of the 198 water bodies in the ChinaCyanoDB (Figure 2, Table 4) indicated that 27 were monitored by remote sensing only, 147 were monitored by visual interpretation only, and the remaining 24 were monitored by both methods. The results indicate that the number of CyanoHABs that can be monitored by remote sensing is relatively small; more CyanoHABs must be detected by on-site visual interpretation. Water bodies with CyanoHABs monitored using on-site visual interpretation were primarily small urban water bodies with a low frequency of CyanoHABs occurrences such as urban landscape water (examples include Nianjia Lake and Wuhan Jiqi Dangzi Lake) and urban rivers (such as the Laodao River and Yudai River) (Ma et al. 2015; Tang et al. 2018; Wan, Xu, and Zhang 2017). The small size of these water bodies and the low frequency of CyanoHABs occurrence make them difficult to monitor with remote sensing images. Moreover, these water bodies are usually in urban areas, where occurrences of CyanoHABs can easily be detected and reported in the media or literature. Most water bodies with CyanoHABs monitored by remote sensing images are medium-sized and located in suburban areas such as the Bayi Reservoir and Chitian Reservoir (Fang et al. 2018). The water bodies with CyanoHABs monitored by both methods were predominantly large lakes and reservoirs with a high frequency of occurrences such as Taihu Lake, Chaohu Lake, Dianchi Lake, Hulun Lake, and Erhai Lake (Zhang et al. 2015; Zhu et al. 2018). Among the five lake regions in China, the monitoring in the YGR and EPR relied mainly on on-site visual interpretation because most of the CyanoWaters are small water bodies, which cannot be monitored by remote sensing owing to spatial resolution limitations. In contrast, in the MXR and NER, remote sensing monitoring methods were mainly used, likely owing to the low population density in these two lake regions. CyanoHABs were concentrated in medium and large water bodies.
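The three-way grouping of water bodies by monitoring method can be expressed as a set comparison. In this sketch the method labels, function names, and sample water bodies are hypothetical; only the three category names come from the text.

```python
from collections import Counter

REMOTE, VISUAL = "remote sensing", "visual interpretation"


def monitoring_category(methods):
    """Map the set of methods reported for one water body to a category."""
    methods = set(methods)
    if not methods or methods - {REMOTE, VISUAL}:
        raise ValueError(f"unexpected methods: {methods}")
    if methods == {REMOTE, VISUAL}:
        return "both"
    return f"{next(iter(methods))} only"


def count_categories(methods_by_water_body):
    """Count water bodies in each monitoring category."""
    return Counter(monitoring_category(m)
                   for m in methods_by_water_body.values())


# Invented sample: one water body per category.
counts = count_categories({
    "Lake A": [REMOTE, VISUAL, VISUAL],
    "Pond B": [VISUAL],
    "Reservoir C": [REMOTE],
})
print(counts)
```

Applied to the full database, the three counts should sum to the number of water bodies, as the reported 27, 147, and 24 do for the 198 CyanoWaters.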
The earliest record of CyanoHABs was published in 1992 (Shen 1992), which recorded CyanoHABs in Taihu Lake in the summer of 1960 during an on-site investigation by scientific researchers. The highest number of CyanoHABs reports in the last three decades occurred in 2011, with 48 related reports (12 media and 36 literature reports). The number of media reports can reflect the public's attention to CyanoHAB events. In terms of the number of media reports, public attention to CyanoHABs was low prior to 2006, with only a few media reports; public attention began to rise considerably after 2007. This was attributed to the drinking water crisis in Wuxi City caused by the severe CyanoHAB events in Taihu Lake in the spring of 2007 (Qin et al. 2007). Media attention fluctuated from 2008 to 2018 but was relatively stable overall. There was a significant downward trend in media attention for CyanoHABs from 2018 to 2021. The number of literature reports can reflect researcher attention to CyanoHABs events; there were limited reports in the literature prior to 2004. However, after 2004, researcher attention to CyanoHABs increased substantially, showing a general upward trend from 2004 to 2019 and reaching a peak in 2019, followed by a slight downward trend after 2019.

Spatial variation in CyanoHABs
Figure 3 shows the spatial distribution of the frequency of CyanoHABs in China. There is a high number of CyanoWaters in eastern China, primarily concentrated in the central-eastern region. The blue-dotted line connecting Heihe City in northeast China and Tengchong City in southwest China (Figure 3) is referred to as the Hu Line (Zhang et al. 2021). A total of 190 CyanoWaters were located on the eastern side of the Hu Line, and only 8 CyanoWaters were on the western side. To the east of the Hu Line, the terrain is flat, with low elevation and a dense population. These geographical features provide more opportunities for CyanoHABs. Conversely, to the west of the Hu Line, the terrain is more rugged, with higher elevations and a sparse population, which reduces the occurrence of CyanoHABs. This is also consistent with the results of previous studies (Guan et al. 2020; Song et al. 2021).
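Assigning a water body to one side of the Hu Line can be sketched with a cross-product sign test. This is only a planar approximation in longitude and latitude, and the endpoint and lake coordinates below are rough values assumed for illustration; they are not taken from this study, which performed its spatial analysis in ArcGIS.

```python
# Approximate (longitude, latitude) in degrees; assumed for illustration.
TENGCHONG = (98.49, 25.02)
HEIHE = (127.53, 50.25)


def side_of_hu_line(lon, lat):
    """Return 'east' or 'west' of the Tengchong-Heihe line (planar test)."""
    dx, dy = HEIHE[0] - TENGCHONG[0], HEIHE[1] - TENGCHONG[1]
    px, py = lon - TENGCHONG[0], lat - TENGCHONG[1]
    cross = dx * py - dy * px  # > 0: left of the SW->NE direction (north-west)
    return "west" if cross > 0 else "east"


print(side_of_hu_line(120.2, 31.2))  # roughly Taihu Lake
print(side_of_hu_line(100.2, 36.9))  # roughly Qinghai Lake
```

For points close to the line, a geodesic construction on projected data would be more appropriate than this simple test.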
Of the 198 CyanoWaters found in China, 162 are concentrated in the EPR, accounting for 81.8% of all CyanoWaters. The YGR, MXR, and NER had 15, 14, and 7 CyanoWaters, respectively, while no CyanoWaters were detected in the QTR.
The overall frequency of CyanoHABs was low. The most common frequencies ranged between one and nine years. The highest CyanoHABs frequency was found in Taihu Lake, at 45 years. CyanoHABs were first reported in Taihu Lake in 1950, with occurrences in every year since 1980. Other lakes with a CyanoHAB frequency of more than 30 years include Chaohu Lake, Hulun Lake, Xingyun Lake, Dianchi Lake, and Erhai Lake. The highest average CyanoHABs frequency was 9.21 years in the NER, and the lowest was 3.30 years in the EPR. We note that there were no CyanoHABs found in the QTR. The number of CyanoWaters in the NER was low (seven), but these CyanoWaters had a high CyanoHABs frequency (such as Lianhua Reservoir and Erlongshan Reservoir), which increased the overall average value. In the EPR, although there were CyanoWaters with a high CyanoHABs frequency, most CyanoWaters had a low CyanoHABs frequency, which decreased the overall average value. The provincial spatial distribution of CyanoHABs in China is presented in Figure 4. From the number of CyanoWaters in each provincial administrative region (Figure 4(a)), 27 regions were found to have CyanoWaters; no CyanoHABs were reported for 8 regions, including Hong Kong, Macau, Taiwan, Liaoning, Gansu, Qinghai, Hainan, and Tibet. Hubei Province had the highest number of CyanoWaters (31). This result may be due to the high number of water bodies in Hubei Province, which has 188 water bodies larger than 1 km² (fourth highest in China). Sichuan, Ningxia, and Shaanxi provinces had the lowest number of CyanoWaters, with only one in each province.
For the CyanoHAB frequency in each provincial administrative region in China, the highest average CyanoHAB frequency was found to be 15.66 years in Inner Mongolia.There were only three CyanoWaters in Inner Mongolia; however, the frequency of CyanoHABs was high in all three water bodies: Hulun Lake, 31 years; Nierji Reservoir, 8 years; and Daihai Lake, 8 years.The lowest frequency at the provincial level occurred in Sichuan Province (1 year).
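As a worked check of the provincial average, the mean of the three Inner Mongolia frequencies quoted above can be computed directly; the variable names are illustrative.

```python
from statistics import mean

# Frequencies (years with CyanoHABs) quoted in the text for Inner Mongolia.
inner_mongolia = {"Hulun Lake": 31, "Nierji Reservoir": 8, "Daihai Lake": 8}

average_frequency = mean(inner_mongolia.values())
print(f"{average_frequency:.3f} years")
```

The mean is 47/3, which the text reports truncated to two decimals as 15.66 years.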
The proportion of CyanoWaters in each province relative to the number of water bodies (area > 1 km²) was calculated (Figure 4(b)). Shanghai and Beijing had the highest proportion (20.0%), which may be attributed to the low number of water bodies in Shanghai and Beijing, both of which have only four water bodies (area > 1 km²). Meanwhile, Sichuan, Inner Mongolia, Hunan, and Jilin had the lowest proportion (< 1.0%).

Temporal variation in CyanoHABs
A 5-year interval was used to analyze the temporal trend in CyanoWaters.The earliest recorded CyanoWaters were Taihu Lake in 1950 (Zhang, Yang, and Shi 2019) and Chao Lake in 1959 (Shi et al. 2009).Based on this, Figure 5 represents the spatial distribution of the first occurrence of CyanoWaters in China, and the total number of CyanoWaters at 5-year intervals.Our results revealed that the number of first occurrences of CyanoWaters changed slightly from 1980 to 1995, with a rapid upward trend from 1995 to 2010.Additionally, the total number of CyanoWaters also indicated a significant upward trend.This observation is consistent with existing findings (Hou et al. 2022;Song et al. 2021).Between 2005 and 2010, the number of the initial occurrences of CyanoWaters peaked at 64 and the total number of CyanoWaters peaked at 125.This may be partially related to the drinking water crisis in Wuxi City, which was caused by the severe CyanoHABs in Taihu Lake in the spring of 2007 (Qin et al. 2007).This further led to a significant increase in CyanoHAB media attention and may have been the first time that many members of the public were aware of CyanoHABs.After 2010, the number of initial occurrences of CyanoWaters and the total number of CyanoWaters had a slightly decreasing trend, which may be related to increased efforts to control water pollution in China in the last decade (Chen 2018).
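The 5-year aggregation can be sketched as follows: each water body contributes once to the interval of its first recorded bloom and once to every interval in which it had any bloom. The bin origin (1950), the half-open bin convention (for example 2005 to 2009 labeled by 2005), and the function names are assumptions of this sketch; the text does not state how interval boundaries were handled.

```python
from collections import Counter


def interval_start(year, width=5, origin=1950):
    """Start year of the interval containing `year` (half-open bins)."""
    return origin + ((year - origin) // width) * width


def summarize(bloom_years):
    """bloom_years: dict mapping water body -> iterable of bloom years.

    Returns (first_occurrences, active) dicts keyed by interval start year.
    """
    first, active = Counter(), Counter()
    for years in bloom_years.values():
        years = sorted(set(years))
        if not years:
            continue
        first[interval_start(years[0])] += 1
        for start in {interval_start(y) for y in years}:
            active[start] += 1
    return dict(first), dict(active)


# Invented sample: both water bodies first bloom in the 2005 interval.
first, active = summarize({"Lake A": [2006, 2007, 2012], "Lake B": [2008]})
print(first, active)
```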
Most of the CyanoWaters in the EPR and MXR had initial occurrences between 2005 and 2010.The first occurrence of CyanoWaters in the NER and YGR occurred earlier, primarily between 1990 and 1995.Moreover, only the EPR indicated a significant change in the number of CyanoWaters in recent decades, which is consistent with the trend within the entire inland region of China.Both Taihu Lake and Chaohu Lake had CyanoHABs before 1980 in the EPR.The number of CyanoWaters in the other three lake regions fluctuated slightly, but indicated an overall upward trend.As the Taihu Lake incident in May 2007 caused an increase in media and public attention to CyanoHABs in China, CyanoWaters reached a peak from 2005-2010.However, in the past five years, the number of CyanoWaters in China has had a slightly downward trend, owing to the increased efforts of the state to control the water environment (Chen 2018).

Significance of on-site visual interpretation data
In previous studies, researchers analyzed the spatiotemporal variation in CyanoHABs in China using remote sensing data. In this study, the ChinaCyanoDB was created by combining both remote sensing and on-site visual interpretation data. To evaluate the impact of on-site visual interpretation data on the distribution of CyanoHABs in China, we divided the ChinaCyanoDB into two datasets based on the monitoring methods: remote sensing only and on-site visual interpretation only. We then compared and analyzed these two datasets, as well as the entire ChinaCyanoDB.
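As a rough illustration of this partition, the sketch below splits a list of records into a remote sensing subset and an on-site visual subset and keeps the full set for comparison. The record layout, the method labels, and the water body names are hypothetical, and the sketch assumes that a water body observed by both methods appears in both subsets; the actual ChinaCyanoDB schema may differ.

```python
def split_by_method(records):
    """Return (remote_sensing, visual, combined) sets of water body names.

    records: iterable of (water_body, method) pairs, where method is
    "remote_sensing" or "visual" (labels assumed for this sketch).
    """
    remote, visual = set(), set()
    for name, method in records:
        if method == "remote_sensing":
            remote.add(name)
        elif method == "visual":
            visual.add(name)
        else:
            raise ValueError(f"unknown monitoring method: {method!r}")
    return remote, visual, remote | visual

# Hypothetical records, for illustration only.
sample = [
    ("Lake A", "remote_sensing"),
    ("Lake A", "visual"),
    ("Lake B", "remote_sensing"),
    ("Pond C", "visual"),
]

remote, visual, combined = split_by_method(sample)
```

Using sets of names rather than record counts means that repeated reports of the same water body do not inflate either subset, which matches a comparison expressed in numbers of CyanoWaters.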
The three datasets, namely the remote sensing only dataset (Figure 6(a)), the on-site visual interpretation only dataset (Figure 6(b)), and the full ChinaCyanoDB, all exhibited similar spatial distribution patterns of CyanoWaters: more CyanoWaters were located east of the Hu Line than west of it. However, the on-site visual interpretation only dataset contained a much larger number of CyanoWaters (189) than the remote sensing only dataset (51). This result highlights the limitations of remote sensing in CyanoHAB monitoring and underscores the importance of on-site visual interpretation in accurately identifying and monitoring CyanoHABs.
The CyanoHAB frequency differed between the on-site visual interpretation only dataset and the remote sensing only dataset. The on-site visual interpretation dataset showed lower frequencies, mainly concentrated from 1 to 5 years, while the remote sensing dataset exhibited higher frequencies, concentrated from 5 to 15 years. Remote sensing is more suitable for monitoring large water bodies with frequent CyanoHABs, while on-site visual monitoring is feasible for smaller water bodies with occasional CyanoHABs. The majority of CyanoWaters in the on-site visual interpretation only dataset were concentrated in the middle and lower reaches of the Yangtze River and in coastal areas with high population densities. This suggests that a certain population base is required to ensure public concern about CyanoHABs and the feasibility of the on-site visual discrimination method. Regarding temporal variation, the on-site visual interpretation only dataset showed a trend consistent with that of the entire ChinaCyanoDB (Figure 7). The number of CyanoWaters increased from 1980 onward, reaching a peak of 120 CyanoWaters from 2005 to 2010, followed by a slight decline after 2010. The remote sensing only dataset showed a steady increase in the number of CyanoWaters from 1980 to 2015, with a peak of 31 CyanoWaters from 2010 to 2015. These findings also underscore the crucial role of the on-site visual interpretation dataset in understanding the spatiotemporal distribution of CyanoHABs in China. Most of the CyanoWaters in China are small and are concentrated in areas with frequent human activity, such as artificial lakes and landscape water bodies. The small size of these water bodies makes them hard to monitor by remote sensing, but on-site visual interpretation provides the public with a feasible means of identifying CyanoHABs.
The ChinaCyanoDB established in this study may be incomplete due to the lack of public attention to CyanoHABs and the sparse population in remote areas. However, despite these limitations, the ChinaCyanoDB currently represents the most comprehensive dataset of CyanoHABs in China's inland water bodies. The dataset serves as a valuable resource for understanding the distribution and temporal variation of CyanoHABs in China, and provides a foundation for further research and monitoring efforts in the future. Continuous efforts to improve data collection, including public awareness campaigns and expanded monitoring in remote areas, can further enhance the accuracy and completeness of the ChinaCyanoDB.

Advantages of ChinaCyanoDB
Because CyanoHABs gather on the water surface and have distinctive spectral characteristics, they can be detected by remote sensing or visual inspection. This study combined literature and media information to obtain a wide range of CyanoHAB records in China, identified by remote sensing and visual inspection, yielding the most comprehensive database of CyanoHABs in China currently available. A total of 525 CyanoHAB records for 198 water bodies were obtained. Compared with CyanoHAB records obtained from remote sensing monitoring only, the ChinaCyanoDB covers a larger number of water bodies and supports an improved analysis of spatiotemporal variation.
Among the 198 inland water bodies in the ChinaCyanoDB, only 51 water bodies were monitored by remote sensing. Additionally, the remote sensing results for monitoring CyanoHABs in China are based on Landsat satellite data for inland lakes and reservoirs (area > 1 km²) (Song et al. 2021). However, there are a large number of small water bodies (area < 1 km²) in the ChinaCyanoDB, including artificial lakes, landscape water bodies, and urban-type rivers. The smallest water body in the ChinaCyanoDB is Wulongtan Park Lake in Nanjing, which has a surface area of 0.008 km². The results illustrate the importance of using the visual interpretation method from the literature and media for CyanoHAB monitoring.
In summary, the ChinaCyanoDB is the first well-validated comprehensive database of CyanoHABs in the inland waters of China, combining journal and media information. The ChinaCyanoDB can monitor CyanoHABs in small water bodies, which have not been monitored by remote sensing data due to the insufficient spatiotemporal resolution of remote sensing data sources and cloud cover. To ensure the credibility and accuracy of the database, we employed three validation methods: on-site photo validation, remote sensing image validation, and cross-validation. Additionally, the results of this study improve the analysis of the spatiotemporal variability of CyanoHABs in China from 1950 to 2021. Furthermore, the methodology employed in constructing this database holds the potential for application in other countries and regions.

Limitations and future development of ChinaCyanoDB
The database established by this study also has certain limitations. We cannot guarantee that the data we collected are fully comprehensive. For instance, it is challenging to obtain internal environmental monitoring reports from various provinces and directly administered municipalities. Additionally, the availability of UAV monitoring results is also limited. However, we made every effort to collect, organize, and clean the CyanoHAB records to ensure the comprehensiveness of the database.
Among the remote sensing data sources used in the ChinaCyanoDB, the most frequently used were MODIS, Landsat TM/OLI, Sentinel-2 MSI, and GF-1 WFV, which were used 69, 24, 15, and 11 times, respectively. Other satellite remote sensors were used less frequently for CyanoHABs owing to their low spatial resolution and long revisit time. Therefore, the ChinaCyanoDB could not fully exploit the role of multi-source satellite remote sensing data in water environment monitoring, and there is room for further development in the spatiotemporal coverage of CyanoHABs.
Recently, with the development of remote sensing technology, the number of freely available remote sensing satellites with medium-to-high spatial resolutions has increased. These include Sentinel-2 MSI, Landsat 9 OLI, and GF-1/6 WFV. Used together as a satellite network, they can significantly improve the temporal resolution and capture CyanoHAB occurrences more comprehensively. Additionally, an increasing amount of high spatial resolution satellite data is available (such as WorldView-3, PlanetScope, Pleiades Neo, Jilin-1, and GF-2 PMS), with resolutions between 3 m and the sub-meter level, which can monitor smaller water bodies. Therefore, it is necessary to use multi-source satellite remote sensing data in the future to improve the spatiotemporal coverage of remote sensing monitoring results for CyanoHABs in China. At the same time, with the gradual improvement of spectral indices and algorithms for CyanoHAB monitoring, high-precision automatic monitoring of CyanoHABs based on satellite remote sensing can also be realized, which will significantly aid the long-term and large-scale monitoring of CyanoHABs.
The valid records of visual interpretation in the ChinaCyanoDB were predominantly sourced from the news media, including media information platforms and social media. Very few valid records on CyanoHABs were retrieved from social media, indicating that the general public was relatively unaware of, unconcerned about, or unfamiliar with CyanoHABs. Therefore, the ChinaCyanoDB did not fully exploit the role of the public in water environment monitoring, and there is room for improvement in the spatiotemporal coverage of CyanoHABs. In the future, accurate identification of CyanoHABs can be achieved by vigorously promoting smartphone applications for monitoring CyanoHABs, such as Water Color Watch (Li et al. 2022), through standardized photography and integrated recognition algorithms.
Moreover, this study invested a significant amount of time in data collection, cleaning, validation, and organization. In the future, with the rapid development of artificial intelligence, such tasks can be accomplished by artificial intelligence, reducing the required human effort.
In summary, with the use of more abundant satellite remote sensing data, as well as public science, smartphones, and artificial intelligence, the CyanoHAB dataset can be further enriched, which will aid in CyanoHABs monitoring and management.

Conclusions
This study aimed to address the lack of comprehensive databases and the incomplete understanding of the spatiotemporal variation in CyanoHABs in China. To achieve this, a keyword-based search was conducted to collect records on CyanoHABs in China's inland waters from various sources, including literature and media reports. The final records were obtained after manual reading, data cleaning, and data validation based on three methods (on-site photo validation, remote sensing image validation, and cross-validation). The records were used to construct an extensive database of CyanoHABs in China (ChinaCyanoDB) from 1950 to 2021, comprising 525 reported records (158 media and 367 literature records) on 198 CyanoWaters. The database was then used to analyze the spatiotemporal variation in CyanoHABs in China, providing valuable insights into the prevalence and distribution of CyanoHABs.
This study identified three methods for monitoring CyanoHABs: remote sensing only, visual interpretation only, and a combination of both. Visual interpretation only was the primary method, accounting for 74.24% of the water bodies in the database, followed by remote sensing only (13.64%) and a combination of both (12.12%). The visual interpretation method was primarily used to monitor small urban water bodies with a low CyanoHAB frequency, while medium-sized suburban water bodies were predominantly monitored by remote sensing. The combination of both methods was mostly used to monitor large lakes and reservoirs with a high CyanoHAB frequency.
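The reported shares can be converted back into approximate counts as a quick consistency check. The sketch below assumes that the percentages are taken over the 198 water bodies in the database; the counts it produces are inferred by rounding and are not values reported by the study.

```python
TOTAL_WATER_BODIES = 198  # number of CyanoWaters reported for the ChinaCyanoDB

# Shares of the three monitoring methods, in percent, as reported above.
shares = {
    "visual_only": 74.24,
    "remote_sensing_only": 13.64,
    "both": 12.12,
}

# Inferred counts (assumption: percentages refer to water bodies, not records).
counts = {method: round(TOTAL_WATER_BODIES * pct / 100)
          for method, pct in shares.items()}

# Water bodies monitored by remote sensing, alone or together with visual reports.
remote_sensing_total = counts["remote_sensing_only"] + counts["both"]
```

Under this assumption the inferred counts sum to 198, and the remote sensing total equals the 51 water bodies reported as monitored by remote sensing.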
Our findings reveal a distinct geographic pattern in the distribution of CyanoWaters in China, with a high concentration in the eastern part of the country and a lower concentration in the west. Specifically, we found 190 CyanoWaters east of the Hu Line, compared to only eight to the west. Most of these CyanoWaters were located in the eastern plain lake region, which accounted for 162 of the CyanoWaters identified. Furthermore, over the past 70 years, there has been a clear upward trend in the number of cyanobacterial bloom occurrences in China's inland water bodies.
Despite limitations in data collection, including restrictions in the spatial and temporal resolution, cloud cover in the remote sensing data, and public awareness of CyanoHABs, our database is currently the only comprehensive database on CyanoHABs in China's inland waters. The findings of this analysis and the spatiotemporal trends identified will prove valuable in managing and monitoring CyanoHABs in China. Additionally, the approach used to construct this database may be adapted to other countries and regions.

Figure 1. The process of establishing the CyanoHABs database: the collection, cleaning, and validation of CyanoHAB records in China using keyword searches in literature and media information, and the establishment of the CyanoHABs database.

Figure 2. The spatial distribution of CyanoHABs recorded by different monitoring methods (a), and the temporal distribution of annual records of CyanoHABs in China (b).

Figure 3. Spatial distribution of the frequency of CyanoHABs in China. The size of the circle symbol represents the size of the water body area. The base map is a digital elevation model (DEM, GDEM 90M), which was downloaded from the Geospatial Data Cloud (http://www.gscloud.cn/).

Figure 4. Provincial spatial distribution map of water bodies with CyanoHABs in China. The total number of water bodies with CyanoHABs in China in each province (a) and the ratio of lakes and reservoirs with CyanoHABs (> 1 km²) to the total number of lakes and reservoirs in China in each province (b).

Figure 5. Temporal variation trend in the number of water bodies with CyanoHABs in China. The spatial distribution of the first year of CyanoHABs in China (a); the temporal variation in the number of water bodies with the first year of CyanoHABs in China and the five lake regions (b); and the temporal variation in the total number of water bodies with CyanoHABs in China and the five lake regions (c).

Figure 6. Results of the spatial distribution comparison between the remote sensing dataset (a) and on-site visual interpretation dataset (b). The number of CyanoWaters on both sides of the Hu Line based on the remote sensing, on-site visual, and ChinaCyanoDB datasets (c).

Figure 7. Temporal variation in the total number of CyanoWaters in China for the three datasets.

Table 1. The results of collection, cleaning, and validation, and the proportion of records retained after processing, for CyanoHABs in China's inland waters obtained from keyword searches in literature and media.

Table 2. The table structure of the metadata table.

Table 3. The table structure of the CyanoHABs information table.
Text | Date of CyanoHABs; if a record contains multiple dates, select the date with the largest area in each year, with one row per year
Area of CyanoHABs | Text | Record the maximum area of CyanoHABs (km²); if blooms occur multiple times in a year, record only the maximum area

Table 4. Statistical results of the number and frequency of CyanoHABs in China, and the area of water bodies, monitored by three methods.
Monitoring method | Number of water bodies | Average frequency (years) | Median area (km²)