Strategy and practice for enhancing the quality of monitoring data in small reservoirs at provincial scale. Application to Guangxi province, China

ABSTRACT Since 2021, China has been actively promoting the Small Reservoir Monitoring Facility Project. However, because the participating contractors have varying levels of expertise and data review standards are unclear, the quality of the monitoring data has been unsatisfactory. To improve it, this study implemented a series of measures. Data review methods and rules were established, together with automated and manual review tools. Statistical parameters were then defined to evaluate data quality and to address issues promptly. Finally, a mechanism for data review and rectification was created. With these measures, automated review improved data review efficiency by approximately 60%. In total, more than 11.178 million data entries were reviewed and 59,000 were corrected, accounting for 0.05% of the total data. The data quality of 20 small reservoirs in Guangxi has improved significantly, providing reliable information for making full use of the existing monitoring facilities. The findings not only lay a reliable data foundation for future applications, including forecasting and warning systems, but also offer valuable experience for improving data quality in similar monitoring projects at different levels.


Introduction
Water resource management is a crucial issue that has received increasing attention in recent years. Guangxi has a total of 4314 reservoirs and dams, of which 4110 are classified as small reservoirs, approximately 95% of the total. Small reservoirs therefore play a particularly significant role in Guangxi and have become indispensable components of the local ecosystems and economy (Hennig et al., 2013; Ma et al., 2023). Within complex water resource management systems, these small reservoirs serve as critical nodes in the water supply network and support essential services such as irrigation, flood control and domestic water supply (Best, 2019; Hogeboom et al., 2018). However, poor quality and incompleteness of monitoring data often hinder effective reservoir management (Cheng & Zheng, 2013; B. Li et al., 2020).
High-quality and comprehensive data are essential for water resource management and decision-making (Marty et al., 2021). Accurate and timely data support informed decisions, optimised resource allocation and reduced risk (Prakash et al., 2022). Most of the reservoirs in Guangxi were built between the 1950s and the 1970s. Because of the economic, social and technological constraints of that period, their construction standards were relatively low, which resulted in poor construction quality and thin dam bodies. In addition, long-term improper operation and management have left some reservoirs operating with defects (Smith, 2012; Y. Wang et al., 2009). Therefore, in 2021, a nationwide project was launched to build monitoring facilities for small reservoirs, covering 4110 small reservoirs in Guangxi. The project aims to provide automated monitoring and safety warning for the dams of these reservoirs (Zhou et al., 2023). A dam safety monitoring system offers various advantages for data management and analysis (J. Zhang et al., 2018). However, numerous manufacturers from different counties in Guangxi were involved, and their expertise varied, particularly regarding the completeness of basic reservoir information and the accuracy of monitoring values. As a result, the quality of small-reservoir monitoring data in Guangxi faces a series of challenges (Mao et al., 2019). For example, the basic information on reservoirs and monitoring sites is deficient: data are missing, and inflow information is incomplete or inaccurately recorded. Moreover, this information is not updated in a timely manner. The lack of unified data reception protocols and processing standards has compromised the accuracy and validity of the monitoring data. The absence of clear standards for data collection, verification and other steps also makes it difficult to ensure the accuracy and authenticity of the monitoring data as a whole. Consequently, the overall quality of small-reservoir monitoring data in Guangxi is inadequate and fails to meet the requirements of water resource management decisions and reservoir management (Milivojević et al., 2022). This limitation not only hinders full use of the existing monitoring facilities but also poses significant challenges for the construction of new ones (Zhong et al., 2018).
Having recognised this gap, we initiated a comprehensive study to improve the quality and completeness of monitoring data for small reservoirs in Guangxi. Drawing on advanced data analysis techniques and recent research in this field, we established a set of methods and protocols for data auditing (Wei et al., 2021). These include systematic data review and correction processes designed to ensure the accuracy and reliability of the data. In addition to enhancing the existing data, we supplemented and refined essential baseline information, such as the characteristics of reservoir dams and monitoring stations, reservoir capacity curves and dam seepage pressure monitoring profiles (Milivojević et al., 2022; Tournadre et al., 2016). Improving the quality and completeness of this foundational information significantly raised the overall quality of the monitoring data (Wu et al., 2017; L. Zhang et al., 2021). The effectiveness of the proposed protocols was then verified through practical implementation. Through this research, we aim to provide technical support and experience for the future construction of monitoring facilities for small reservoirs in Guangxi. We also aim to offer insights for improving the quality of reservoir monitoring data in similar regions, thereby advancing scientific and precise water resource management and contributing to water resource conservation both domestically and internationally.
This study will contribute to more efficient and systematic monitoring data management (Curran & Smart, 2021). By leveraging advanced information technology and incorporating the measures proposed in this research, we expect monitoring data management for small reservoirs to become more efficient and streamlined. This approach will pave the way for building data assets tailored to small reservoirs, providing a reliable data foundation for future data sharing, secure analysis and the development of forecasting and warning systems.

Data sources
Guangxi has constructed safety monitoring systems for more than 3000 small reservoirs, installing sensors such as rain gauges, water level gauges, seepage pressure gauges and seepage flow meters. The data collected by each sensor are transmitted over the mobile communication network to a centralised monitoring database, enabling automatic measurement and transmission of key reservoir parameters. As of 31 December 2022, more than 100 million raw data records had been acquired, and these constitute the main data source for this study. The data cover key monitored quantities such as reservoir rainfall, reservoir water level, dam seepage pressure, seepage flow and dam displacement. In addition, basic reservoir information, site information, measurement point information, infiltration line cross-section data and reservoir capacity curves were added manually to the monitoring database to ensure a one-to-one correspondence between the basic information and the monitoring values (the composition of the monitoring database system for small reservoirs in Guangxi is shown in Figure 1). The quality of the monitoring data was improved by reviewing, supplementing and rectifying the original monitoring data and basic information in this database.
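The one-to-one correspondence between basic information and monitoring values can be checked mechanically. The following Python sketch shows one such check under stated assumptions: records are plain dictionaries, and the key name `point_id` is hypothetical because the actual database schema is not described here. The function lists readings that have no registered measurement point, points registered more than once, and registered points that have never reported.

```python
from collections import Counter


def check_point_correspondence(points, readings):
    """Check the one-to-one link between basic information and monitoring values.

    points:   measurement-point records (basic information), each a dict with a
              hypothetical "point_id" key.
    readings: monitoring-value records, each a dict with a "point_id" key.

    Returns (orphan_ids, duplicate_ids, unused_ids):
      orphan_ids    - point ids that have readings but no basic information;
      duplicate_ids - point ids registered more than once in the basic information;
      unused_ids    - registered point ids that have no readings at all.
    """
    counts = Counter(p["point_id"] for p in points)
    duplicate_ids = sorted(pid for pid, n in counts.items() if n > 1)
    registered = set(counts)
    reading_ids = {r["point_id"] for r in readings}
    orphan_ids = sorted(reading_ids - registered)
    unused_ids = sorted(registered - reading_ids)
    return orphan_ids, duplicate_ids, unused_ids


points = [{"point_id": "P01"}, {"point_id": "P02"}, {"point_id": "P02"}]
readings = [{"point_id": "P01", "value": 1.2}, {"point_id": "P09", "value": 3.4}]
print(check_point_correspondence(points, readings))
# (['P09'], ['P02'], ['P02'])
```

Each of the three lists maps to a different remedy. Orphans require new basic information, duplicates require merging, and unused points may indicate offline equipment.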

Overall technical route
The technical route of this study for reviewing monitoring data from small reservoirs comprises two parts: (1) to fill gaps in the original data, a standardised data entry format was established to allow uniform data integration and supplementation, thereby improving data integrity; and (2) data review rules were formulated and an auditing software tool was developed to carry out both automated and manual data auditing.
A data rectification and confirmation mechanism was then implemented, and the standards used for the statistical assessment of the data were strictly applied, resulting in an overall improvement in monitoring data quality. Implementing this technical route actively drives data review, rectification and supplementation. The technical route is shown in Figure 2.
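A standardised data entry format lets rows from different contractors be merged into one schema, and it lets missing fields be reported instead of being stored silently. The Python sketch below illustrates this idea only. The field names, the contractor-specific aliases and the `StandardRecord` layout are all hypothetical and do not represent the project's actual entry format.

```python
from dataclasses import dataclass

# Hypothetical aliases: different contractors may label the same field differently.
FIELD_ALIASES = {
    "reservoir_code": ("reservoir_code", "res_code", "RES_ID"),
    "point_id": ("point_id", "pt_id", "POINT_NO"),
    "quantity": ("quantity", "item", "TYPE"),
    "value": ("value", "val", "READING"),
}


@dataclass(frozen=True)
class StandardRecord:
    """One monitoring value in the (hypothetical) standard entry format."""
    reservoir_code: str
    point_id: str
    quantity: str
    value: float


def to_standard(raw: dict) -> StandardRecord:
    """Map a raw contractor row onto the standard format.

    Raises ValueError if a required field is missing or empty, so that the gap
    can be reported back for supplementation instead of being silently stored.
    """
    found = {}
    for name, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if raw.get(alias) not in (None, ""):
                found[name] = raw[alias]
                break
        else:  # no alias matched this field
            raise ValueError(f"missing field: {name}")
    return StandardRecord(
        reservoir_code=str(found["reservoir_code"]).strip(),
        point_id=str(found["point_id"]).strip(),
        quantity=str(found["quantity"]).strip().lower(),
        value=float(found["value"]),
    )


row = {"RES_ID": " GX0001 ", "pt_id": "P01", "TYPE": "Water_Level", "val": "123.4"}
print(to_standard(row))
```

Raising an error on a missing field, instead of filling a default, matches the supplement-then-review order described above. The gap is returned to the contractor rather than hidden in the database.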

Develop data review methods and rules
In water resource management, the quality and integrity of monitoring data are of paramount importance (Guo et al., 2022; L. Wang et al., 2021). Accordingly, this study aims to improve the quality of monitoring data for small reservoirs in Guangxi by implementing systematic data review methods and protocols (Yue et al., 2018). Drawing on advanced data analysis techniques and recent research in this field, we established a comprehensive data auditing process comprising the following steps:
• Automatic review of monitoring data to ensure integrity and accuracy. First, filtering conditions on fields and attributes are defined to review the basic information of reservoirs, monitoring stations and measurement points automatically, including the length, type and nullity of each field. Second, threshold or range conditions are set for key measured values to identify and filter out outliers and unreasonable values. The flagged values are then analysed and sorted: genuine values are retained and erroneous data are removed, ensuring the accuracy of the monitoring data. Based on these review rules, this study developed an automatic review software program that reviews monitoring data across the whole region and generates review analysis reports. Its main functions include statistical and analytical tools for checking data integrity, analysis of abnormal measurements, statistics on the access rates of measurement points, ranking of online measurement points and sorting of the current day's reservoir rainfall. The main screening rules for the automatic review of the various measured values are presented in Table 1. For dam displacement, most of the dams of Guangxi's small reservoirs are old dams whose deformation is essentially stable. Displacement is monitored automatically by the global navigation satellite system (GNSS) method, with a maximum error of 5 mm. Taking into account this monitoring error and the actual condition of the dams, the screening threshold for two adjacent measurements of vertical or horizontal dam displacement is set at 10 mm.
• Manual review, based on the results of the automatic screening software, to further identify, analyse and correct anomalies in the dam safety monitoring database that were not filtered out automatically. This includes checking the integrity and reasonableness of the data by examining the time-history curves of the measurements to ensure their continuity and correlation. The primary focus is on verifying the correlation of reservoir water level with rainfall, seepage pressure and seepage flow, in order to assess the accuracy and reasonableness of the measurements. The deviation between the reservoir water level elevation and the dam crest elevation is examined to determine whether it falls significantly outside a reasonable range, thereby verifying the reservoir water level benchmark and elevation system. The accuracy and reasonableness of seepage pressure values are reviewed by inspecting the dam infiltration line, along which downstream seepage pressure values are generally expected to decrease progressively relative to upstream values. The association of station and measurement point numbers with the corresponding measured values is also audited. The completeness of video and warning information is checked, together with whether the videos can be viewed normally. Random samples of rainfall and reservoir water level data for specific periods are compared with data from other specialised systems, such as hydrological and flash flood prevention systems, to determine whether they are consistent or similar.
• Assessment of data quality using statistical parameters. Key statistical parameters such as the "basic information completeness rate", "data compliance rate", "equipment access rate" and "equipment online rate" were defined (Table 2) and computed in real time by the automated auditing software. The ratio method is used to measure the completeness, accuracy and timeliness of the current data. These parameters serve as benchmarks for evaluating the data and provide a clear and objective measure of data quality.
• Implementation of a comprehensive data quality improvement strategy. Automated audit software and manual review are combined to audit and rectify data while data quality is assessed concurrently, forming a cyclic "audit-rectify-assess" process for the continuous improvement of data quality. The project team provided guidance throughout the auditing and rectification process in all counties. It helped the counties audit and rectify simultaneously and verified each rectification until the data were complete and accurate, thereby ensuring the integrity and credibility of the monitoring data.
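The automatic screening step combines three kinds of rules: field checks on length, type and nullity, range checks on key measured values, and a limit on the change between adjacent displacement readings. The Python sketch below shows how these rules might look. Only the 10 mm displacement threshold comes from the text. The value ranges and the example schema are placeholders, because the contents of Table 1 are not reproduced here.

```python
# Placeholder ranges (hypothetical); the project's actual rules are listed in Table 1.
VALUE_RANGES = {
    "rainfall_mm": (0.0, 300.0),
    "seepage_flow_l_per_s": (0.0, 100.0),
}
# From the text: adjacent GNSS displacement readings differing by more than 10 mm are flagged.
DISPLACEMENT_STEP_MM = 10.0


def check_fields(record: dict, schema: dict) -> list:
    """Field rules on basic information: nullity, type and maximum length.

    schema maps field name -> (expected_type, max_length or None).
    """
    problems = []
    for name, (expected_type, max_len) in schema.items():
        value = record.get(name)
        if value is None or value == "":
            problems.append(f"{name}: empty")
        elif not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
        elif max_len is not None and len(str(value)) > max_len:
            problems.append(f"{name}: longer than {max_len}")
    return problems


def out_of_range(quantity: str, value: float) -> bool:
    """Range rule: True if the value lies outside the plausible range."""
    low, high = VALUE_RANGES[quantity]
    return not (low <= value <= high)


def displacement_jumps(series_mm: list) -> list:
    """Step rule: indices i where |x[i] - x[i-1]| exceeds the 10 mm threshold."""
    return [
        i
        for i in range(1, len(series_mm))
        if abs(series_mm[i] - series_mm[i - 1]) > DISPLACEMENT_STEP_MM
    ]


schema = {"reservoir_code": (str, 10), "crest_elevation_m": (float, None)}
print(check_fields({"reservoir_code": "", "crest_elevation_m": "120.5"}, schema))
# ['reservoir_code: empty', 'crest_elevation_m: expected float']
print(displacement_jumps([0.0, 2.0, 15.0, 14.0]))  # [2]
```

The step rule tests the difference between consecutive readings, not the absolute value. For stable old dams, a sudden jump is more likely to be a measurement error than genuine movement.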
These data review and rectification methods and rules were designed and implemented to ensure consistent and effective data review.
They not only improve the quality and integrity of monitoring data but also provide a robust and replicable framework for future data review and rectification efforts (F. Li et al., 2013; Shao et al., 2022). This framework can be adapted and applied in other settings, contributing to the broader field of water resource management.
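Two of the manual cross-checks described in this section can be partly automated to prioritise records for a reviewer. The first is the expected decrease of seepage pressure from upstream to downstream along the infiltration line. The second is the plausibility of the reservoir water level relative to the dam crest. The following Python sketch is a hypothetical illustration. The tolerance and the maximum plausible level-to-crest gap are assumed parameters, not values from this study, and a flag marks a record for human inspection rather than proving an error.

```python
def seepage_line_violations(heads_m, tolerance_m=0.0):
    """Piezometric heads ordered from upstream to downstream along one cross-section.

    Returns indices i where the head at point i exceeds the head at the
    upstream neighbour i-1 by more than tolerance_m, i.e. where the expected
    downstream decrease is not observed.
    """
    return [
        i
        for i in range(1, len(heads_m))
        if heads_m[i] > heads_m[i - 1] + tolerance_m
    ]


def level_benchmark_suspect(level_m, crest_m, max_gap_m):
    """Flag a reservoir level above the dam crest, or implausibly far below it.

    max_gap_m is a hypothetical per-dam limit (for example, close to the dam height).
    """
    gap = crest_m - level_m
    return gap < 0 or gap > max_gap_m


print(seepage_line_violations([105.2, 103.8, 104.1, 101.0]))  # [2]
print(level_benchmark_suspect(12.0, 100.0, 20.0))  # True: likely a datum error
```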

Results and discussion
This study used standardised data consolidation, analysis and centralised data storage to improve data quality. A total of approximately 72,292 reservoir capacity curve records and 1543 dam seepage monitoring cross-section records were collected, organised and imported. The missing and erroneous data in the original database were comprehensively supplemented and corrected, providing a reliable and complete foundation for the safety monitoring and analysis of small reservoirs in Guangxi. All basic information and measured values in the monitoring database are now available for querying and further analysis. In particular, key parameters such as reservoir capacity curves and dam seepage monitoring results are presented coherently and intuitively (as shown in Figure 3).
Both automated and manual review, supported by the data review software, were used to comprehensively review and rectify the data in the main small-reservoir monitoring database. Under the technical guidance of the small-reservoir monitoring construction project team, the automated auditing software generated a real-time "Data Review Issue List" and "Data Rectification Confirmation Feedback" for the construction units. Using a combination of software and manual review, the data were audited county by county according to the established review rules, and construction units were required to rectify any data discrepancies promptly. Throughout the rectification process, the project team provided supervision and guidance and repeated the review and rectification in each county until the data were complete and accurate. Under the data rectification confirmation procedure, construction units were required to complete rectification within five working days of receiving feedback, and this cycle was repeated until the rectification rate reached 100%. The project leader then signed and stamped the feedback form to confirm data quality. For instance, in one inspection and review round, 90% of the identified issues were resolved after rectification. This data output confirmation mechanism ensured the integrity and credibility of the monitoring data and significantly improved the continuity, correlation and accuracy of the measurements. With the automated auditing software, water resource management departments could manage and analyse hydrological monitoring data efficiently and obtain accurate evidence for decision-making.
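Two quantities in the rectification procedure are easy to compute: the five-working-day deadline and the rectification rate. The Python sketch below shows both under assumptions that the text does not specify. The day of receipt is not counted, Saturdays and Sundays are skipped, and public holidays must be supplied by the caller. A round with no identified issues is treated as fully rectified.

```python
from datetime import date, timedelta


def rectification_deadline(received, working_days=5, holidays=frozenset()):
    """Last day for completing rectification after feedback is received.

    Counts working days after the day of receipt (assumption: the receipt day
    itself is not counted), skipping Saturdays, Sundays and any dates in holidays.
    """
    day = received
    remaining = working_days
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:
            remaining -= 1
    return day


def rectification_rate(resolved, identified):
    """Share of identified issues that have been resolved, in percent."""
    if identified == 0:
        return 100.0  # assumption: no identified issues counts as fully rectified
    return 100.0 * resolved / identified


print(rectification_deadline(date(2022, 12, 1)))  # 2022-12-08
print(rectification_rate(9, 10))  # 90.0
```

Here 1 December 2022 is a Thursday, so the five working days run from Friday 2 December to Thursday 8 December.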
In testing, the use of the automated auditing software in Guangxi increased data review efficiency by approximately 60% and substantially reduced the manual review workload. After implementation of the strategies for improving small-reservoir monitoring data quality proposed in this study, more than 11.178 million data entries had been reviewed as of 31 December 2022, and more than 59,000 entries had been rectified, accounting for 0.05% of the total data volume (statistics on data review and rectification for small-reservoir monitoring in Guangxi are given in Table 3). This significantly improved data quality.
After comprehensive auditing and rectification of the monitoring data from small reservoirs, the completeness of 12 basic data items, including those for reservoirs and monitoring stations, and the access rate of monitoring equipment improved steadily. From September to December 2022, construction of the monitoring facilities for small reservoirs was essentially completed in every project county, and the access rate of monitoring equipment for reservoirs and stations increased progressively. Over this period, the completeness rate of basic data rose from 52.6% before the measures were implemented to 100%, and the data qualification rate rose from 92.5% to 100%. These improvements in the completeness and qualification rates of basic data enhanced the reliability and stability of the whole monitoring system. The equipment access rate increased from 45.7% to 100%, and the daily 24-hour equipment online rate rose from 42.2% to 92.3%. The changes in the statistical parameters of the monitoring data are shown in Figure 4, and the overall effects of the data auditing and rectification measures are summarised in Table 4.
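The access and online rates reported above follow the ratio method defined in Table 2: accessible points divided by all points, and online stations divided by all stations. The Python sketch below computes both. It treats a station as online if its latest report falls within the last 24 hours, with the boundary counted as online (an assumption). The station identifiers and timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def ratio_percent(part, total):
    """Ratio method used for the Table 2 parameters: part / total x 100%."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def equipment_access_rate(accessible_points, total_points):
    """C_EA = P / P_Total x 100%."""
    return ratio_percent(accessible_points, total_points)


def equipment_online_rate(last_report, now, window=timedelta(hours=24)):
    """C_EO = S / S_Total x 100%.

    last_report maps station id -> time of its latest report (None if it has
    never reported); a station counts as online if it reported within window.
    """
    online = sum(
        1 for t in last_report.values() if t is not None and now - t <= window
    )
    return ratio_percent(online, len(last_report))


now = datetime(2022, 12, 31, 12, 0)
last_report = {
    "S1": now - timedelta(hours=2),
    "S2": now - timedelta(hours=30),
    "S3": None,
    "S4": now - timedelta(hours=24),
}
print(equipment_online_rate(last_report, now))  # 50.0
print(equipment_access_rate(3, 4))  # 75.0
```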

Conclusion and prospects
This study aimed to improve the quality and integrity of monitoring data for small reservoirs by establishing data review methods and protocols. The research shows significant progress in the management of reservoir monitoring data in Guangxi, with several practical implications. First, implementing systematic data review methods and protocols has substantially improved the quality and completeness of the monitoring data. Existing monitoring facilities can therefore serve their intended purposes more effectively and provide more reliable data for water resource management. At the same time, new facilities can adopt this approach during construction, reducing errors in data collection and management and improving the accuracy of the monitoring data.
Second, the findings offer a robust and replicable data review framework for water resource management. This framework can be applied in other contexts, providing experience and guidance for water resource management more broadly. Because the effectiveness of these data review and rectification methods and protocols has been demonstrated, future review and rectification efforts can be implemented more efficiently, ensuring continuous improvement in data quality.
Looking ahead, as information technology progresses, monitoring data management can be expected to become more efficient and streamlined. By adopting the measures proposed in this study, monitoring data management will move towards intelligent and systematic approaches, improving the efficiency and accuracy of data processing.
In conclusion, this study highlights the importance of rigorous data review and rectification for improving the quality and integrity of monitoring data. Its findings contribute to more effective and efficient water resource management strategies and pave the way for building data assets for small reservoirs. These assets will provide a reliable data foundation for future data sharing and secure analysis, as well as for the development of forecasting and warning systems for small reservoirs. Through sustained efforts, water resource management will become more scientific and intelligent, contributing to the effective use and protection of water resources.

Figure 1. System composition of the monitoring database for small reservoirs in Guangxi.

Figure 2. Roadmap of data audit technology for small reservoirs.

Figure 3. Results of (a) capacity curve and (b) infiltration line for small reservoirs.

Table 1. Main screening rules for automatic verification of measurement data.

Table 2. Types of data quality assessment parameters.
Equipment access rate: $C_{EA} = \frac{P}{P_{Total}} \times 100\%$, where $C_{EA}$ is the equipment access rate, $P$ is the number of accessible measurement points and $P_{Total}$ is the total number of measurement points.
Equipment online rate: the equipment online rate indicates the operational status of the monitoring station equipment and the integrity of data transmission. Stations that have reported data within the last 24 hours are considered to be online. $C_{EO} = \frac{S}{S_{Total}} \times 100\%$, where $C_{EO}$ is the equipment online rate, $S$ is the number of online monitoring stations and $S_{Total}$ is the total number of monitoring stations.

Table 3. Statistics on the review and rectification of monitoring data for small reservoirs.