Research Progress in River and Lake Water Quality Assessment Based on Machine Learning |
Received: 2024-06-09 Revised: 2024-08-10 |
Keywords machine learning (ML); water quality assessment; rivers and lakes; time series; driving factors |
DOI 10.13227/j.hjkx.20250604 |
|
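Abstract (translated from the Chinese) |
Machine learning (ML) has deep network structures and strong fitting capability, and can predict pollutant concentrations without relying on complete physicochemical mechanisms. ML has therefore become an important research tool for early warning of river and lake pollution and for dynamic water quality assessment. Current applications of ML to river and lake water quality assessment fall into two main directions: rapid assessment of pollutant concentrations at a specific moment, and prediction of future pollutant concentrations at a specific time step. The pollutants predicted mainly include nitrogen and phosphorus nutrients, chlorophyll-a (Chla), dissolved organic matter (DOM), and organic compounds such as pesticides. Dissolved oxygen (DO), water temperature (WT), and pH are the high-frequency input factors when ML is used to predict pollutant concentrations. Internal and external sources and hydraulic conditions are also key driving factors of water quality change in rivers and lakes, and accounting for these two types of factors can significantly improve the prediction accuracy of ML. In addition, missing data, overfitting, and insufficient interpretability are the main bottlenecks restricting the development of ML in river and lake water quality assessment; methods such as mechanistic model-ML coupling and interpretable machine learning (XML) have therefore become the main focus of current ML research. The results will provide important reference information for rapid water quality assessment and pollutant concentration prediction in rivers and lakes. |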
English abstract |
Machine learning (ML) offers deep network structures and powerful fitting capabilities, enabling the prediction of contaminant concentrations without a complete description of the underlying physical and chemical mechanisms. ML has therefore become an important research tool for early pollution warning and water quality assessment in rivers and lakes. This review investigates the application scenarios, methodological focus, influencing factors, bottlenecks, and future directions of ML in the water quality assessment of aquatic ecosystems. A dedicated literature database was built by searching the keywords “machine learning”, “water quality assessment”, “rivers”, and “lakes” in the Web of Science (WOS) and China National Knowledge Infrastructure (CNKI). A total of 309 relevant publications were identified, and the number has increased sharply in recent years. The research directions and prediction targets in this literature were analyzed using feature selection and clustering validation techniques. Water quality prediction was found to be the main purpose of ML applications in aquatic ecosystems, and it can generally be subdivided into two directions: estimation of water quality at a specific time and time-series prediction of water quality. The review further examines the effects of input factors and ML methods on the prediction accuracy for nutrients, chlorophyll-a (Chla), and organic matter concentrations. Dissolved oxygen (DO), water temperature (WT), and pH were the three most frequently used inputs of ML models for predicting pollutant concentrations. Internal and external sources, together with hydraulic parameters such as flow, velocity, and water level, were also core driving factors in ML models, and including these factors has great potential to improve prediction accuracy. In addition, missing data, overfitting, and insufficient interpretability were the dominant limitations on the application of ML in water quality assessment. Methods such as mechanistic model-ML coupling and interpretable machine learning (XML) have become the main focus of current ML research. These findings provide important reference information for water quality assessment and pollutant concentration prediction. |