Machine Learning-Based Prediction Model and Evaluation of Dissolved Oxygen in the Yangtze Estuary |
Received: 2023-12-12 Revised: 2024-03-13 |
Keywords: Yangtze Estuary; dissolved oxygen (DO); support vector regression (SVR); artificial neural network (ANN); random forest (RF); prediction |
|
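Chinese abstract (in English translation) |
Dissolved oxygen (DO) is an important indicator of a water body's self-purification capacity and of water environment quality, and data-driven prediction of DO in the Yangtze Estuary is of great significance for environmental management. Efficient machine learning algorithms were introduced into the monitoring and prediction of DO in the Yangtze Estuary, and six key monitoring stations in the region were selected: Xuliujing, Nantong Port, Qidong Port, Qinglong Port, South Port, and North Port. The response relationships between DO and other water quality factors in the estuary were explored first. Three models, improved support vector regression (denoted PSO-SVR), artificial neural network (ANN), and random forest (RF), were then applied to monthly mean water quality data from 2004 to 2020, and their predictions were compared. The RF importance evaluation showed that temperature, five-day biochemical oxygen demand, and ammonia nitrogen ranked near the top of the importance index at all six sections, indicating that these three water quality factors strongly influence the spatiotemporal distribution of DO concentration in the Yangtze Estuary. In the prediction results, the overall average error of the RF model across the six monitoring sections was 0.19, compared with 0.38 for the improved support vector regression model and 0.47 for the ANN model; all three models showed high predictive ability. When the predictive performance of the models was evaluated, the overall ranking on the training set was RF (R² = 0.971; RMSE = 0.341 mg·L⁻¹) > PSO-SVR (R² = 0.884; RMSE = 0.707 mg·L⁻¹) > ANN (R² = 0.792; RMSE = 0.967 mg·L⁻¹). The overall ranking on the test set was RF (R² = 0.986; RMSE = 0.165 mg·L⁻¹) > PSO-SVR (R² = 0.951; RMSE = 0.332 mg·L⁻¹) > ANN (R² = 0.800; RMSE = 0.633 mg·L⁻¹). The RF model therefore showed the best predictive ability at all monitoring sections, with excellent performance and generalization ability on both the training and test sets. |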
English abstract |
Dissolved oxygen (DO) is a key indicator of the self-purification capacity of aquatic ecosystems and of the overall quality of the water environment. In the Yangtze Estuary, a crucial hub for biodiversity and economic activity in China, understanding and forecasting DO levels is instrumental for effective environmental management. Machine learning algorithms were therefore introduced into the monitoring and prediction of DO in the estuary, whose dynamic and complex hydrological and ecological systems demand a careful approach to monitoring water quality parameters. Six key monitoring stations were chosen across the estuary: Xuliujing, Nantong Port, Qidong Port, Qinglong Port, South Port, and North Port. Three modeling techniques, particle swarm optimization-support vector regression (PSO-SVR), artificial neural network (ANN), and random forest (RF), were used to analyze and forecast DO levels from monthly average water quality data spanning 2004 to 2020. Each model brings distinct analytical strengths, from the non-linear pattern recognition of ANN to the robustness and interpretability of RF. The importance evaluation conducted with the RF model identified three water quality variables, namely temperature, five-day biochemical oxygen demand, and ammonia nitrogen, as strongly influencing the spatiotemporal dynamics of DO in the estuary. 
Comparative analysis of the predictions from the PSO-SVR, ANN, and RF models showed that the RF model performed best across the six monitoring stations, with an overall average error of 0.19. The PSO-SVR and ANN models had higher average errors of 0.38 and 0.47, respectively, though both still provided useful insight into the complex DO dynamics of the Yangtze Estuary. When the prediction performance of the models was evaluated, the overall ranking on the training set was RF (R² = 0.971; RMSE = 0.341 mg·L⁻¹) > PSO-SVR (R² = 0.884; RMSE = 0.707 mg·L⁻¹) > ANN (R² = 0.792; RMSE = 0.967 mg·L⁻¹). The overall ranking on the test set was RF (R² = 0.986; RMSE = 0.165 mg·L⁻¹) > PSO-SVR (R² = 0.951; RMSE = 0.332 mg·L⁻¹) > ANN (R² = 0.800; RMSE = 0.633 mg·L⁻¹). The RF model therefore exhibited the best predictive ability at all monitoring sections, with excellent performance and generalization ability on both the training and test sets. The PSO-SVR model also performed well at most monitoring sections, with slightly lower predictive performance than the RF model, though with better stability and generalization ability. The ANN model performed less well than the other two models at some monitoring sections, and its network structure or parameters may need further optimization to improve prediction accuracy and stability. |
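The abstract ranks the models by R² and RMSE but does not spell out how these are computed. The following is a minimal sketch assuming the standard definitions (RMSE as the square root of the mean squared error, R² as one minus the ratio of the residual to the total sum of squares); the paper may use a variant. The function names and the sample DO values are hypothetical and serve only as illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. mg/L)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly DO values (mg/L), invented purely for illustration;
# they are not taken from the study.
observed_do = [8.0, 9.0, 7.0, 6.0]
predicted_do = [8.5, 8.5, 7.5, 5.5]

print(f"RMSE = {rmse(observed_do, predicted_do):.3f} mg/L")
print(f"R^2  = {r_squared(observed_do, predicted_do):.3f}")
```

Under these definitions, a lower RMSE and an R² closer to 1 both indicate a better fit, which is the sense in which the rankings above place RF first.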
|
|
|