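Prediction of Soil Cadmium Content and Health Risk Assessment in an Oasis on the Eastern Margin of the Tarim Basin Based on Feature Optimization and Machine Learning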
Received: 2023-08-01  Revised: 2023-11-24
Keywords  cadmium (Cd)  content prediction  health risk assessment  machine learning  feature optimization
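Author  Affiliation  E-mail
Liu Jingyu (刘靖宇)  School of Earth Science and Resources, Chang'an University, Xi'an 710054; Urumqi Comprehensive Survey Center on Natural Resources, China Geological Survey, Urumqi 830057  liujingyu@mail.cgs.gov.cn
Li Ruoyi (李若怡)  China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing 100083
Liang Yongchun (梁永春)  School of Earth Science and Resources, Chang'an University, Xi'an 710054
Liu Lei (刘磊)  School of Earth Science and Resources, Chang'an University, Xi'an 710054  liul@chd.edu.cn
Yin Fang (尹芳)  School of Land Engineering, Chang'an University, Xi'an 710054
Tang Su (唐塑)  Urumqi Comprehensive Survey Center on Natural Resources, China Geological Survey, Urumqi 830057
He Linsen (何林森)  School of Land Engineering, Chang'an University, Xi'an 710054
Zhang Yi (张毅)  Xi'an Center of Mineral Resources Survey, China Geological Survey, Xi'an 710100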
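Abstract (translated from the Chinese)
      Soil heavy metal pollution poses a major threat to food security, human health, and soil ecosystems. Based on 644 soil samples collected from a typical oasis area on the eastern margin of the Tarim Basin, soil heavy metal prediction models were built using multiple linear regression (LR), neural network (BP), random forest (RF), support vector machine (SVM), and radial basis function neural network (RBF) methods, and the best prediction result was used to analyze the spatial distribution of heavy metal pollution and the associated health risk. The results showed that: ① The mean ω(Cd) in the study area was 0.14 mg·kg⁻¹, 1.17 times the soil background value of Xinjiang, making Cd the main factor of soil heavy metal pollution in the area; the carcinogenic risk coefficients of Cd for both adults and children were below 10⁻⁴, indicating no obvious long-term health risk to humans. ② Comparing the prediction accuracy of the five inversion models, the RF model had a validation-set R² of 0.7637, the largest of the five, and its RMSE, MAE, and MBE were the smallest of the five, so the measured soil Cd values were fitted best by the RF predictions. In addition, the spatial distribution of soil Cd content predicted by the RF model agreed well with the interpolation of the measured sample points. ③ In predicting the health risk of soil Cd, the inversion accuracy of the RF model for both adults and children exceeded that of the other four models, and its predictions were good; the validation-set predictions of the LR model varied widely, and its predictions were poor. In summary, the RF model has good generalization ability and resistance to overfitting and is the optimal model for soil Cd content prediction and health risk assessment in the study area.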
Abstract (English version)
      Soil heavy metal pollution poses a serious threat to food security, human health, and soil ecosystems. Based on 644 soil samples collected from a typical oasis at the eastern margin of the Tarim Basin, five models, namely multiple linear regression (LR), neural network (BP), random forest (RF), support vector machine (SVM), and radial basis function (RBF), were built to predict soil heavy metal content. The best prediction result was then used to analyze the spatial distribution of heavy metal contamination and the associated health risks. The results showed that: ① The average Cd content in the study area was 0.14 mg·kg⁻¹, which was 1.17 times the soil background value of Xinjiang, making Cd the primary factor of soil heavy metal contamination in the area. The carcinogenic risk coefficients of Cd for both adults and children were less than 10⁻⁴, indicating no significant long-term health risk for humans in the area. ② Among the five inversion models, the RF model had the highest validation-set R² (0.7637) and the smallest RMSE, MAE, and MBE, so its predicted values were the most consistent with the measured soil Cd content. The map of soil Cd distribution predicted by the RF model also agreed well with the map interpolated from the measured sample points. ③ The RF model outperformed the other four models in predicting the health risks associated with soil Cd for both adults and children. By comparison, the values predicted by the LR model on the validation set varied greatly, leading to unreliable results. Overall, given its superior generalization capability and resistance to overfitting, the RF model was the best model for predicting soil Cd content and evaluating health risks in the study area.
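
The four accuracy measures used to compare the models (R², RMSE, MAE, and MBE) can be illustrated with a minimal sketch. The abstract does not give the formulas the authors used, so the definitions below are assumptions: R² is taken as the coefficient of determination, and MBE as the mean of predicted minus measured values. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE and MBE for paired measured/predicted values.

    Assumed conventions (not confirmed by the source abstract):
    R2 is the coefficient of determination, and the error is
    defined as predicted minus measured.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mean_measured = sum(measured) / n

    ss_res = sum(e * e for e in errors)                       # residual sum of squares
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("measured values are constant; R2 is undefined")

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,
    }


# Hypothetical Cd contents in mg/kg, for illustration only.
measured = [0.10, 0.12, 0.14, 0.16, 0.18]
predicted = [0.11, 0.12, 0.13, 0.17, 0.17]
metrics = regression_metrics(measured, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these conventions a higher R² and lower RMSE and MAE indicate a closer fit, while an MBE near zero indicates little systematic over- or under-prediction. MBE can be negative, so the "smallest" MBE is most naturally read in terms of absolute value.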

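Sponsor: Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences  Address: 18 Shuangqing Road, Haidian District, Beijing
Tel: 010-62941102  Postal code: 100085  E-mail: hjkx@rcees.ac.cn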