Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (Southern Spain)
Watershed management decisions require robust methods that allow accurate predictive modeling of pollutant occurrences. Random Forest (RF) is a powerful, data-driven machine learning method that is rarely used in water resources studies and thus has not been evaluated thoroughly in this field when compared to more conventional pattern recognition techniques. Key advantages of RF include its non-parametric nature, high predictive accuracy, and capability to determine variable importance. This last characteristic can be used to better understand the individual role and the combined effect of explanatory variables in both protecting groundwater from, and exposing it to, a pollutant.
In this paper, the performance of RF regression for predictive modeling of nitrate pollution is explored, based on intrinsic and specific vulnerability assessment of the Vega de Granada aquifer. The applicability of this new machine learning technique is demonstrated in an agriculture-dominated area where nitrate concentrations in groundwater can exceed the trigger value of 50 mg/L at many locations. A comprehensive GIS database of twenty-four parameters related to intrinsic hydrogeologic properties, driving forces, remotely sensed variables and physical–chemical variables measured in situ was used as input to build different predictive models of nitrate pollution. RF measures of importance were also used to define the most significant predictors of nitrate pollution in groundwater, allowing the pollution sources (pressures) to be established.
The potential of RF for generating a map of vulnerability to nitrate pollution is assessed considering multiple criteria related to variations in the algorithm parameters and the accuracy of the maps. The performance of RF is also evaluated in comparison to the logistic regression (LR) method, using different efficiency measures to assess the generalization ability of both methods. Prediction results show the ability of RF to build accurate models with strong predictive capabilities.
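The idea behind a permutation-based measure of variable importance, as mentioned in the abstract, can be illustrated with a minimal sketch. This is not the authors' code or data: in RF the measure is typically computed per tree on out-of-bag samples, whereas the sketch below assumes a generic fitted model exposed as a `predict` callable and uses toy values. The names `permutation_importance`, `toy_model`, `X` and `y` are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn.

    A large increase suggests the model relies on that variable;
    an increase near zero suggests the variable is not used.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the response
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: column 0 drives the response, column 1 is unrelated
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 100), data_rng.uniform(0, 100)] for _ in range(200)]
y = [0.5 * row[0] for row in X]


def toy_model(row):
    """Stand-in for a fitted regression model (assumption, not an RF)."""
    return 0.5 * row[0]


importances = permutation_importance(toy_model, X, y)
```

With this toy setup, the first variable receives a positive importance and the second an importance of zero, because the stand-in model never reads it. With a real RF model, the same procedure would be applied to each of the explanatory variables in turn.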
189-206
Rodriguez-Galiano, V.F.
Mendes, M.P.
Garcia-Soldado, M.J.
Chica-Olmo, M.
Ribeiro, L.
1 April 2014
Rodriguez-Galiano, V.F., Mendes, M.P., Garcia-Soldado, M.J., Chica-Olmo, M. and Ribeiro, L. (2014) Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (Southern Spain). Science of the Total Environment, 476-477, 189-206. (doi:10.1016/j.scitotenv.2014.01.001).
This record has no associated files available for download.
More information
Published date: 1 April 2014
Organisations: Global Env Change & Earth Observation
Identifiers
Local EPrints ID: 370079
URI: http://eprints.soton.ac.uk/id/eprint/370079
ISSN: 0048-9697
PURE UUID: 2191e30c-a5d8-4f70-8dac-3be7e9b237f0
Catalogue record
Date deposited: 23 Oct 2014 10:52
Last modified: 14 Mar 2024 18:12
Contributors
Author: V.F. Rodriguez-Galiano
Author: M.P. Mendes
Author: M.J. Garcia-Soldado
Author: M. Chica-Olmo
Author: L. Ribeiro