The University of Southampton
University of Southampton Institutional Repository

Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods

Recognising the various sources of nitrate pollution and understanding system dynamics are fundamental to tackling groundwater quality problems. A comprehensive GIS database of twenty parameters describing hydrogeological and hydrological features and driving forces was used as input for predictive models of nitrate pollution. Additionally, key variables extracted from remotely sensed Normalised Difference Vegetation Index (NDVI) time series were included in the database to provide indications of agroecosystem dynamics. Many approaches can be used to evaluate feature importance related to groundwater pollution caused by nitrates. Filters, wrappers and embedded methods were used to rank feature importance according to the probability of occurrence of nitrates above a threshold value in groundwater. Machine learning algorithms (MLA) such as Classification and Regression Trees (CART), Random Forest (RF) and Support Vector Machines (SVM) were used as wrappers with four sequential search approaches: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). Feature importance obtained from RF and CART was used as an embedded approach. RF with SFFS had the best performance (mmce = 0.12 and AUC = 0.92) and good interpretability, selecting three features related to groundwater polluted areas: i) a rating of industries and facilities according to their production capacity and total nitrogen emissions to water within a 3 km buffer, ii) a rating of livestock farms by manure production within a 5 km buffer and iii) cumulated NDVI for the post-maximum month, used as a proxy for vegetation productivity and crop yield.
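The sequential forward floating selection (SFFS) search named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the integer toy score and the `sffs` helper are all hypothetical, and the toy score merely stands in for a cross-validated performance measure (for example AUC) of a wrapped classifier such as RF.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Subset = FrozenSet[str]


def sffs(features: List[str],
         score: Callable[[Subset], int],
         k_max: int) -> Tuple[int, Subset]:
    """Sequential forward floating selection; a higher score is better.

    Returns (best score, best subset) over all subset sizes visited.
    """
    selected: Subset = frozenset()
    best_by_size: Dict[int, Tuple[int, Subset]] = {}
    while len(selected) < k_max:
        candidates = sorted(f for f in features if f not in selected)
        if not candidates:
            break
        # Forward step: add the single feature that helps the most.
        best_f = max(candidates, key=lambda f: score(selected | {f}))
        selected = selected | {best_f}
        s = score(selected)
        size = len(selected)
        if size not in best_by_size or s > best_by_size[size][0]:
            best_by_size[size] = (s, selected)
        # Floating step: drop the least useful feature for as long as
        # that beats the best subset already recorded at the smaller size.
        while len(selected) > 2:
            worst_f = max(sorted(selected),
                          key=lambda f: score(selected - {f}))
            reduced = selected - {worst_f}
            s_red = score(reduced)
            if s_red > best_by_size[len(reduced)][0]:
                selected = reduced
                best_by_size[len(reduced)] = (s_red, reduced)
            else:
                break
    return max(best_by_size.values(), key=lambda t: t[0])


# Hypothetical toy score (integer points): three informative features
# add to the score, and every selected feature costs one point.
GAIN = {"industry": 20, "livestock": 15, "ndvi": 10}


def toy_score(subset: Subset) -> int:
    return 50 + sum(GAIN.get(f, 0) for f in subset) - len(subset)


names = ["industry", "livestock", "ndvi", "rain", "slope"]
best_score, best_subset = sffs(names, toy_score, k_max=5)
print(best_score, sorted(best_subset))
```

In a real wrapper, `toy_score` would be replaced by a function that trains and validates the chosen classifier on the candidate subset, which is what makes wrapper methods far more expensive than filters.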

Rodriguez-Galiano, V. F., Luque-Espinar, J. A., Chica-Olmo, M. and Mendes, M. P. (2018) Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods. Science of the Total Environment, 624, pp. 661-672. (doi:10.1016/j.scitotenv.2017.12.152).

Record type: Article

Text: Rodriguez-Galiano et al 2018 - Accepted Manuscript
Restricted to Repository staff only until 27 December 2019.

More information

Accepted/In Press date: 13 December 2017
e-pub ahead of print date: 27 December 2017
Published date: 15 May 2018
Keywords: Embedded methods, Feature selection, Groundwater, Machine learning algorithms, Nitrates, Wrapper methods

Identifiers

Local EPrints ID: 417339
URI: https://eprints.soton.ac.uk/id/eprint/417339
ISSN: 0048-9697
PURE UUID: ab8f374b-cb48-4132-85b9-aa39015627b7

Catalogue record

Date deposited: 30 Jan 2018 17:30
Last modified: 14 Feb 2018 17:31

Contributors

Author: V. F. Rodriguez-Galiano
Author: J. A. Luque-Espinar
Author: M. Chica-Olmo
Author: M. P. Mendes
