The University of Southampton
University of Southampton Institutional Repository

An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models


Taormina, R., Galelli, S., Karakaya, G. and Ahipasaoglu, S. D. (2016) An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models. Journal of Hydrology, 542, 18-34. (doi:10.1016/j.jhydrol.2016.07.045).

Record type: Article

Abstract

This work investigates the uncertainty associated with the presence of multiple subsets of predictors yielding data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model, and optimizes two information theoretic metrics of relevance and redundancy, which guarantee that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors, and of the associated models, helps find a better trade-off between different measures of predictive accuracy commonly adopted for hydrological modelling problems.

This record has no associated files available for download.

More information

e-pub ahead of print date: 30 July 2016
Published date: November 2016
Keywords: Input variable selection, Information theory, Data-driven models, Extreme learning machines, Neural networks

Identifiers

Local EPrints ID: 443179
URI: http://eprints.soton.ac.uk/id/eprint/443179
ISSN: 0022-1694
PURE UUID: 19cfd113-1ca5-4607-8877-8739a59ed743
ORCID for S. D. Ahipasaoglu: orcid.org/0000-0003-1371-315X

Catalogue record

Date deposited: 13 Aug 2020 16:38
Last modified: 17 Mar 2024 04:03

Contributors

Author: R. Taormina
Author: S. Galelli
Author: G. Karakaya
Author: S. D. Ahipasaoglu
