An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models
Taormina, R., Galelli, S., Karakaya, G. and Ahipasaoglu, S. D. (2016) An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models. Journal of Hydrology, 542, 18-34. (doi:10.1016/j.jhydrol.2016.07.045).
Abstract
This work investigates the uncertainty associated with the presence of multiple subsets of predictors yielding data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model, and optimizes two information theoretic metrics of relevance and redundancy, which ensure that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors, and of the associated models, helps find a better trade-off between different measures of predictive accuracy commonly adopted for hydrological modelling problems.
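As a rough illustration of the four objectives named in the abstract, the sketch below scores a candidate subset of predictors by its cardinality, relevance, redundancy, and predictive accuracy. It is not the authors' implementation. Several choices are assumptions made only for this example: relevance and redundancy are measured with symmetric uncertainty estimated from equal-width histograms, the data-driven model is replaced by an ordinary least-squares fit with in-sample R², and the function names (`objectives`, `symmetric_uncertainty`, and so on) and the synthetic data are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of an array of non-negative integer labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def discretize(x, bins=8):
    """Equal-width binning (an assumption); returns labels in 0..bins-1."""
    edges = np.histogram_bin_edges(x, bins=bins)
    return np.digitize(x, edges[1:-1])


def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 I(a; b) / (H(a) + H(b)), between 0 and 1."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0.0:
        return 0.0
    # Encode each (a, b) pair as one integer to get the joint entropy.
    h_joint = entropy(a * (b.max() + 1) + b)
    return 2.0 * (ha + hb - h_joint) / (ha + hb)


def objectives(X, y, subset, bins=8):
    """Return (cardinality, relevance, redundancy, accuracy) for a subset.

    A multi-objective search would minimize cardinality and redundancy
    and maximize relevance and accuracy.
    """
    subset = list(subset)
    yd = discretize(y, bins)
    Xd = [discretize(X[:, j], bins) for j in subset]

    # Relevance: information shared between each predictor and the output.
    relevance = sum(symmetric_uncertainty(xd, yd) for xd in Xd)

    # Redundancy: information shared between pairs of selected predictors.
    redundancy = sum(
        symmetric_uncertainty(Xd[i], Xd[k])
        for i in range(len(Xd))
        for k in range(i + 1, len(Xd))
    )

    # Stand-in model (assumption): least squares with intercept, in-sample R^2.
    A = np.column_stack([np.ones(len(y)), X[:, subset]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

    return len(subset), relevance, redundancy, r2


# Synthetic data: x1 is a near-copy of x0, x2 is unrelated noise.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + 0.1 * rng.normal(size=n)

for s in ([0], [1], [0, 1], [2]):
    print(s, objectives(X, y, s))
```

In this toy setting the subsets `[0]` and `[1]` give nearly the same accuracy, which is the kind of alternate subset the abstract refers to, while `[0, 1]` adds a predictor and a large redundancy term for little gain in accuracy.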
This record has no associated files available for download.
More information
e-pub ahead of print date: 30 July 2016
Published date: November 2016
Keywords:
Input variable selection, Information theory, Data-driven models, Extreme learning machines, Neural networks
Identifiers
Local EPrints ID: 443179
URI: http://eprints.soton.ac.uk/id/eprint/443179
ISSN: 0022-1694
PURE UUID: 19cfd113-1ca5-4607-8877-8739a59ed743
Catalogue record
Date deposited: 13 Aug 2020 16:38
Last modified: 17 Mar 2024 04:03
Contributors
Author: R. Taormina
Author: S. Galelli
Author: G. Karakaya
Author: S. D. Ahipasaoglu