The University of Southampton
University of Southampton Institutional Repository

Comparing four methods for decision-tree induction: a case study on the invasive Iberian gudgeon


Munoz-Mas, R., Fukuda, S., Vezza, P. and Martinez-Capel, F. (2016) Comparing four methods for decision-tree induction: a case study on the invasive Iberian gudgeon. Ecological Informatics, 34, 22-34. (doi:10.1016/j.ecoinf.2016.04.011).

Record type: Article

Abstract

The invasion of freshwater ecosystems is a particularly alarming phenomenon in the Iberian Peninsula. Habitat suitability modelling is an effective approach for extracting knowledge about species ecology and guiding adequate management actions. Decision-trees are an interpretable modelling technique widely used in ecology, able to handle strongly nonlinear relationships with high-order interactions and diverse variable types. Decision-trees recursively split the input space into two parts, maximising child node homogeneity. This recursive partitioning is typically performed with axis-parallel splits in a top-down fashion. However, two recently developed R packages, oblique.tree, which allows the development of oblique split-based decision-trees, and evtree, which performs globally optimal searches with evolutionary algorithms, seem to outperform the standard axis-parallel top-down algorithms, CART and C5.0. To evaluate their possible use in ecology, the two new partitioning algorithms were compared with the two well-known, standard axis-parallel algorithms. The entire process was performed in R by simultaneously tuning the decision-tree parameters and the variable subset with a genetic algorithm and modelling the presence–absence of the Iberian gudgeon (Gobio lozanoi; Doadrio and Madeira, 2004), an invasive fish species that has spread across the Iberian Peninsula. The accuracy and complexity of the trees, the modelled patterns of mesohabitat selection and the variable importance were compared. Neither of the new R packages, oblique.tree and evtree, outperformed the C5.0 algorithm. They rendered almost the same decision-trees as the CART algorithm, although they were completely interpretable – they performed from four to eight partitions – in comparison with C5.0, which resulted in a more complex structure with 17 partitions.
Oblique.tree proved to be affected by prevalence, and it does not include the possibility of weighting the observations, which potentially discourages its actual use. Although the use of evtree did not suggest a major improvement compared with the remaining packages, it allowed the development of regression trees, which may be informative for additional modelling tasks such as abundance estimation. Looking at the resulting decision-trees, the optimal habitats for the Iberian gudgeon were large pools in lowland river segments with depositional areas and aquatic vegetation present, which typically appeared in the form of scattered macrophyte clumps. Furthermore, the Iberian gudgeon seems to avoid habitats characterised by scouring phenomena and limited availability of vegetated cover. Accordingly, we can assume that river regulation and artificial impoundment would have favoured the spread of the Iberian gudgeon across the entire peninsula.
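
As an illustration of the axis-parallel, top-down splitting described in the abstract, the following minimal Python sketch searches for the single split of the form x[j] <= t that minimises the weighted Gini impurity of the two child nodes. It is not the authors' code and does not use the R packages named above; the Gini criterion, the function names and the toy presence-absence data (depth, vegetation cover) are assumptions made only for this example.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels (0 = absence, 1 = presence)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_axis_parallel_split(
    rows: Sequence[Sequence[float]], labels: List[int]
) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, threshold, weighted child impurity) of the best split."""
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data: columns are (depth in m, vegetation cover in %).
rows = [(0.2, 5.0), (0.3, 40.0), (0.9, 10.0), (1.1, 35.0), (1.4, 20.0), (1.6, 50.0)]
labels = [0, 0, 1, 1, 1, 1]
feature, threshold, impurity = best_axis_parallel_split(rows, labels)
print(feature, threshold, impurity)
```

A top-down algorithm would apply this search recursively to each child node until a stopping rule is met. An oblique split would instead test a linear combination of several variables against a threshold, and an evolutionary search would evaluate whole candidate trees rather than one greedy split at a time.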

This record has no associated files available for download.

More information

Accepted/In Press date: 24 April 2016
e-pub ahead of print date: 29 April 2016
Published date: July 2016
Organisations: Water & Environmental Engineering Group

Identifiers

Local EPrints ID: 403140
URI: http://eprints.soton.ac.uk/id/eprint/403140
ISSN: 1574-9541
PURE UUID: b3398154-3f9e-47a9-a509-1c6e411ec0ef

Catalogue record

Date deposited: 24 Nov 2016 16:15
Last modified: 15 Mar 2024 03:35

Contributors

Author: R. Munoz-Mas
Author: S. Fukuda
Author: Paolo Vezza
Author: F. Martinez-Capel
