The University of Southampton
University of Southampton Institutional Repository

Learning algorithm selection for comprehensible regression analysis using datasetoids


Loterman, G. and Mues, C. (2015) Learning algorithm selection for comprehensible regression analysis using datasetoids. Intelligent Data Analysis, 19 (5), 1019-1034. (doi:10.3233/IDA-150756).

Record type: Article

Abstract

Data mining tools often include a workbench of algorithms for modelling a given dataset but offer little guidance on selecting the most accurate algorithm for it. The best algorithm is not known in advance, and no single model format is superior for all datasets. Evaluating a number of candidate algorithms on large datasets to determine the most accurate model is, however, a computational burden. An alternative and more time-efficient approach is to select the algorithm based on the nature of the dataset. This meta-learning study explores to what degree dataset characteristics can help identify which regression/estimation algorithm will best fit a given dataset. We focus on comprehensible 'white-box' techniques (i.e. linear, spline, tree, linear tree or spline tree), as these are of particular interest in many real-life estimation settings. A large-scale experiment with more than a thousand so-called datasetoids, representing various real-life dependencies, is conducted to discover possible relations. Algorithm-based characteristics such as sampling landmarks are found to be major drivers for successfully selecting the most accurate algorithm. Furthermore, data-based characteristics such as the length, dimensionality and composition of the independent variables, or the asymmetry and dispersion of the dependent variable, appear to contribute little once landmarks are included in the meta-model.
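
As the term is generally used in meta-learning, a sampling landmark is a performance estimate obtained by running a candidate learner on a small sample of the data, so that the estimate can serve as a cheap meta-feature. The abstract does not say how the authors computed theirs, so the Python sketch below is only an illustration under stated assumptions: a single predictor, two stand-in candidates (a least-squares line and a one-split regression tree), and hypothetical names such as `sampling_landmarks`, `fit_linear` and `fit_stump`.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b * x for a single predictor."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    overall = statistics.fmean(ys)
    best = None  # (sse, threshold, left_mean, right_mean)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        return lambda x: overall
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)) ** 0.5


def sampling_landmarks(xs, ys, sample_size=50, seed=0):
    """Hold-out RMSE of each candidate, estimated on a small random subsample.

    Assumption: half of the subsample is used for fitting, half for scoring.
    """
    if len(xs) < 4:
        raise ValueError("need at least 4 rows to fit and score on a subsample")
    rng = random.Random(seed)
    idx = rng.sample(range(len(xs)), min(sample_size, len(xs)))
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    vx, vy = [xs[i] for i in test], [ys[i] for i in test]
    candidates = {"linear": fit_linear, "tree": fit_stump}
    return {name: rmse(fit(tx, ty), vx, vy) for name, fit in candidates.items()}


# Toy data with a step-shaped dependency between x and y.
xs = [i / 100 for i in range(1000)]
ys = [0.0 if x < 5 else 10.0 for x in xs]
landmarks = sampling_landmarks(xs, ys)          # meta-features for this dataset
best_guess = min(landmarks, key=landmarks.get)  # candidate with the lowest sample error
```

In a meta-learning setup of the kind the abstract describes, values like those in `landmarks` would be computed for every dataset and fed, alongside data-based characteristics, into a meta-model that predicts which algorithm to run on the full data.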

This record has no associated files available for download.

More information

Published date: 8 September 2015
Keywords: data mining, regression, comprehensibility, meta learning, datasetoid
Organisations: Southampton Business School

Identifiers

Local EPrints ID: 386764
URI: http://eprints.soton.ac.uk/id/eprint/386764
ISSN: 1088-467X
PURE UUID: 9c967003-179e-4009-a6a9-4e4bc9986b00
ORCID for C. Mues: orcid.org/0000-0002-6289-5490

Catalogue record

Date deposited: 04 Feb 2016 09:49
Last modified: 15 Mar 2024 03:20

Contributors

Author: G. Loterman
Author: C. Mues
