Learning algorithm selection for comprehensible regression analysis using datasetoids
Data mining tools often include a workbench of algorithms for modelling a given dataset but offer little guidance on which algorithm will be the most accurate for it. The best algorithm is not known in advance, and no single model format is superior for all datasets. Evaluating a number of candidate algorithms on large datasets to determine the most accurate model is, however, computationally burdensome. A more time-efficient alternative is to select the algorithm based on the nature of the dataset. This meta-learning study explores to what degree dataset characteristics can help identify which regression/estimation algorithm will best fit a given dataset. We focus on comprehensible 'white-box' techniques (i.e. linear, spline, tree, linear tree or spline tree), as these are of particular interest in many real-life estimation settings. A large-scale experiment with more than a thousand so-called datasetoids, representing various real-life dependencies, is conducted to discover possible relations. We find that algorithm-based characteristics such as sampling landmarks are major drivers of successfully selecting the most accurate algorithm. Further, data-based characteristics such as the length, dimensionality and composition of the independent variables, or the asymmetry and dispersion of the dependent variable, appear to contribute little once landmarks are included in the meta-model.
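To make the two kinds of dataset characteristics mentioned in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It computes a few data-based characteristics (length, dimensionality, asymmetry and dispersion of the dependent variable) and a simple sampling landmark, i.e. the error of cheap learners fitted on a small random subsample. All function names are hypothetical, and the two toy learners (a constant predictor and a one-variable least-squares line) are assumptions that merely stand in for the linear, spline and tree techniques studied in the paper.

```python
import random
import statistics


def data_characteristics(X, y):
    """Data-based meta-features: size of X and shape of the target y."""
    n = len(y)
    mean_y = statistics.fmean(y)
    sd_y = statistics.pstdev(y)
    # Skewness measures the asymmetry of the dependent variable.
    skew = 0.0 if sd_y == 0 else sum(((v - mean_y) / sd_y) ** 3 for v in y) / n
    return {
        "length": n,
        "dimensionality": len(X[0]),
        "target_skewness": skew,
        "target_dispersion": sd_y,
    }


def fit_mean(xs, ys):
    """Toy learner: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Toy learner: ordinary least squares on a single input variable."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    slope = 0.0 if sxx == 0 else sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def sampling_landmarks(xs, ys, learners, sample_size, seed=0):
    """Algorithm-based meta-features: hold-out MSE of each learner,
    estimated on a small random subsample of the data."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(ys)), sample_size)
    half = sample_size // 2
    train, test = idx[:half], idx[half:]
    landmarks = {}
    for name, fit in learners.items():
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in test]
        landmarks[name] = statistics.fmean(errors)
    return landmarks


# Illustrative data (an assumption): one input variable, linear dependency.
X = [[float(i)] for i in range(40)]
y = [2.0 * row[0] + 1.0 for row in X]

meta_features = data_characteristics(X, y)
meta_features.update(
    sampling_landmarks(
        [row[0] for row in X], y,
        {"mean": fit_mean, "linear": fit_linear},
        sample_size=20,
    )
)
print(meta_features)
```

In a meta-learning setup, one such feature vector would be computed per dataset (or datasetoid) and paired with the name of the algorithm that performed best on the full data; a meta-model is then trained on these pairs to recommend an algorithm for unseen datasets.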
data mining, regression, comprehensibility, meta learning, datasetoid
1019-1034
Loterman, G.
87bb0c85-8b43-49bf-92d7-4d894d4bb3aa
Mues, C.
07438e46-bad6-48ba-8f56-f945bc2ff934
8 September 2015
Loterman, G. and Mues, C. (2015) Learning algorithm selection for comprehensible regression analysis using datasetoids. Intelligent Data Analysis, 19 (5), 1019-1034. (doi:10.3233/IDA-150756).
More information
Published date: 8 September 2015
Organisations:
Southampton Business School
Identifiers
Local EPrints ID: 386764
URI: http://eprints.soton.ac.uk/id/eprint/386764
ISSN: 1088-467X
PURE UUID: 9c967003-179e-4009-a6a9-4e4bc9986b00
Catalogue record
Date deposited: 04 Feb 2016 09:49
Last modified: 15 Mar 2024 03:20
Contributors
Author:
G. Loterman