The University of Southampton
University of Southampton Institutional Repository

A combination of variable selection and data mining techniques for high-dimensional statistical modelling


Koukouvinos, Christos, Mylona, Kalliopi and Parpoula, Christina (2013) A combination of variable selection and data mining techniques for high-dimensional statistical modelling. International Journal of Information and Decision Sciences, 5 (2), 154-168. (doi:10.1504/IJIDS.2013.053799).

Record type: Article

Abstract

Variable selection is fundamental to statistical modelling in diverse fields of science. This paper addresses the problem of high-dimensional statistical modelling through the analysis of seismological data from Greece acquired during the years 1962-2003. The dataset consists of 10,333 observations and 11 factors, and is used to detect possible risk factors for large earthquakes. In our study, different statistical variable selection techniques are applied, while data mining techniques enable us to discover associations, meaningful patterns and rules. The statistical methods employed were the non-concave penalised likelihood methods (SCAD, LASSO and Hard), generalised linear logistic regression and best subset variable selection. The data mining methods applied were three decision tree algorithms: the classification and regression tree (C&RT), the chi-square automatic interaction detection (CHAID) and the C5.0 algorithm. How to identify the significant variables in large datasets, and the performance of the techniques used, are also discussed.
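
For readers unfamiliar with the three penalties named in the abstract, the following is a minimal sketch of their usual textbook forms. It is not taken from the paper: the function names, the argument names `theta` and `lam`, and the default `a=3.7` for SCAD are assumptions based on the standard penalised likelihood literature, and the paper's own definitions and tuning may differ.

```python
# Illustrative penalty functions p_lambda(theta) for a single coefficient.
# Assumption: these are the standard forms from the penalised likelihood
# literature, not formulas quoted from the paper described in this record.

def lasso_penalty(theta, lam):
    """L1 (LASSO) penalty: lam * |theta|."""
    return lam * abs(theta)


def hard_penalty(theta, lam):
    """Hard thresholding penalty: lam^2 - (|theta| - lam)^2 for |theta| < lam,
    and the constant lam^2 otherwise."""
    t = abs(theta)
    if t < lam:
        return lam ** 2 - (t - lam) ** 2
    return lam ** 2


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic in the middle range,
    constant for large |theta|. The default a=3.7 is a commonly
    suggested value and is an assumption here."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2


if __name__ == "__main__":
    # Compare the three penalties on a few coefficient values.
    for theta in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(theta,
              lasso_penalty(theta, 1.0),
              hard_penalty(theta, 1.0),
              scad_penalty(theta, 1.0))
```

Unlike the LASSO penalty, which keeps growing with the coefficient, the hard and SCAD penalties level off for large coefficients, so large effects are penalised no more heavily than moderate ones.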

This record has no associated files available for download.

More information

Published date: May 2013
Keywords: computing and mathematics, information systems and technology, management and business, operational management and marketing, policy and organisational management
Organisations: Statistics, Statistical Sciences Research Institute

Identifiers

Local EPrints ID: 354149
URI: http://eprints.soton.ac.uk/id/eprint/354149
ISSN: 1756-7017
PURE UUID: b3168248-d3b0-48e2-9f95-2df01aa30e99

Catalogue record

Date deposited: 03 Jul 2013 11:16
Last modified: 14 Mar 2024 14:14

Contributors

Author: Christos Koukouvinos
Author: Kalliopi Mylona
Author: Christina Parpoula
