A comparative study of variable selection procedures applied in high dimensional medical problems
Koukouvinos, C., Mylona, K. and Vonta, F. (2008) A comparative study of variable selection procedures applied in high dimensional medical problems. Journal of Applied Probability & Statistics, 3 (2), 195-209.
Abstract
In health studies, many potential factors are usually considered as candidate determinants of an outcome variable. In this study, several statistical methods are applied to annual trauma data collected by 30 General Hospitals in Greece. The first dataset consists of 1681 observations on 76 factors and the second of 6334 observations on 131 factors; the factors include demographic, transport and intrahospital data. To detect possible risk factors for death, we employed the nonconcave penalized likelihood methods (SCAD, LASSO and Hard), generalized linear logistic regression, and best subset variable selection. A variety of statistical models are considered, differing in the combination of factors and the number of observations. A comparative survey reveals differences between the results and the execution times of the methods, and the analysis yields models that identify the significant prognostic factors for death from trauma.
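The three penalties named in the abstract can be made concrete with a short sketch. It assumes the standard textbook forms of the LASSO, Hard and SCAD penalties, with the commonly used SCAD shape parameter a = 3.7 (an assumption: the record does not state which value the authors used). For each penalty it gives the penalty value, the thresholding rule it induces for a single coefficient under an orthonormal design, and a penalized logistic loss of the kind these methods minimize for a binary outcome such as death. All function names are hypothetical illustrations, not the authors' code.

```python
import math

SCAD_A = 3.7  # conventional SCAD shape parameter (assumed; not stated in the record)


def lasso_penalty(theta, lam):
    """L1 (LASSO) penalty: lam * |theta|."""
    return lam * abs(theta)


def hard_penalty(theta, lam):
    """Hard-thresholding penalty: lam^2 - (|theta| - lam)^2 if |theta| < lam, else lam^2."""
    t = abs(theta)
    return lam ** 2 - (t - lam) ** 2 if t < lam else lam ** 2


def scad_penalty(theta, lam, a=SCAD_A):
    """SCAD penalty: linear near zero, quadratic transition, constant beyond a*lam."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2


# Minimizers of 0.5 * (z - theta)^2 + p(|theta|) for a single coefficient
# (the orthonormal-design case), one rule per penalty above.
def soft_threshold(z, lam):
    """LASSO rule: set small z to zero, shrink the rest towards zero by lam."""
    if abs(z) <= lam:
        return 0.0
    return math.copysign(abs(z) - lam, z)


def hard_threshold(z, lam):
    """Hard rule: keep z unchanged if |z| > lam, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0


def scad_threshold(z, lam, a=SCAD_A):
    """SCAD rule: soft-thresholds small z, interpolates linearly for
    moderate z, and leaves z with |z| > a*lam unshrunk."""
    t = abs(z)
    if t <= 2 * lam:
        return soft_threshold(z, lam)
    if t <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z


def penalized_logistic_loss(beta, X, y, lam, penalty):
    """Mean negative log-likelihood of a logistic model plus a penalty.

    beta[0] is treated as an unpenalized intercept (an assumption);
    X is a list of covariate rows and y a list of 0/1 outcomes."""
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
        # log(1 + exp(eta)) - yi * eta, written to avoid overflow
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return nll / len(y) + sum(penalty(b, lam) for b in beta[1:])


if __name__ == "__main__":
    lam = 1.0
    for z in (-4.0, -2.5, -0.5, 0.5, 1.5, 2.5, 4.0):
        print(f"z={z:5.1f}  soft={soft_threshold(z, lam):6.3f}  "
              f"hard={hard_threshold(z, lam):6.3f}  "
              f"scad={scad_threshold(z, lam):6.3f}")
```

Running the script prints, for λ = 1, how each rule treats estimates of different sizes: the LASSO shrinks every surviving estimate by λ, Hard leaves survivors unchanged but jumps at |z| = λ, and SCAD acts like the LASSO near zero while leaving estimates above aλ unshrunk.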
More information
Published date: 2008
Keywords:
variable selection, generalized linear model, nonconcave penalized likelihood, high-dimensional dataset, trauma
Organisations:
Statistics
Identifiers
Local EPrints ID: 336713
URI: http://eprints.soton.ac.uk/id/eprint/336713
ISSN: 1930-6792
Catalogue record
Date deposited: 04 Apr 2012 14:11
Last modified: 11 Dec 2021 00:04
Contributors
Author: C. Koukouvinos
Author: K. Mylona
Author: F. Vonta