Joint clustering with correlated variables
Traditional clustering methods group either subjects or (dependent) variables, and they assume that the variables are independent. Clusters formed through these approaches can lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. Each joint cluster (in general composed of a subset of variables and a subset of subjects) has a unique association between the dependent variables and the covariates of interest. To this end, a Bayesian method is designed in which a semiparametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared to existing clustering techniques, the major novelty of the method lies in its ability to improve the homogeneity of clusters while taking the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association between age and wheal size in reaction to allergens, we found that a certain pattern of allergic sensitization to a set of allergens has the potential to reduce the occurrence of asthma.
Bayesian methods, Dirichlet process, Semiparametric modeling
1-11
Zhang, Hongmei
Zou, Yubo
Terry, Will
Karmaus, Wilfried
Arshad, Hasan
Zhang, Hongmei, Zou, Yubo, Terry, Will, Karmaus, Wilfried and Arshad, Hasan (2018) Joint clustering with correlated variables. American Statistician. (doi:10.1080/00031305.2018.1424033).
Text
JointClustering AmericanStat Submission rev 3
- Accepted Manuscript
More information
Accepted/In Press date: 27 December 2017
e-pub ahead of print date: 9 July 2018
Identifiers
Local EPrints ID: 422813
URI: http://eprints.soton.ac.uk/id/eprint/422813
ISSN: 0003-1305
PURE UUID: 1a7e4973-47f4-4dd4-aed3-3f100b555220
Catalogue record
Date deposited: 06 Aug 2018 16:30
Last modified: 06 Jun 2024 04:06
Contributors
Author: Hongmei Zhang
Author: Yubo Zou
Author: Will Terry
Author: Wilfried Karmaus
Author: Hasan Arshad