University of Southampton Institutional Repository

Clustering multivariate longitudinal observations: The contaminated Gaussian hidden Markov model

Punzo, Antonio and Maruotti, Antonello (2015) Clustering multivariate longitudinal observations: The contaminated Gaussian hidden Markov model. Journal of Computational and Graphical Statistics, 1-32. (doi:10.1080/10618600.2015.1089776).

Record type: Article

Abstract

The Gaussian hidden Markov model (HMM) is widely considered for the analysis of heterogeneous continuous multivariate longitudinal data. To robustify this approach with respect to possible elliptical heavy-tailed departures from normality, due to the presence of outliers, spurious points, or noise (collectively referred to as bad points herein), the contaminated Gaussian HMM is introduced here. The contaminated Gaussian distribution is an elliptical generalization of the Gaussian distribution and allows for automatic detection of bad points in the same natural way as observations are typically assigned to the latent states in the HMM context. Once the model is fitted, each observation has a posterior probability of belonging to a particular state and, inside each state, of being a bad point or not. In addition to the parameters of the classical Gaussian HMM, each state has two more parameters, both with a specific and useful interpretation: one controls the proportion of bad points and one specifies their degree of atypicality. A sufficient condition for the identifiability of the model is given, an expectation-conditional maximization algorithm is outlined for parameter estimation, and various operational issues are discussed. Using a large-scale simulation study and an illustrative artificial dataset, we demonstrate the effectiveness of the proposed model in comparison with HMMs based on different elliptical distributions, and we also evaluate the performance of some well-known information criteria in selecting the true number of latent states. The model is finally used to fit data on criminal activities in Italian provinces.
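
As a rough illustration of the two extra state-level parameters described in the abstract, the sketch below evaluates a contaminated Gaussian density for a single state and the posterior probability that a point is bad. It assumes the commonly used two-component form, in which a proportion alpha of good points follows a Gaussian with covariance Sigma and the remaining proportion 1 - alpha follows a Gaussian with the same mean and inflated covariance eta * Sigma (eta > 1). This parameterization and all function names are assumptions made for the example; they are not taken from this record and may differ from the notation in the article. The hidden Markov layer and the ECM estimation are not shown.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density evaluated at a single point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance, computed without forming the inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)))


def contaminated_gaussian_pdf(x, mean, cov, alpha, eta):
    """Assumed form: alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov)."""
    cov = np.asarray(cov, dtype=float)
    good = alpha * gaussian_pdf(x, mean, cov)
    bad = (1.0 - alpha) * gaussian_pdf(x, mean, eta * cov)
    return good + bad


def prob_bad(x, mean, cov, alpha, eta):
    """Posterior probability that x comes from the inflated (bad) component."""
    cov = np.asarray(cov, dtype=float)
    good = alpha * gaussian_pdf(x, mean, cov)
    bad = (1.0 - alpha) * gaussian_pdf(x, mean, eta * cov)
    return bad / (good + bad)


# Hypothetical state parameters, chosen only for illustration.
mean = np.zeros(2)
cov = np.eye(2)
alpha, eta = 0.9, 10.0

p_centre = prob_bad([0.0, 0.0], mean, cov, alpha, eta)  # point at the state mean
p_far = prob_bad([5.0, 5.0], mean, cov, alpha, eta)     # point far from the mean
```

Under these assumptions, a point at the state mean receives a small probability of being bad, while a point far from the mean receives a probability close to one, which is the kind of automatic flagging the abstract describes.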

Text
__userfiles.soton.ac.uk_Library_SLAs_Work_for_ALL's_Work_for_ePrints_Accepted Manuscripts_Punzo_Clustering.pdf - Accepted Manuscript
Download (482kB)

More information

Accepted/In Press date: 14 August 2015
e-pub ahead of print date: 29 September 2015
Keywords: robust model-based clustering, expectation-conditional maximization (ECM) algorithm, model selection, elliptical distributions, atypical data
Organisations: Mathematical Sciences

Identifiers

Local EPrints ID: 383292
URI: https://eprints.soton.ac.uk/id/eprint/383292
ISSN: 1061-8600
PURE UUID: b301755f-8f8f-4ff4-88ef-1f042d287c2b

Catalogue record

Date deposited: 23 Oct 2015 14:12
Last modified: 10 Dec 2019 06:38

Contributors

Author: Antonio Punzo
Author: Antonello Maruotti
