The University of Southampton
University of Southampton Institutional Repository

Outlier detection and robust covariance estimation using mathematical programming


Nguyen, Tri-Dzung and Welsch, Roy E. (2010) Outlier detection and robust covariance estimation using mathematical programming. Advances in Data Analysis and Classification, 4 (4), 301-334. (doi:10.1007/s11634-010-0070-7).

Record type: Article

Abstract

The outlier detection problem and the robust covariance estimation problem are often interchangeable. Without outliers, the classical method of maximum likelihood estimation (MLE) can be used to estimate the parameters of a known distribution from observational data. When outliers are present, they dominate the log-likelihood function, pulling the MLE estimates toward them. Many robust statistical methods have been developed to detect outliers and to produce estimators that are robust against deviations from model assumptions. However, existing methods either become computationally expensive as the problem size grows or give up desirable properties such as affine equivariance. An alternative approach is to design a special mathematical programming model that finds optimal weights for all the observations, such that at the optimal solution outliers receive smaller weights and can therefore be detected. This method produces a covariance estimator with the following properties. First, it is affine equivariant. Second, it is computationally efficient even for large problems. Third, it is easy to incorporate prior beliefs into the estimator by using semi-definite programming. The accuracy of the method is tested on different contamination models, including recently proposed ones. The method is not only faster than the Fast-MCD method for high-dimensional data but also has reasonable accuracy for the tested cases.
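The abstract does not spell out the authors' mathematical programming model, so the following is a minimal sketch of two ideas it relies on, not the paper's method. It shows that (i) the classical MLE, which weights every observation equally, is pulled toward an outlier, and (ii) an estimator built from per-observation weights can push that outlier's weight to zero. As a stand-in for the optimization model in the paper (whose keywords mention semi-definite programming and the Newton–Raphson method), it uses a simple hard-rejection reweighting: weight 1 or 0 according to a chi-squared cutoff on the squared Mahalanobis distance. The restriction to 2-D data, the cutoff value, and the function names `weighted_mean_cov`, `mahalanobis_sq` and `reweighted_estimate` are illustrative assumptions.

```python
# Illustrative sketch only: hard-rejection reweighting in 2-D, NOT the
# mathematical program from the paper. All names here are hypothetical.


def weighted_mean_cov(points, weights):
    """Weighted mean and covariance of 2-D points (weights normalised to 1).

    With equal weights this is the classical Gaussian MLE (divisor n)."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    mx = sum(wi * x for wi, (x, _) in zip(w, points))
    my = sum(wi * y for wi, (_, y) in zip(w, points))
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, points))
    syy = sum(wi * (y - my) ** 2 for wi, (_, y) in zip(w, points))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, points))
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # inv([[a, b], [b, d]]) = [[d, -b], [-b, a]] / det
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def reweighted_estimate(points, cutoff=13.8155, max_iter=50):
    """Start from the MLE, give weight 0 to points whose squared Mahalanobis
    distance exceeds `cutoff` (about the 0.999 chi-squared quantile with
    2 degrees of freedom) and weight 1 otherwise; repeat until stable."""
    weights = [1.0] * len(points)
    for _ in range(max_iter):
        mean, cov = weighted_mean_cov(points, weights)
        new = [1.0 if mahalanobis_sq(p, mean, cov) <= cutoff else 0.0
               for p in points]
        if sum(new) < 3:
            raise ValueError("too few points kept to estimate a 2x2 covariance")
        if new == weights:
            return mean, cov, weights
        weights = new
    mean, cov = weighted_mean_cov(points, weights)
    return mean, cov, weights


# Demo data: a 5x5 grid of inliers plus one gross outlier at (20, 20).
inliers = [(float(x), float(y)) for x in range(-2, 3) for y in range(-2, 3)]
points = inliers + [(20.0, 20.0)]

mle_mean, mle_cov = weighted_mean_cov(points, [1.0] * len(points))
rob_mean, rob_cov, w = reweighted_estimate(points)
print("MLE mean:", mle_mean)          # pulled toward the outlier
print("reweighted mean:", rob_mean)   # back near the inlier centre (0, 0)
print("outlier weight:", w[-1])       # 0.0: flagged as an outlier
```

Because the weights depend on the data only through Mahalanobis distances, which do not change under x ↦ Ax + b for an invertible A, the same weights are chosen for the transformed data, and the estimate transforms as (A m + b, A S Aᵀ). This is the affine equivariance property the abstract emphasises. Two caveats apply to this crude stand-in. First, starting from the MLE, several outliers clustered together can inflate the initial covariance enough to hide themselves (masking); with two points at (20, 20) instead of one, their squared distance falls below the cutoff. Second, a 0/1 weight is far cruder than the optimal weights the abstract describes.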

This record has no associated files available for download.

More information

Published date: 31 July 2010
Keywords: covariance matrix estimation, robust statistics, outlier detection, optimization, semi-definite programming, Newton–Raphson method
Organisations: Statistics, Operational Research

Identifiers

Local EPrints ID: 181475
URI: http://eprints.soton.ac.uk/id/eprint/181475
ISSN: 1862-5347
PURE UUID: 549a9911-8c2b-4f43-a157-bc94bda0ee54
ORCID for Tri-Dzung Nguyen: orcid.org/0000-0002-4158-9099

Catalogue record

Date deposited: 18 Apr 2011 13:35
Last modified: 14 Mar 2024 02:56

Contributors

Author: Tri-Dzung Nguyen
Author: Roy E. Welsch
