The effect of sample design on principal component analysis
Most sample surveys are multivariate and many lend themselves to multivariate methods of analysis. The most usual mode of such analysis is a standard statistical package, such as BMDP or SPSS, in which the multivariate analyses are based on the underlying assumption that the data are generated as independent observations from a common probability distribution. This assumption ignores the sample selection procedure involved in the survey, which leads to the following basic questions. What effects can the sample design have on methods of multivariate analysis? How should such effects be taken into account? This article considers the case of principal component analysis and, in particular, the point estimation of the eigenvalues and eigenvectors of a covariance matrix. It is assumed that the selection of the sample depends on the population values of auxiliary variables as, for example, in stratified sampling. The conventional estimators, based on the assumption of simple random sampling, are compared with alternative probability-weighted and maximum likelihood estimators. Under a multivariate normal model, simple expressions are presented for the approximate model bias of the different estimators. The validity of these results is assessed in a simulation study involving a disproportionate stratified design.
789-798
Skinner, C.J.
Holmes, D.J.
Smith, T.M.F.
September 1986
Skinner, C.J., Holmes, D.J. and Smith, T.M.F. (1986) The effect of sample design on principal component analysis. Journal of the American Statistical Association, 81 (395), 789-798.
More information
Published date: September 1986
Identifiers
Local EPrints ID: 34225
URI: http://eprints.soton.ac.uk/id/eprint/34225
ISSN: 0162-1459
PURE UUID: a5ced08c-06ff-403a-bc59-080cbd127e77
Catalogue record
Date deposited: 11 Jan 2008
Last modified: 11 Dec 2021 15:23
Contributors
Author: C.J. Skinner
Author: D.J. Holmes
Author: T.M.F. Smith