The University of Southampton
University of Southampton Institutional Repository

The effect of two-stage sampling on the F-statistic

Wu, C.F.J., Holt, D. and Holmes, D.J. (1988) The effect of two-stage sampling on the F-statistic. Journal of the American Statistical Association, 83 (401), 150-159.

Record type: Article

Abstract

The assumption of iid observations that underlies many statistical procedures is called into question when analyzing complex survey data. The population structure, particularly the existence of clusters in two-stage samples that usually exhibit positive intracluster correlation, invalidates the independence assumption. Kish and Frankel (1974) investigated the impact of this fact on regression analysis by using the standard sample-survey-theory framework; Campbell (1977) and Scott and Holt (1982) used the linear model framework. In general, although ordinary least squares (OLS) estimators of the regression coefficients are unbiased, if not fully efficient, serious difficulties can arise when OLS estimators are used for second-order terms. Variances of the OLS estimators for the regression coefficients can be larger (sometimes much larger) than the usual OLS variance expression would indicate. Failure to consider this possibility leads to underestimation of variances, with consequences for confidence intervals. This article follows this effect through to the F statistic because of its importance to hypothesis tests and confidence ellipsoids. Our major aim is to investigate the effect of intracluster correlation on the F statistic. We propose a diagnostic measure that identifies when the ordinary F statistic is likely to be affected, and we give a decomposition of it in terms of the contributions of the individual regressors and their cross-products, based on a similar decomposition for the projection matrix in Appendix A. We establish numerically and theoretically the effectiveness of this measure in understanding the degree to which intracluster correlation distorts F. The measure leads to a correction of the F test when the intracluster correlation is unknown. This is a slightly simpler numerical procedure than generalized least squares (GLS), since it does not require iteration. The correction is shown to perform at least as well as GLS in a simulation study.
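As an illustration only, and not the authors' diagnostic measure or correction, the following Python sketch simulates a hypothetical two-stage sample in which both the regressor x and the error e carry a shared cluster effect. All names (simulate_two_stage, ols_slope, naive_f_study), the equal cluster size m, the unit-variance random-effects model, and the use of 3.84 as an approximate 5% critical value for a one-restriction F test are assumptions made for the example. Under this model, the ratio of the true variance of the OLS slope to the average usual OLS variance estimate should be roughly the commonly used approximation 1 + (m - 1)ρ_x ρ_e, and the naive F test of a true null hypothesis should reject far more often than its nominal 5%.

```python
import random
import statistics


def simulate_two_stage(n_clusters, m, rho_x, rho_e, beta, rng):
    """Draw y = beta * x + e from n_clusters clusters of m units each.

    Assumed model: x and e each share a cluster-level random effect, so their
    intracluster correlations are rho_x and rho_e (both have unit variance)."""
    xs, ys = [], []
    for _ in range(n_clusters):
        ux = rng.gauss(0.0, rho_x ** 0.5)  # cluster effect in the regressor
        ue = rng.gauss(0.0, rho_e ** 0.5)  # cluster effect in the error
        for _ in range(m):
            x = ux + rng.gauss(0.0, (1.0 - rho_x) ** 0.5)
            e = ue + rng.gauss(0.0, (1.0 - rho_e) ** 0.5)
            xs.append(x)
            ys.append(beta * x + e)
    return xs, ys


def ols_slope(xs, ys):
    """Return the OLS slope and its usual (iid-based) variance estimate."""
    n = len(xs)
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    s2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b, s2 / sxx


def naive_f_study(n_clusters=40, m=10, rho_x=0.5, rho_e=0.5,
                  beta=1.0, reps=400, f_crit=3.84, seed=1):
    """Monte Carlo estimate of (a) the variance inflation of the OLS slope
    relative to the naive OLS variance and (b) the actual size of the naive
    F test of H0: slope = beta, which is true in every replicate."""
    rng = random.Random(seed)
    slopes, naive_vars, rejections = [], [], 0
    for _ in range(reps):
        xs, ys = simulate_two_stage(n_clusters, m, rho_x, rho_e, beta, rng)
        b, v = ols_slope(xs, ys)
        slopes.append(b)
        naive_vars.append(v)
        # One-restriction F statistic computed as if observations were iid.
        if (b - beta) ** 2 / v > f_crit:
            rejections += 1
    inflation = statistics.variance(slopes) / statistics.fmean(naive_vars)
    return inflation, rejections / reps


if __name__ == "__main__":
    inflation, size = naive_f_study()
    print("variance inflation:", round(inflation, 2),
          "approximation:", 1 + (10 - 1) * 0.5 * 0.5)
    print("empirical size of nominal 5% F test:", round(size, 3))
```

Setting rho_x or rho_e to zero in this sketch should bring the inflation back near 1 and the size near 5%, consistent with the point that distortion requires intracluster correlation in both the regressor and the error.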

This record has no associated files available for download.

More information

Published date: March 1988

Identifiers

Local EPrints ID: 34226
URI: http://eprints.soton.ac.uk/id/eprint/34226
ISSN: 0162-1459
PURE UUID: 69d4ab22-2cd5-492b-8e7a-cda5cc6b4275

Catalogue record

Date deposited: 20 Dec 2007
Last modified: 11 Dec 2021 15:23

Contributors

Author: C.F.J. Wu
Author: D. Holt
Author: D.J. Holmes

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.