Population size estimation based upon ratios of recapture probabilities
Rocchetti, Irene, Bunge, John and Böhning, Dankmar (2011) Population size estimation based upon ratios of recapture probabilities. The Annals of Applied Statistics, 5 (2B), 1512-1533. (doi:10.1214/10-AOAS436).
Abstract
Estimating the size of an elusive target population is of prominent interest in many areas of the life and social sciences. Our aim is to provide an efficient and workable method to estimate the unknown population size, given the frequency distribution of counts of repeated identifications of units of the population of interest. This counting variable is necessarily zero-truncated, since units that have never been identified are not in the sample. We consider several applications: clinical medicine, where interest is in estimating the number of patients with adenomatous polyps that have been overlooked by the diagnostic procedure; drug user studies, where interest is in estimating the number of hidden drug users who are not identified; veterinary surveillance of scrapie in the UK, where interest is in estimating the hidden amount of scrapie; and entomology and microbial ecology, where interest is in estimating the number of unobserved species of organisms. In all these examples, simple models such as the homogeneous Poisson are not appropriate, since they do not account for present and latent heterogeneity. The Poisson–Gamma (negative binomial) model provides a flexible alternative and often leads to well-fitting models. It has a long history and was recently used in the development of the Chao–Bunge estimator. Here we use a different property of the Poisson–Gamma model: if we consider ratios of neighboring Poisson–Gamma probabilities, then these are linearly related to the counts of repeated identifications. Also, ratios have the useful property that they are identical for truncated and untruncated distributions. In this paper we propose a weighted logarithmic regression model to estimate the zero frequency counts, assuming a Poisson–Gamma distribution for the counts. A detailed explanation of the chosen weights and a goodness-of-fit index are presented, along with extensions to other distributions.
To evaluate the proposed estimator, we applied it to the benchmark examples mentioned above and compared the results with those obtained through the Chao–Bunge and other estimators. The major benefit of the proposed estimator is that it is defined under mild conditions, whereas the Chao–Bunge estimator fails to be well defined in several of the examples presented; in cases where the Chao–Bunge estimator is defined, its behavior is comparable to that of the proposed estimator in terms of bias and MSE, as a simulation study shows. Furthermore, the proposed estimator is relatively insensitive to the inclusion or exclusion of large outlying frequencies, while sensitivity to outliers is characteristic of most other methods. The implications and limitations of such methods are discussed.
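The two ratio properties mentioned in the abstract can be checked numerically. The following Python sketch is an illustration only, not the authors' implementation: the function names, the parameter values k and p, the choice of the count x as regressor, and the weights (inverse of an approximate variance 1/f_x + 1/f_{x+1}) are all assumptions made for this example, since the record does not state the exact regression specification. It uses the scaled ratio r_x = (x + 1) p_{x+1} / p_x, which for the negative binomial equals (x + k)(1 - p).

```python
import math


def nb_pmf(x, k, p):
    """Negative binomial (Poisson-Gamma) probability of count x."""
    log_pmf = (math.lgamma(x + k) - math.lgamma(x + 1) - math.lgamma(k)
               + k * math.log(p) + x * math.log(1.0 - p))
    return math.exp(log_pmf)


def neighbor_ratios(freq):
    """Return {x: (x + 1) * freq[x + 1] / freq[x]} for neighboring positive entries."""
    return {x: (x + 1) * freq[x + 1] / freq[x]
            for x in sorted(freq)
            if freq[x] > 0 and freq.get(x + 1, 0) > 0}


def estimate_f0(freq):
    """Extrapolate a weighted line through log r_x back to x = 0 (illustrative)."""
    r = neighbor_ratios(freq)
    xs = list(r)
    ys = [math.log(r[x]) for x in xs]
    # Assumed weights: inverse of the approximate variance 1/f_x + 1/f_{x+1}.
    ws = [1.0 / (1.0 / freq[x] + 1.0 / freq[x + 1]) for x in xs]
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # At x = 0 the ratio is r_0 = f_1 / f_0, so f_0 = f_1 / exp(intercept).
    return freq[1] / math.exp(intercept)


# Hypothetical parameters, chosen only for the demonstration.
k, p = 2.0, 0.4
full = {x: nb_pmf(x, k, p) for x in range(0, 8)}
p0 = full[0]
# Zero-truncated distribution: drop x = 0 and renormalise.
truncated = {x: full[x] / (1.0 - p0) for x in range(1, 8)}

r_full = neighbor_ratios(full)
r_trunc = neighbor_ratios(truncated)
for x in sorted(r_trunc):
    # Columns: x, untruncated ratio, truncated ratio, closed form (x + k)(1 - p).
    print(x, round(r_full[x], 6), round(r_trunc[x], 6), round((x + k) * (1 - p), 6))
```

Because the normalising constant cancels in each ratio, the truncated and untruncated columns agree, which is why the regression can be fitted to the observed (zero-truncated) frequencies f_1, f_2, ... and then extrapolated to x = 0. Calling `estimate_f0` on a dictionary of observed frequencies, keyed by count, returns the extrapolated zero frequency under the assumptions above.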
This record has no associated files available for download.
More information
Published date: 2011
Organisations:
Statistical Sciences Research Institute
Identifiers
Local EPrints ID: 210467
URI: http://eprints.soton.ac.uk/id/eprint/210467
ISSN: 1932-6157
PURE UUID: d2949397-fb7d-4f70-91fd-f4df1f3cbbd4
Catalogue record
Date deposited: 09 Feb 2012 13:25
Last modified: 15 Mar 2024 03:39
Contributors
Author:
Irene Rocchetti
Author:
John Bunge