Statistical solutions for error and bias in global citizen science datasets
Pages: 144-154
Bird, Tomas J., Bates, Amanda E., Lefcheck, Jonathan S., Hill, Nicole A., Thomson, Russell J., Edgar, Graham J., Stuart-Smith, Rick D., Wotherspoon, Simon, Krkosek, Martin, Stuart-Smith, Jemina F., Pecl, Gretta T., Barrett, Neville and Frusher, Stewart (2014) Statistical solutions for error and bias in global citizen science datasets. Biological Conservation, 173, 144-154. (doi:10.1016/j.biocon.2013.07.037).
Abstract
Networks of citizen scientists (CS) have the potential to observe biodiversity and species distributions at global scales. Yet the adoption of such datasets in conservation science may be hindered by a perception that the data are of low quality. This perception likely stems from the propensity of data generated by CS to contain greater levels of variability (e.g., measurement error) or bias (e.g., spatio-temporal clustering) in comparison to data collected by scientists or instruments. Modern analytical approaches can account for many types of error and bias typical of CS datasets. It is possible to (1) describe how pseudo-replication in sampling influences the overall variability in response data using mixed-effects modeling, (2) integrate data to explicitly model the sampling process and account for bias using a hierarchical modeling framework, and (3) examine the relative influence of many different or related explanatory factors using machine learning tools. Information from these modeling approaches can be used to predict species distributions and to estimate biodiversity. Even so, achieving the full potential from CS projects requires meta-data describing the sampling process, reference data to allow for standardization, and insightful modeling suitable to the question of interest.
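As a rough illustration of point (1) in the abstract, the sketch below shows how repeated surveys at the same site inflate the apparent sample size. It is not the authors' method. It uses the classical method-of-moments (one-way ANOVA) estimator for a balanced random-intercept model, which is a simple stand-in for a fitted mixed-effects model. The function name `variance_components`, the site counts, and the balanced design are all assumptions made for the example.

```python
from statistics import mean


def variance_components(groups):
    """Method-of-moments variance components for a balanced random-intercept model.

    `groups` is a list of equally sized lists, one per sampling unit
    (e.g. repeated counts at one site). Hypothetical helper for illustration.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need equally sized groups with at least two values each")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Mean squares from the one-way ANOVA decomposition.
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # Between-group variance, truncated at zero as is conventional.
    var_between = max(0.0, (ms_between - ms_within) / n)
    total = var_between + ms_within
    icc = var_between / total if total > 0 else 0.0

    # Design effect: how many independent observations the data are "worth".
    n_eff = k * n / (1 + (n - 1) * icc)
    return {
        "var_between": var_between,
        "var_within": ms_within,
        "icc": icc,
        "n_eff": n_eff,
    }


# Made-up counts: three sites, each surveyed three times.
site_counts = [[10, 12, 11], [20, 22, 21], [30, 29, 31]]
result = variance_components(site_counts)
print(result)
```

When most of the variance lies between sites, the intraclass correlation is close to 1 and the effective sample size approaches the number of sites rather than the number of surveys. That is the sense in which ignoring pseudo-replication overstates the information in the data.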
Text
Bird_et_al_2013 Statistical Solutions to error and bias in global citizen science datasets (2).pdf
- Accepted Manuscript
Restricted to Repository staff only
More information
Accepted/In Press date: September 2013
Published date: May 2014
Keywords:
Volunteer data, Statistical analysis, Experimental design, Linear models, Additive models, Species distribution models, Biodiversity, Reef life survey
Organisations:
Ocean and Earth Science
Identifiers
Local EPrints ID: 361224
URI: http://eprints.soton.ac.uk/id/eprint/361224
ISSN: 0006-3207
PURE UUID: 323b7f62-8352-4b89-93bc-2a23efa84884
Catalogue record
Date deposited: 15 Jan 2014 14:12
Last modified: 14 Mar 2024 15:47
Contributors
Author: Tomas J. Bird
Author: Amanda E. Bates
Author: Jonathan S. Lefcheck
Author: Nicole A. Hill
Author: Russell J. Thomson
Author: Graham J. Edgar
Author: Rick D. Stuart-Smith
Author: Simon Wotherspoon
Author: Martin Krkosek
Author: Jemina F. Stuart-Smith
Author: Gretta T. Pecl
Author: Neville Barrett
Author: Stewart Frusher