The University of Southampton
University of Southampton Institutional Repository

On the impact of Citizen Science-derived data quality on deep learning based classification in marine images

Langenkämper, Daniel, Simon-Lledó, Erik, Hosking, Brett, Jones, Daniel O.B. and Nattkemper, Tim W. (2019) On the impact of Citizen Science-derived data quality on deep learning based classification in marine images. PLoS ONE, 14 (6), 1-16, [e0218086]. (doi:10.1371/journal.pone.0218086).

Record type: Article

Abstract

The evaluation of large amounts of digital image data is of growing importance for biology, including for the exploration and monitoring of marine habitats. However, only a tiny percentage of the image data collected is evaluated by marine biologists, who manually interpret and annotate the image contents, a slow and laborious process. To overcome this bottleneck in image annotation, two strategies are increasingly proposed: “citizen science” and “machine learning”. In this study, we investigated how the combination of citizen science, to detect objects, and machine learning, to classify megafauna, could be used to automate the annotation of underwater images. For this purpose, multiple large data sets of citizen science annotations were simulated by modifying “gold standard” annotations made by an experienced marine biologist, introducing different degrees of the errors and inaccuracies commonly observed in citizen science data. The parameters of the simulation were determined on the basis of two citizen science experiments. This approach allowed us to analyze the relationship between the outcome of a citizen science study and the quality of the classifications of a deep learning megafauna classifier. The results show great potential for combining citizen science with machine learning, provided that the participants are informed precisely about the annotation protocol. Inaccuracies in the position of the annotation had the most substantial influence on the classification accuracy, whereas the size of the marking and false positive detections had a smaller influence.
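
To make the simulation idea in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes annotations are circles (centre and radius), that position error is Gaussian, that size error is a Gaussian relative scale, and that false positives are placed uniformly at random. The names `Annotation`, `simulate_citizen_annotations`, `pos_sigma`, `size_sigma` and `fp_per_image` are hypothetical. The abstract does not state the actual error models or parameter values, which the study derived from two citizen science experiments.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A circular image annotation (hypothetical structure for illustration)."""
    x: float
    y: float
    radius: float
    label: str  # e.g. "fauna" for gold-standard objects


def simulate_citizen_annotations(gold, image_size, pos_sigma, size_sigma,
                                 fp_per_image, rng):
    """Perturb gold-standard annotations with three assumed error types.

    pos_sigma    -- std. dev. (pixels) of Gaussian jitter on the centre
    size_sigma   -- std. dev. of the relative Gaussian error on the radius
    fp_per_image -- number of uniformly placed false positive markings
    """
    width, height = image_size
    noisy = []
    for a in gold:
        # Position inaccuracy: shift the centre, then clamp to the image.
        x = min(max(a.x + rng.gauss(0.0, pos_sigma), 0.0), width)
        y = min(max(a.y + rng.gauss(0.0, pos_sigma), 0.0), height)
        # Size inaccuracy: rescale the radius, keeping it at least 1 pixel.
        radius = max(a.radius * (1.0 + rng.gauss(0.0, size_sigma)), 1.0)
        noisy.append(Annotation(x, y, radius, a.label))
    for _ in range(fp_per_image):
        # False positive detection: a marking where no object exists.
        noisy.append(Annotation(rng.uniform(0, width), rng.uniform(0, height),
                                rng.uniform(5, 50), "false_positive"))
    return noisy


if __name__ == "__main__":
    gold = [Annotation(100.0, 200.0, 20.0, "fauna"),
            Annotation(300.0, 50.0, 10.0, "fauna")]
    simulated = simulate_citizen_annotations(
        gold, image_size=(640, 480), pos_sigma=5.0, size_sigma=0.2,
        fp_per_image=3, rng=random.Random(0))
    for ann in simulated:
        print(ann)
```

Varying one parameter at a time (for example, increasing `pos_sigma` while holding the others fixed) would produce data sets with graded degrees of a single error type, which is the kind of comparison the abstract describes when it reports that position inaccuracies affected classification accuracy most.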

Text
journal.pone.0218086 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 25 May 2019
Published date: 12 June 2019

Identifiers

Local EPrints ID: 432095
URI: http://eprints.soton.ac.uk/id/eprint/432095
ISSN: 1932-6203
PURE UUID: 2ca62425-cb15-4cbc-a104-a8eb7305a8c2

Catalogue record

Date deposited: 02 Jul 2019 16:30
Last modified: 16 Mar 2024 02:37

Contributors

Author: Daniel Langenkämper
Author: Erik Simon-Lledó
Author: Brett Hosking
Author: Daniel O.B. Jones
Author: Tim W. Nattkemper
