How many specimens make a sufficient training set for automated 3D feature extraction?

Deep learning has emerged as a robust tool for automating feature extraction from 3D images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation. However, optimal training set sizes remain little explored, including whether artificially expanding a training set through data augmentation achieves consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. As expected, deep learning models improve with a larger number of training images, with eight specimens required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and shape measurements is more challenging for the internal structure than for the external structure, owing to low contrast between different materials and greater geometric complexity. These results provide novel insight into optimal training set sizes for precise image segmentation of diverse traits and highlight the potential of data augmentation for enhancing multivariate feature extraction from 3D images.
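
As a rough illustration of what artificially expanding a training set can look like for 3D data, the sketch below generates mirrored and rotated copies of a scan together with its manual segmentation. The abstract does not state which augmentations the authors used, so the choice of axis flips and 90-degree rotations, the function name `augment_volume`, and the toy arrays are all assumptions made for illustration only.

```python
import numpy as np


def augment_volume(image, labels):
    """Return a list of (image, labels) pairs: the original plus simple
    geometric variants. Hypothetical helper, not the authors' pipeline."""
    pairs = [(image, labels)]
    # Mirror along each of the three axes; image and labels get the
    # same transform so every voxel keeps its segmentation label.
    for axis in range(3):
        pairs.append((np.flip(image, axis=axis), np.flip(labels, axis=axis)))
    # Rotate by 90, 180 and 270 degrees in the plane of the first two axes.
    for k in (1, 2, 3):
        pairs.append((
            np.rot90(image, k=k, axes=(0, 1)),
            np.rot90(labels, k=k, axes=(0, 1)),
        ))
    return pairs


# Toy example: a 2 x 3 x 4 "scan" with a binary segmentation mask.
image = np.arange(24).reshape(2, 3, 4)
labels = image % 2
pairs = augment_volume(image, labels)
```

Under these assumptions one manually segmented specimen yields seven training pairs. Flips and 90-degree rotations only rearrange voxels, so the labelled voxel count (a simple proxy for volume) is the same in every variant.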

Keywords: deep learning, data augmentation, image segmentation, planktonic foraminifera, feature extraction

2054-5703

Mulqueeney, James M.

20bf3f65-5f1a-4836-bccd-f8c97c6f61ab

Searle-Barnes, Alex

27cd9e5f-9a76-4d3d-8c88-0d3d0b1fad63

Brombacher, Anieke

2a4bbb84-4743-4a36-973b-4ad2bf743154

Sweeney, Marisa

552b5305-6fd5-45eb-89e2-b4f2bacfe9d6

Goswami, Anjali

0b4facf0-77fd-497c-9eef-0b1c53d0d707

Ezard, Thomas H.G.

a143a893-07d0-4673-a2dd-cea2cd7e1374

14 June 2024

Mulqueeney, James M., Searle-Barnes, Alex, Brombacher, Anieke, Sweeney, Marisa, Goswami, Anjali and Ezard, Thomas H.G. (2024) How many specimens make a sufficient training set for automated 3D feature extraction? Royal Society Open Science.

Record type: Article

Text

J_Mulqueeney_RSOS-240113_Manuscript_Editable - Accepted Manuscript

Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 26 April 2024

Published date: 14 June 2024

Identifiers

Local EPrints ID: 490070

URI: http://eprints.soton.ac.uk/id/eprint/490070

ISSN: 2054-5703

PURE UUID: 09f48b7d-6904-4a9e-9a83-02772d6204f0

ORCID for James M. Mulqueeney:

orcid.org/0000-0003-3502-745X

ORCID for Alex Searle-Barnes:

orcid.org/0000-0003-0389-7717

ORCID for Anieke Brombacher:

orcid.org/0000-0003-2310-047X

ORCID for Thomas H.G. Ezard:

orcid.org/0000-0001-8305-6605

Catalogue record

Date deposited: 14 May 2024 16:40

Last modified: 13 Jul 2024 01:54

Contributors

Author: James M. Mulqueeney

Author: Alex Searle-Barnes

Author: Anieke Brombacher

Author: Marisa Sweeney

Author: Anjali Goswami

Author: Thomas H.G. Ezard
