The University of Southampton
University of Southampton Institutional Repository

How many specimens make a sufficient training set for automated 3D feature extraction?


Mulqueeney, James M., Searle-Barnes, Alex, Brombacher, Anieke, Sweeney, Marisa, Goswami, Anjali and Ezard, Thomas H.G. (2024) How many specimens make a sufficient training set for automated 3D feature extraction? Royal Society Open Science.

Record type: Article

Abstract

Deep learning has emerged as a robust tool for automating feature extraction from 3D images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation. However, optimal training set sizes remain largely unexplored, including whether artificially expanding the training set through data augmentation can achieve consistent results in less time and how well these benefits hold across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. As expected, deep learning models improve as the number of training images increases, with eight specimens required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting volumetric and shape measurements is more challenging for the internal structure than for the external structure, owing to low contrast between different materials and greater geometric complexity. These results provide novel insight into optimal training set sizes for precise image segmentation of diverse traits and highlight the potential of data augmentation for enhancing multivariate feature extraction from 3D images.
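
The idea of artificially expanding a small training set can be illustrated with a minimal sketch. The abstract does not say which transformations the authors applied, so the random flips and 90-degree rotations below are an assumption chosen for simplicity, and the function names `augment_pair` and `expand_training_set` are hypothetical. The key point the sketch shows is that each 3D image and its manual segmentation must receive the same transformation so that the labels stay aligned with the voxels.

```python
import numpy as np


def augment_pair(volume, labels, rng):
    """Apply one random flip/rotation to a 3D image and its label mask.

    Assumption: flips and 90-degree rotations stand in for whatever
    augmentation the study actually used, which the abstract does not state.
    """
    if volume.shape != labels.shape:
        raise ValueError("volume and labels must have the same shape")

    # Flip each axis independently with probability 0.5,
    # applying the identical flip to image and mask.
    for axis in range(3):
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
            labels = np.flip(labels, axis=axis)

    # Rotate by a random multiple of 90 degrees in a random plane.
    k = int(rng.integers(0, 4))
    planes = [(0, 1), (0, 2), (1, 2)]
    plane = planes[int(rng.integers(0, 3))]
    volume = np.rot90(volume, k=k, axes=plane)
    labels = np.rot90(labels, k=k, axes=plane)

    # Return contiguous copies rather than views of the inputs.
    return np.ascontiguousarray(volume), np.ascontiguousarray(labels)


def expand_training_set(pairs, copies, seed=0):
    """Return the original (volume, labels) pairs plus `copies` augmented
    versions of each pair."""
    rng = np.random.default_rng(seed)
    expanded = list(pairs)
    for volume, labels in pairs:
        for _ in range(copies):
            expanded.append(augment_pair(volume, labels, rng))
    return expanded
```

Under these assumptions, calling `expand_training_set(pairs, copies=3)` on eight manually segmented specimens would yield 32 training pairs without any additional manual segmentation.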

Text
J_Mulqueeney_RSOS-240113_Manuscript_Editable - Accepted Manuscript
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 26 April 2024
Published date: 14 June 2024
Keywords: deep learning, data augmentation, image segmentation, planktonic foraminifera, feature extraction

Identifiers

Local EPrints ID: 490070
URI: http://eprints.soton.ac.uk/id/eprint/490070
ISSN: 2054-5703
PURE UUID: 09f48b7d-6904-4a9e-9a83-02772d6204f0
ORCID for James M. Mulqueeney: orcid.org/0000-0003-3502-745X
ORCID for Alex Searle-Barnes: orcid.org/0000-0003-0389-7717
ORCID for Anieke Brombacher: orcid.org/0000-0003-2310-047X
ORCID for Thomas H.G. Ezard: orcid.org/0000-0001-8305-6605

Catalogue record

Date deposited: 14 May 2024 16:40
Last modified: 13 Jul 2024 01:54

Contributors

Author: Marisa Sweeney
Author: Anjali Goswami
Author: Thomas H.G. Ezard
