How many specimens make a sufficient training set for automated 3D feature extraction?
Mulqueeney, James M.
20bf3f65-5f1a-4836-bccd-f8c97c6f61ab
Searle-Barnes, Alex
27cd9e5f-9a76-4d3d-8c88-0d3d0b1fad63
Brombacher, Anieke
2a4bbb84-4743-4a36-973b-4ad2bf743154
Sweeney, Marisa
552b5305-6fd5-45eb-89e2-b4f2bacfe9d6
Goswami, Anjali
0b4facf0-77fd-497c-9eef-0b1c53d0d707
Ezard, Thomas H.G.
a143a893-07d0-4673-a2dd-cea2cd7e1374
14 June 2024
Mulqueeney, James M., Searle-Barnes, Alex, Brombacher, Anieke, Sweeney, Marisa, Goswami, Anjali and Ezard, Thomas H.G. (2024) How many specimens make a sufficient training set for automated 3D feature extraction? Royal Society Open Science.
Abstract
Deep learning has emerged as a robust tool for automating feature extraction from 3D images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation. However, optimal training set sizes remain largely unexplored, including whether artificial expansion by data augmentation can achieve consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. Unsurprisingly, the results show that deep learning models improve with more training images, with eight specimens required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and shape measurements is more challenging for the internal structure than for the external structure, owing to low contrast between different materials and greater geometric complexity. These results provide novel insight into optimal training set sizes for precise image segmentation of diverse traits and highlight the potential of data augmentation for enhancing multivariate feature extraction from 3D images.
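As an illustration of what artificially expanding a training set by data augmentation can look like for segmented 3D images, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not state which augmentation operations were used, so mirroring along each axis is an assumption chosen for simplicity, and the function names `flip_volume` and `augment_pair` and the toy 2x2x2 volumes are hypothetical. The key point it shows is that the image and its manual segmentation must be transformed together so that labels stay aligned with voxels.

```python
from itertools import product


def flip_volume(volume, flip_z, flip_y, flip_x):
    """Return a copy of a nested-list volume indexed [z][y][x],
    mirrored along each axis whose flag is True."""
    planes = volume[::-1] if flip_z else volume
    flipped = []
    for plane in planes:
        rows = plane[::-1] if flip_y else plane
        flipped.append([row[::-1] if flip_x else list(row) for row in rows])
    return flipped


def augment_pair(image, labels):
    """Yield (image, labels) pairs for all 8 axis-flip combinations.
    The first pair is the unflipped original. The same flips are applied
    to both volumes so the segmentation stays aligned with the image."""
    for flips in product((False, True), repeat=3):
        yield flip_volume(image, *flips), flip_volume(labels, *flips)


# Toy example (hypothetical values): a 2x2x2 greyscale image and a
# binary segmentation mask of the same shape.
image = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
labels = [[[0, 0], [0, 1]], [[1, 1], [1, 0]]]

pairs = list(augment_pair(image, labels))
print(len(pairs))  # one manually segmented specimen becomes 8 training pairs
```

Mirroring leaves the number of labelled voxels unchanged, so volumetric measurements derived from the mask are preserved. Whether mirrored specimens are biologically sensible training examples depends on the trait being measured and is a judgment the sketch does not address.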
Text: J_Mulqueeney_RSOS-240113_Manuscript_Editable (Accepted Manuscript)
More information
Accepted/In Press date: 26 April 2024
Published date: 14 June 2024
Keywords:
deep learning, data augmentation, image segmentation, planktonic foraminifera, feature extraction
Identifiers
Local EPrints ID: 490070
URI: http://eprints.soton.ac.uk/id/eprint/490070
ISSN: 2054-5703
PURE UUID: 09f48b7d-6904-4a9e-9a83-02772d6204f0
Catalogue record
Date deposited: 14 May 2024 16:40
Last modified: 13 Jul 2024 01:54
Contributors
Author: Marisa Sweeney
Author: Anjali Goswami
Author: Thomas H.G. Ezard