Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference
Liang, Cailei
f9a26dcf-539b-42c0-8b54-e266c89cf6ea
Cappelletto, Jose De La Cruz
a6620d58-0abe-4f9d-9fd9-9ac474de9230
Massot‐Campos, Miquel
6d2b0c16-899c-4f69-8c8d-9434188a30b8
Bodenmann, Adrian
070a668f-cc2f-402a-844e-cdf207b24f50
Huvenne, Veerle
f22be3e2-708c-491b-b985-a438470fa053
Wardell, Catherine Ann
ebf797ae-291e-4db9-90aa-e686097d8c18
Bett, Brian
61342990-13be-45ae-9f5c-9540114335d9
Newborough, Darryl
a39064ca-a599-452b-b296-b891e1f8bccd
Thornton, Blair
8293beb5-c083-47e3-b5f0-d9c3cee14be9
27 May 2025
Liang, Cailei, Cappelletto, Jose De La Cruz, Massot‐Campos, Miquel, Bodenmann, Adrian, Huvenne, Veerle, Wardell, Catherine Ann, Bett, Brian, Newborough, Darryl and Thornton, Blair
(2025)
Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference.
The International Journal of Robotics Research, [02783649251343640].
(doi:10.1177/02783649251343640).
Abstract
Seafloor surveys often gather multiple modes of remote sensed mapping and sampling data to infer kilo- to mega-hectare scale seafloor habitat distributions. However, efforts to extract information from multimodal data are complicated by inconsistencies between measurement modes (e.g., resolution, positional offsets, geometric distortions) and different acquisition periods for dynamically changing environments. In this study, we investigate the use of location information during multimodal feature learning and its impact on habitat classification. Experiments on multimodal datasets gathered from three Marine Protected Areas (MPAs) showed improved robustness and performance when using location-based regularisation terms compared to equivalent autoencoder-based and contrastive self-supervised feature learners. Location-guiding improved F1 scores by 7.7% for autoencoder-based and 28.8% for contrastive feature learners averaged across 78 experiments on datasets spanning three distinct sites and 18 data modes. Location-guiding enhances performance when combining multimodal data, increasing F1 scores by an average of 8.8% and 37.8% compared to the best-performing individual mode being combined for autoencoder-based and contrastive self-supervised models, respectively. Performance gains are maintained over a large range of location-guiding distance hyperparameters, where improvements of 5.3% and 29.4% are achieved on average over an order-of-magnitude range of hyperparameters for the autoencoder and contrastive learners, respectively, both comparing favourably with optimally tuned conditions. Location-guiding also exhibits robustness to position inconsistencies between combined data modes, still achieving an average of 3.0% and 30.4% increase in performance compared to equivalent feature learners without location regularisation when position offsets of up to 10 m are artificially introduced to the remote sensed data. 
Our results show that the classifier used to delineate the learned feature spaces has less impact on performance than the feature learner, with probabilistic classifiers averaging 3.4% higher F1 scores than non-probabilistic classifiers.
Text: liang-et-al-2025-self-supervised-learning-with-multimodal-remote-sensed-maps-for-seafloor-visual-class-inference (1) - Version of Record
More information
Accepted/In Press date: 2025
e-pub ahead of print date: 27 May 2025
Published date: 27 May 2025
Additional Information: Publisher Copyright: © The Author(s) 2025.
Keywords: Multimodal feature learning, habitat classification, location-based regularisation, seafloor mapping, self-supervision
Identifiers
Local EPrints ID: 502301
URI: http://eprints.soton.ac.uk/id/eprint/502301
ISSN: 0278-3649
PURE UUID: 72a6f212-2b38-499e-9a2c-e163d75e26ff
Catalogue record
Date deposited: 20 Jun 2025 16:48
Last modified: 22 Aug 2025 02:36
Contributors
Author: Cailei Liang
Author: Jose De La Cruz Cappelletto
Author: Miquel Massot‐Campos
Author: Adrian Bodenmann
Author: Veerle Huvenne
Author: Catherine Ann Wardell
Author: Brian Bett
Author: Darryl Newborough
Author: Blair Thornton