GeoCLR: georeference contrastive learning for efficient seafloor image interpretation
Yamada, Takaki, Prugel-Bennett, Adam, Pizarro, Oscar, Williams, Stefan B. and Thornton, Blair
(2022)
GeoCLR: georeference contrastive learning for efficient seafloor image interpretation.
Field Robotics, 2, 1134–1155.
(doi:10.55417/fr.2022037).
Abstract
This paper describes Georeference Contrastive Learning of visual Representation (GeoCLR), a method for efficient training of deep-learning Convolutional Neural Networks (CNNs). GeoCLR leverages georeference information by generating a similar image pair from images taken at nearby locations and contrasting it with a pair of images taken far apart. The underlying assumption is that images gathered within a close distance are more likely to share a similar visual appearance. This assumption is reasonably satisfied in seafloor robotic imaging applications, where image footprints are limited to edge lengths of a few metres and images are taken so that they overlap along a vehicle's trajectory, whereas seafloor substrates and habitats have far larger patch sizes. A key advantage of the method is that it is self-supervised and requires no human input for CNN training. It is also computationally efficient: results can be generated between dives during multi-day Autonomous Underwater Vehicle (AUV) missions using computational resources that would be accessible during most oceanic field trials. We apply GeoCLR to habitat classification on a dataset of ~86k images gathered using an AUV, and demonstrate how the latent representations it generates can be used to efficiently guide human annotation efforts. The resulting semi-supervised framework improves classification accuracy by an average of 10.2% compared to the state-of-the-art SimCLR, using the same CNN and an equivalent number of human annotations for training.
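The positive-pair selection idea described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the distance threshold `d_pos`, and the fallback of self-pairing an image with no close neighbour are all assumptions for the sake of the example.

```python
import numpy as np

def geoclr_pairs(coords, d_pos=2.0, rng=None):
    """Hypothetical sketch of GeoCLR-style pair sampling: for each
    georeferenced image, pick a 'similar' partner taken within d_pos
    metres (excluding the image itself). Images with no neighbour
    inside d_pos fall back to pairing with themselves, as in standard
    augmentation-based contrastive learning."""
    rng = np.random.default_rng(rng)
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between image centres.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # never count an image as its own neighbour
    partners = np.empty(n, dtype=int)
    for i in range(n):
        close = np.flatnonzero(d[i] < d_pos)
        partners[i] = rng.choice(close) if close.size else i
    return partners
```

In a full pipeline, each image and its sampled partner would be augmented and fed to the contrastive loss as a positive pair, while pairs drawn from distant locations serve as negatives.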
Text: Yamada_2022_FR (Accepted Manuscript)
Text: Yamada_2022_FR (Version of Record)
More information
Accepted/In Press date: 20 April 2022
e-pub ahead of print date: 9 June 2022
Identifiers
Local EPrints ID: 456914
URI: http://eprints.soton.ac.uk/id/eprint/456914
PURE UUID: 394bbe96-f786-44fb-a66e-f7002dd60b2e
Catalogue record
Date deposited: 17 May 2022 16:37
Last modified: 17 Mar 2024 07:16
Contributors
Author:
Takaki Yamada
Author:
Adam Prugel-Bennett
Author:
Oscar Pizarro
Author:
Stefan B. Williams
Author:
Blair Thornton