An investigation into dense material segmentation
Heng, Yuwen (2023) An investigation into dense material segmentation. University of Southampton, Doctoral Thesis, 153pp.
Record type: Thesis (Doctoral)
Abstract
The dense material segmentation task aims to recognise the material of every pixel in everyday images. It benefits applications such as robot manipulation and spatial audio synthesis. However, accurate material segmentation from 3-channel RGB images is challenging because a material's appearance varies considerably. This research aims to design high-performance material segmentation networks that achieve an accuracy above 80% and support real-time inference. The thesis introduces and analyses three and a half contributions towards this objective.
The proposed networks extend the idea of combining material and contextual features for material segmentation. Material features describing properties such as transparency and texture generalise to unseen images regardless of appearance attributes such as shape and colour. Contextual features reduce segmentation uncertainty by providing extra global or semi-global information about the image, such as the scene and object categories.
Contribution A investigates the possibility of leveraging contextual features without extra labels. In particular, the boundaries between different materials are selected as semi-global contextual information. A self-training approach is adopted to fill in the unlabelled pixels of sparsely labelled datasets, and a hybrid network, the Context-Aware Material Segmentation Network (CAM-SegNet), is introduced to extract and combine boundary and material features.
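The CAM-SegNet architecture itself is detailed in the thesis; as a rough illustration of the hybrid two-branch idea only, the sketch below extracts boundary and material features in separate branches and fuses them for per-pixel classification. All module and channel choices here are assumptions, not the thesis's design.

```python
import torch
import torch.nn as nn

class HybridSegNet(nn.Module):
    """Illustrative two-branch network: one branch for material features,
    one for boundary (contextual) features, fused before the classifier.
    Layer and channel choices are hypothetical."""
    def __init__(self, num_materials: int = 16):
        super().__init__()
        self.material_branch = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.boundary_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Auxiliary head that can be supervised with material-boundary labels.
        self.boundary_head = nn.Conv2d(32, 1, 1)
        # Fused features feed the per-pixel material classifier.
        self.classifier = nn.Conv2d(64 + 32, num_materials, 1)

    def forward(self, x):
        m = self.material_branch(x)
        b = self.boundary_branch(x)
        fused = torch.cat([m, b], dim=1)
        return self.classifier(fused), self.boundary_head(b)
```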
Contribution B.1 explores how to extract material features from cross-resolution image patches, taking into account the variation in the pixel area covered by each material. The Dynamic Backward Attention Transformer (DBAT) is proposed to explicitly gather the intermediate features extracted from cross-resolution patches and merge them dynamically using predicted attention masks.
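As a minimal sketch of the dynamic merging step described above (not the DBAT implementation; the mask predictor and the assumption of a shared channel count are illustrative), per-pixel attention masks can weight upsampled cross-resolution feature maps before summation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackwardAttentionMerge(nn.Module):
    """Illustrative dynamic merge of cross-resolution features: per-pixel
    softmax attention masks weight the (upsampled) intermediate feature
    maps before they are summed. A sketch of the idea, not the DBAT code."""
    def __init__(self, channels: int, num_stages: int):
        super().__init__()
        # Predict one attention mask per stage from the concatenated features.
        self.mask_head = nn.Conv2d(channels * num_stages, num_stages, 1)

    def forward(self, stage_feats):
        size = stage_feats[0].shape[-2:]  # target spatial resolution
        feats = [F.interpolate(f, size=size, mode='bilinear',
                               align_corners=False) for f in stage_feats]
        masks = torch.softmax(self.mask_head(torch.cat(feats, dim=1)), dim=1)
        # Weighted sum: masks[:, i] decides how much stage i contributes
        # at each pixel.
        return sum(masks[:, i:i + 1] * f for i, f in enumerate(feats))
```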
Contribution B.2 studies the features that networks learn in order to make predictions. By analysing the cross-resolution features and the attention weights, this study interprets how the DBAT learns from image patches. The features are further aligned to semantic labels through network dissection, which shows that the proposed model extracts material-related features better than other methods.
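Network dissection scores how well individual units align with labelled concepts by thresholding their activation maps and measuring overlap with concept masks. A minimal sketch of that score follows; the quantile value here is illustrative, since the dissection literature derives the threshold from activation statistics over the whole dataset.

```python
import torch

def dissection_iou(activation, concept_mask, quantile=0.995):
    """Score one unit against one concept: binarise the unit's activation
    map at a high quantile, then compute IoU with the binary concept mask.
    The quantile is an illustrative placeholder."""
    thresh = torch.quantile(activation.flatten(), quantile)
    unit_mask = activation > thresh
    inter = (unit_mask & concept_mask).sum().float()
    union = (unit_mask | concept_mask).sum().float()
    return (inter / union).item() if union > 0 else 0.0
```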
Contribution C proposes to segment materials using recovered hyperspectral images, which in theory offer distinctive information for material identification, since the intensity of electromagnetic radiation reflected by a surface depends on the material composition of the scene. The proposed Material Hyperspectral Network (MatSpectNet) leverages the principles of colour perception in modern cameras to regularise the reconstructed hyperspectral images, and employs domain adaptation to transfer the hyperspectral reconstruction capability from a spectral recovery dataset to material segmentation datasets. The reconstructed hyperspectral images are further filtered with learned response curves and enhanced with human-perception attributes (such as roughness) to learn reliable material features.
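The response-curve filtering can be pictured as learning a small bank of spectral sensitivity curves that integrate the recovered bands into a few channels, analogous to how a camera's colour filters integrate radiance over wavelength. The sketch below is a hypothetical rendering of that idea; the band and filter counts are assumptions, not MatSpectNet's configuration.

```python
import torch
import torch.nn as nn

class SpectralFilter(nn.Module):
    """Illustrative learned response curves: project S recovered spectral
    bands down to K filtered channels. Band/filter counts are assumed."""
    def __init__(self, num_bands: int = 31, num_filters: int = 8):
        super().__init__()
        # One learnable response curve per output channel.
        self.curves = nn.Parameter(torch.rand(num_filters, num_bands))

    def forward(self, hsi):  # hsi: (B, S, H, W)
        # Softmax keeps each curve positive and normalised over wavelength.
        responses = torch.softmax(self.curves, dim=1)
        # Weighted sum over the spectral dimension for each filter.
        return torch.einsum('ks,bshw->bkhw', responses, hsi)
```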
The proposed networks are evaluated quantitatively and qualitatively on two open-access material segmentation datasets. CAM-SegNet demonstrates strong discriminative ability when trained with material boundaries, enabling it to accurately identify materials with similar appearances. With cross-resolution patch features, the DBAT accurately segments materials with varying shapes, and it has been shown to extract material-related features more proficiently than other networks. MatSpectNet, equipped with the recovered hyperspectral images, yields the best performance (88.24% average per-pixel accuracy) and excels at identifying materials under different illumination conditions, particularly in the presence of spotlight reflections.
Text: Final_thesis - Version of Record
Text: Final-thesis-submission-Examination-Mr-Yuwen-Heng (Restricted to Repository staff only)
More information
Published date: 2023
Keywords: Material Segmentation, Deep Learning, Scene Understanding, Immersive Sound Rendering
Identifiers
Local EPrints ID: 481919
URI: http://eprints.soton.ac.uk/id/eprint/481919
PURE UUID: 30273190-677c-4e49-8ba0-a9b81544f52e
Catalogue record
Date deposited: 13 Sep 2023 17:07
Last modified: 18 Mar 2024 03:56
Contributors
Author: Yuwen Heng
Thesis advisor: Hansung Kim
Thesis advisor: Srinandan Dasmahapatra