University of Southampton Institutional Repository

An investigation into dense material segmentation

Heng, Yuwen (2023) An investigation into dense material segmentation. University of Southampton, Doctoral Thesis, 153pp.

Record type: Thesis (Doctoral)

Abstract

The dense material segmentation task aims to recognise the material of every pixel in everyday images. It benefits applications such as robot manipulation and spatial audio synthesis. However, accurate material segmentation from 3-channel RGB images is challenging because the appearance of a single material can vary considerably. This research aims to design high-performance material segmentation networks that achieve an accuracy above 80% and support real-time inference. This thesis introduces and analyses three and a half contributions (Contributions A, B.1, B.2, and C) towards this objective.

The proposed networks extend the idea of combining material and contextual features for material segmentation. Material features, which describe properties such as transparency and texture, generalise to unseen images regardless of appearance attributes such as shape and colour. Contextual features reduce segmentation uncertainty by providing extra global or semi-global information about the image, such as the scene and object categories.

Contribution A investigates the possibility of leveraging contextual features without extra labels. In particular, the boundaries between different materials are selected as semi-global contextual information. A self-training approach is adopted to fill in the unlabelled pixels of sparsely labelled datasets, and a hybrid network, the Context-Aware Material Segmentation Network (CAM-SegNet), is introduced to extract and combine boundary and material features.
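
For illustration only, the following is a minimal PyTorch sketch of the hybrid two-branch idea, in which one branch is supervised with material boundaries and its features are fused with material features for per-pixel classification. All module names are hypothetical; the actual CAM-SegNet architecture is specified in the thesis.

    import torch
    import torch.nn as nn

    class HybridSegNet(nn.Module):
        # Toy two-branch network: one branch learns material features,
        # the other boundary (contextual) features; the two are fused
        # for per-pixel material classification.
        def __init__(self, num_materials=16):
            super().__init__()
            def branch():
                return nn.Sequential(
                    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
            self.material_branch = branch()
            self.boundary_branch = branch()
            self.boundary_head = nn.Conv2d(32, 1, 1)     # supervised with boundary maps
            self.fuse = nn.Conv2d(64, num_materials, 1)  # per-pixel material logits

        def forward(self, x):                            # x: (B, 3, H, W)
            m = self.material_branch(x)
            b = self.boundary_branch(x)
            logits = self.fuse(torch.cat([m, b], dim=1))
            return logits, self.boundary_head(b)

    # Self-training sketch: confident predictions become pseudo-labels
    # for the unlabelled pixels of a sparsely labelled dataset.
    net = HybridSegNet()
    logits, _ = net(torch.randn(1, 3, 64, 64))
    conf, pseudo = logits.softmax(dim=1).max(dim=1)
    pseudo[conf < 0.9] = -1  # ignore_index when computing the loss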

Contribution B.1 explores how to extract material features from cross-resolution image patches, taking into account the variation in the pixel area covered by each material. The Dynamic Backward Attention Transformer (DBAT) is proposed to explicitly gather the intermediate features extracted from cross-resolution patches and merge them dynamically using predicted attention masks.
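
The merging step can be pictured with a short, hypothetical PyTorch sketch: intermediate feature maps from different patch resolutions are upsampled to a common size and summed under predicted per-pixel attention masks. This illustrates the mechanism only, not the published DBAT implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BackwardAttentionMerge(nn.Module):
        # Toy dynamic merge: predicted per-pixel attention masks weight
        # intermediate features from S cross-resolution stages.
        def __init__(self, channels, num_stages=4):
            super().__init__()
            self.attn = nn.Conv2d(channels * num_stages, num_stages, 1)

        def forward(self, feats):  # feats: list of S maps (B, C, h_i, w_i)
            size = feats[-1].shape[-2:]  # upsample all maps to the finest grid
            feats = [F.interpolate(f, size=size, mode='bilinear',
                                   align_corners=False) for f in feats]
            stacked = torch.stack(feats, dim=1)         # (B, S, C, H, W)
            masks = self.attn(torch.cat(feats, dim=1))  # (B, S, H, W)
            masks = masks.softmax(dim=1).unsqueeze(2)   # normalise over stages
            return (stacked * masks).sum(dim=1)         # (B, C, H, W)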

Contribution B.2 studies the features that networks learn in order to make predictions. By analysing the cross-resolution features and the attention weights, this study interprets how the DBAT learns from image patches. The features are further aligned to semantic labels through network dissection, which shows that the proposed model extracts material-related features better than other methods.
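
As a rough sketch of the network-dissection idea (the quantile threshold and shapes here are placeholder assumptions, not the exact procedure used in the thesis), a unit can be aligned to a concept by binarising its activation map and scoring the overlap with the concept's mask:

    import torch
    import torch.nn.functional as F

    def unit_concept_iou(act, concept_mask, quantile=0.995):
        # act: (h, w) activation map of one unit; concept_mask: (H, W) bool.
        act = F.interpolate(act[None, None], size=concept_mask.shape,
                            mode='bilinear', align_corners=False)[0, 0]
        unit_mask = act > torch.quantile(act.flatten(), quantile)
        inter = (unit_mask & concept_mask).sum().float()
        union = (unit_mask | concept_mask).sum().float()
        return (inter / union.clamp(min=1)).item()

Units whose binarised masks consistently overlap a material concept across many images can then be interpreted as detectors for that material.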

Contribution C proposes to segment materials from recovered hyperspectral images, which in principle offer distinctive information for material identification, since the intensity of electromagnetic radiation reflected by a surface depends on the material composition of the scene. The proposed Material Hyperspectral Network (MatSpectNet) leverages the principles of colour perception in modern cameras to regularise the reconstructed hyperspectral images, and employs domain adaptation to transfer the hyperspectral reconstruction capability from a spectral recovery dataset to material segmentation datasets. The reconstructed hyperspectral images are further filtered using learned response curves and enhanced with human-perception attributes (such as roughness) to learn reliable material features.
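
The filtering step can be illustrated with a hypothetical PyTorch sketch in which learned response curves integrate the recovered spectral bands into a few filtered channels, mimicking how a camera's colour responses integrate over wavelengths; the band count and shapes are assumptions.

    import torch
    import torch.nn as nn

    class ResponseCurveFilter(nn.Module):
        # Toy spectral filter: each learned curve weights the recovered
        # bands, analogous to a camera's colour response curves.
        def __init__(self, num_bands=31, num_filters=6):
            super().__init__()
            self.curves = nn.Parameter(torch.rand(num_filters, num_bands))

        def forward(self, hsi):                  # hsi: (B, num_bands, H, W)
            curves = self.curves.softmax(dim=1)  # each curve sums to 1
            return torch.einsum('fb,nbhw->nfhw', curves, hsi)

Conversely, projecting the recovered cube through fixed, known camera response curves should reproduce the observed RGB image, which offers one way to regularise the reconstruction.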

The proposed networks are evaluated quantitatively and qualitatively on two open-access material segmentation datasets. CAM-SegNet demonstrates strong discriminative ability when trained with material boundaries, enabling it to accurately identify materials with similar appearances. With cross-resolution patch features, the DBAT accurately segments materials of varying shapes and has been shown to extract material-related features more effectively than other networks. MatSpectNet, equipped with the recovered hyperspectral images, yields the best performance (88.24% average per-pixel accuracy) and excels at identifying materials under different illumination conditions, particularly in the presence of spotlight reflections.

Text: Final_thesis - Version of Record (43MB)
Available under License University of Southampton Thesis Licence.
Text: Final-thesis-submission-Examination-Mr-Yuwen-Heng
Restricted to Repository staff only

More information

Published date: 2023
Keywords: Material Segmentation, Deep Learning, Scene understanding, Immersive Sound Rendering

Identifiers

Local EPrints ID: 481919
URI: http://eprints.soton.ac.uk/id/eprint/481919
PURE UUID: 30273190-677c-4e49-8ba0-a9b81544f52e
ORCID for Yuwen Heng: orcid.org/0000-0003-3793-4811
ORCID for Hansung Kim: orcid.org/0000-0003-4907-0491

Catalogue record

Date deposited: 13 Sep 2023 17:07
Last modified: 18 Mar 2024 03:56


Contributors

Author: Yuwen Heng
Thesis advisor: Hansung Kim
Thesis advisor: Srinandan Dasmahapatra
