Enhancing material features using dynamic backward attention on cross-resolution patches
Heng, Yuwen, Wu, Yihong, Dasmahapatra, Srinandan and Kim, Hansung (2022) Enhancing material features using dynamic backward attention on cross-resolution patches. The 33rd British Machine Vision Conference, London, United Kingdom, 21-24 Nov 2022. 15 pp.
Record type: Conference or Workshop Item (Paper)
Abstract
Recent studies in material segmentation crop images into patches to force the network to learn material features from local visual cues. This design rests on the expectation that contextually invariant features help the network generalise to unseen images regardless of the object or scene in which a material appears. However, most approaches fix a single patch resolution for every image in a dataset, ignoring the varying areas that materials cover within and across images due to scene scale; this fixed resolution can limit network performance. To address this problem, this paper proposes the Dynamic Backward Attention Transformer (DBAT), which extracts features from cross-resolution patches and dynamically aggregates them with per-pixel attention masks. Experiments show that DBAT achieves the best performance among state-of-the-art models capable of real-time inference (86.85% average pixel accuracy, 2.15% higher than the second-best model). Moreover, we illustrate the network's behaviour through visualisation methods and descriptive statistics. The project code is available at https://github.com/heng-yuwen/Dynamic-Backward-Attention-Transformer.
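To make the aggregation idea concrete, below is a minimal PyTorch-style sketch of the per-pixel dynamic weighting described in the abstract: feature maps extracted at several patch resolutions are merged per pixel with learned attention masks. This is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the class name DynamicBackwardAttention, the 1x1-convolution mask head, and all tensor shapes are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicBackwardAttention(nn.Module):
    """Hypothetical sketch: aggregate per-resolution features with per-pixel attention masks."""

    def __init__(self, channels: int, num_resolutions: int):
        super().__init__()
        # Predict one attention mask per resolution from the concatenated features.
        self.mask_head = nn.Conv2d(channels * num_resolutions, num_resolutions, kernel_size=1)

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (B, C, h, w) feature maps, one per patch resolution.
        # Upsample every map to a common spatial size before aggregation.
        target = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=target, mode="bilinear", align_corners=False) for f in feats]
        stacked = torch.stack(feats, dim=1)                 # (B, R, C, H, W)
        masks = self.mask_head(torch.cat(feats, dim=1))     # (B, R, H, W)
        masks = masks.softmax(dim=1).unsqueeze(2)           # normalise over resolutions: (B, R, 1, H, W)
        return (stacked * masks).sum(dim=1)                 # (B, C, H, W) aggregated features

# Usage: merge three feature maps, e.g. from coarse-to-fine patch resolutions.
if __name__ == "__main__":
    agg = DynamicBackwardAttention(channels=64, num_resolutions=3)
    feats = [torch.randn(2, 64, 32, 32), torch.randn(2, 64, 16, 16), torch.randn(2, 64, 8, 8)]
    out = agg(feats)
    print(out.shape)  # torch.Size([2, 64, 32, 32])

The softmax over the resolution axis gives each pixel a convex combination of the per-resolution features, so regions dominated by small-scale material detail can lean on fine patches while large uniform regions lean on coarse ones; how DBAT actually parameterises these masks is specified in the paper and repository.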
119_22BMVC_Yuwen - Version of Record (restricted to Repository staff only)
More information
e-pub ahead of print date: 21 November 2022
Venue - Dates: The 33rd British Machine Vision Conference, London, United Kingdom, 2022-11-21 - 2022-11-24
Identifiers
Local EPrints ID: 479337
URI: http://eprints.soton.ac.uk/id/eprint/479337
PURE UUID: 55d7a785-d0c3-4950-9e43-768939585d90
Catalogue record
Date deposited: 20 Jul 2023 17:29
Last modified: 01 Oct 2024 02:03