MHNet: a hybrid network for high-resolution remote sensing image semantic segmentation based on multiscale feature fusion
Semantic segmentation of high-resolution remote sensing images (HRSIs) presents significant challenges, such as discrete object distributions, diverse scales, and class imbalance, which lead to blurred segmentation boundaries and weak global semantic associations. Although traditional convolutional neural networks excel at local feature extraction, their inherent structure limits the modeling of long-range dependencies. Transformers can model global context, but the quadratic complexity of the self-attention mechanism incurs high computational costs on HRSIs. This manuscript therefore proposes a novel encoder-decoder network, the Multiscale Hybrid Network (MHNet), which improves HRSI segmentation performance through multiscale feature fusion, global context modeling, and boundary detail optimization. Specifically, in the encoder, the Neighborhood Feature Fusion (NFF) module is designed to fuse features from neighboring layers, aggregating low-level details and high-level semantics via channel and spatial attention. For the decoder, the Multiscale Refinement Enhanced Transformer Block (MRETB) and the Multiscale Refinement Attention Fusion (MRAF) module are proposed. MRETB uses the Multiscale Refinement Enhancement (MSRE) module to extract multiscale features and enhance boundary information, and the Window-based Efficient Multi-Head Self-Attention (W-EMSA) mechanism to model long-range dependencies. MRAF further integrates the multiscale global context produced by MRETB by fusing multilayer features and refining boundary details. The performance of MHNet is verified by experiments on three public remote sensing image datasets.
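The quadratic-cost motivation for window-based attention can be illustrated by counting attended token pairs. This is a generic sketch of the window-attention cost argument, not the paper's W-EMSA implementation; the token count and window size below are hypothetical values chosen for illustration.

```python
def attention_pairs_global(n_tokens: int) -> int:
    # Full self-attention: every token attends to every token -> O(n^2) pairs.
    return n_tokens * n_tokens

def attention_pairs_windowed(n_tokens: int, window: int) -> int:
    # Window-based attention: tokens attend only within their own
    # non-overlapping window (assumes `window` divides `n_tokens`).
    n_windows = n_tokens // window
    return n_windows * window * window

# e.g. a 512x512 image patchified into 8x8 patches -> 4096 tokens
n = 4096
print(attention_pairs_global(n))        # 16777216 pairs
print(attention_pairs_windowed(n, 64))  # 262144 pairs: linear in n for fixed window
```

For a fixed window size the windowed count grows linearly with the number of tokens, which is why window-based schemes stay tractable on high-resolution inputs.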
Boundary refined, Multiscale, Remote sensing, Semantic segmentation, Transformer
Zeng, Qiaolin
8c68a15a-12b1-4653-8843-5a7be9e41acc
Chen, Shitong
affb4331-bf0f-49cd-84e9-975cfea529d3
Fan, Meng
7b281f11-91f7-4a2b-97d5-d707591ab50c
Chen, Liangfu
cba91e61-e0e7-41c0-818b-53b199deeb38
Zhu, Songyan
122e3311-4c1f-48e9-8aa3-09fcbe990cd9
Zhou, Jingxiang
621d352b-8850-43ee-b7f7-38c122591c3d
27 February 2026
Zeng, Qiaolin, Chen, Shitong, Fan, Meng, Chen, Liangfu, Zhu, Songyan and Zhou, Jingxiang (2026) MHNet: a hybrid network for high-resolution remote sensing image semantic segmentation based on multiscale feature fusion. Digital Signal Processing, 175, [106014]. (doi:10.1016/j.dsp.2026.106014).
Text: Accepted Manuscript. Restricted to Repository staff only until 21 February 2027.
More information
e-pub ahead of print date: 21 February 2026
Published date: 27 February 2026
Identifiers
Local EPrints ID: 511319
URI: http://eprints.soton.ac.uk/id/eprint/511319
ISSN: 1051-2004
PURE UUID: 00098d2f-dcd3-442f-b803-5c1b91e57a1c
Catalogue record
Date deposited: 12 May 2026 16:31
Last modified: 13 May 2026 02:12