AugMapNet: improving spatial latent structure via BEV grid augmentation for enhanced vectorized online HD map construction
8541-8550
Monninger, Thomas
Anwar, Md Zafar
Antol, Stanislaw
Staab, Steffen
Ding, Sihao
5 May 2026
Monninger, Thomas, Anwar, Md Zafar, Antol, Stanislaw, Staab, Steffen and Ding, Sihao (2026) AugMapNet: improving spatial latent structure via BEV grid augmentation for enhanced vectorized online HD map construction. In 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE. (doi:10.1109/WACV61042.2026.00824).
Record type: Conference or Workshop Item (Paper)
Abstract
Autonomous driving requires understanding infrastructure elements, such as lanes and crosswalks. To navigate safely, this understanding must be derived from sensor data in real time and needs to be represented in vectorized form. Learned Bird’s-Eye View (BEV) encoders are commonly used to combine a set of camera images from multiple views into one joint latent BEV grid. Traditionally, from this latent space, an intermediate raster map is predicted, providing dense spatial supervision but requiring post-processing into the desired vectorized form. More recent models directly derive infrastructure elements as polylines using vectorized map decoders, providing instance-level information. Our approach, Augmentation Map Network (AugMapNet), proposes latent BEV feature grid augmentation, a novel technique that significantly enhances the latent BEV representation. AugMapNet combines vector decoding and dense spatial supervision more effectively than existing architectures while remaining easy to integrate compared to other hybrid approaches. It additionally benefits from extra processing on its latent BEV features. Experiments on the nuScenes and Argoverse2 datasets demonstrate significant improvements on vectorized map prediction of up to 13.3 % over the StreamMapNet baseline at 60 m range and greater improvements at larger ranges. We confirm transferability by applying our method to another baseline, SQD-MapNet, and find similar improvements. A detailed analysis of the latent BEV grid confirms a more structured latent space of AugMapNet and shows the value of our novel concept beyond pure performance improvement. The code can be found at https://github.com/tmonnin/augmapnet.
Text
Monninger_AugMapNet_Improving_Spatial_Latent_Structure_via_BEV_Grid_Augmentation_for_WACV_2026_paper
- Accepted Manuscript
More information
Published date: 5 May 2026
Venue - Dates: The IEEE/CVF Winter Conference on Applications of Computer Vision, Tucson, Arizona, United States, 2026-03-06 - 2026-03-10
Identifiers
Local EPrints ID: 511653
URI: http://eprints.soton.ac.uk/id/eprint/511653
PURE UUID: 01bc9c4e-2950-4a00-9802-ff73f8b4aef8
Catalogue record
Date deposited: 26 May 2026 17:00
Last modified: 27 May 2026 01:48
Contributors
Author: Thomas Monninger
Author: Md Zafar Anwar
Author: Stanislaw Antol
Author: Steffen Staab
Author: Sihao Ding