University of Southampton Institutional Repository

Talk2Radar: bridging natural language with 4D mmWave radar for 3D referring expression comprehension

arXiv
Guan, Runwei
c9bbd12d-493e-4e99-a2eb-7b6150a0bde8
Zhang, Ruixiao
fc3c4eb9-b692-4ab3-8056-030cb6731fc5
Ouyang, Ningwei
7fe50e3f-2ed7-4e54-8821-e3344190835f
Liu, Jianan
2fb92d3c-502a-4f3c-89ef-69b18a5a262e
Man, Ka Lok
b317a4f0-1391-43b7-84a6-f388e97a341d
Cai, Xiaohao
de483445-45e9-4b21-a4e8-b0427fc72cee
Xu, Ming
51f8f898-0bc6-40eb-aad0-ad612bd4857e
Smith, Jeremy
6d488539-cb40-4e16-8f04-805687fe7a1e
Lim, Eng Gee
431ad550-6a3b-4403-9a00-39ba30b97ca9
Yue, Yutao
39e0cc36-8d8e-4be2-bac2-49fb05cc962f
Xiong, Hui
3c13b2bd-05c3-4a86-b06b-e2a5bcd5a1e0


Abstract

Embodied perception is essential for intelligent vehicles and robots to understand their environments interactively. However, existing work focuses primarily on vision, with limited attention to 3D sensing, which restricts comprehensive object understanding in response to prompts containing qualitative and quantitative queries. 4D millimeter-wave radar, a promising and affordable automotive sensor, provides denser point clouds than conventional radar and perceives both semantic and physical characteristics of objects, thereby enhancing the reliability of perception systems. To foster the development of natural language-driven context understanding in radar scenes for 3D visual grounding, we construct the first dataset, Talk2Radar, which bridges these two modalities for 3D Referring Expression Comprehension (REC). Talk2Radar contains 8,682 referring prompt samples with 20,558 referred objects. Moreover, we propose a novel model, T-RadarNet, for 3D REC on point clouds, achieving state-of-the-art (SOTA) performance on the Talk2Radar dataset. Deformable-FPN and Gated Graph Fusion are meticulously designed for efficient point cloud feature modeling and for cross-modal fusion between radar and text features, respectively. Comprehensive experiments provide deep insights into radar-based 3D REC. We release our project at https://github.com/GuanRunwei/Talk2Radar.
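The abstract describes cross-modal fusion between radar point-cloud features and text features via a Gated Graph Fusion module. The paper's exact design is not given in this record, but the general idea of gated cross-modal fusion can be sketched as follows (a minimal illustration with numpy; all function names, dimensions, and weights are hypothetical, not the authors' T-RadarNet implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(radar_feat, text_feat, w_text, w_gate):
    """Fuse radar point features with a text embedding via a learned gate.

    radar_feat: (N, d_radar) per-point radar features
    text_feat:  (N, d_text)  text embeddings broadcast to the points
    w_text:     (d_text, d_radar) projection of text into the radar space
    w_gate:     (2*d_radar, d_radar) gate weights over the joint feature
    """
    text_proj = text_feat @ w_text                      # align text with radar dim
    joint = np.concatenate([radar_feat, text_proj], axis=-1)
    gate = sigmoid(joint @ w_gate)                      # per-channel gate in (0, 1)
    # Gate decides, channel by channel, how much radar vs. text to keep.
    return gate * radar_feat + (1.0 - gate) * text_proj

rng = np.random.default_rng(0)
d_radar, d_text, n_points = 8, 6, 4
radar = rng.normal(size=(n_points, d_radar))
text = rng.normal(size=(n_points, d_text))
w_text = rng.normal(size=(d_text, d_radar))
w_gate = rng.normal(size=(2 * d_radar, d_radar))

fused = gated_fusion(radar, text, w_text, w_gate)
print(fused.shape)  # fused features keep the radar feature dimension
```

The gating keeps the fused output in the radar feature space, so downstream detection heads need no change; in a trained model the weights would be learned end-to-end rather than sampled randomly as here.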

Text: 2405.12821v2 - Author's Original
Available under License Other.

More information

Published date: 21 May 2024
Additional Information: 5 figures
Keywords: cs.RO, cs.CV

Identifiers

Local EPrints ID: 498021
URI: http://eprints.soton.ac.uk/id/eprint/498021
PURE UUID: 71281a4c-3c2e-47f2-90a4-8fb72e28b6d2
ORCID for Xiaohao Cai: orcid.org/0000-0003-0924-2834

Catalogue record

Date deposited: 06 Feb 2025 17:32
Last modified: 07 Feb 2025 03:02



