University of Southampton Institutional Repository

Talk2Radar: bridging natural language with 4D mmWave radar for 3D referring expression comprehension

arXiv
Guan, Runwei
c9bbd12d-493e-4e99-a2eb-7b6150a0bde8
Zhang, Ruixiao
fc3c4eb9-b692-4ab3-8056-030cb6731fc5
Ouyang, Ningwei
7fe50e3f-2ed7-4e54-8821-e3344190835f
Liu, Jianan
2fb92d3c-502a-4f3c-89ef-69b18a5a262e
Man, Ka Lok
b317a4f0-1391-43b7-84a6-f388e97a341d
Cai, Xiaohao
de483445-45e9-4b21-a4e8-b0427fc72cee
Xu, Ming
51f8f898-0bc6-40eb-aad0-ad612bd4857e
Smith, Jeremy
6d488539-cb40-4e16-8f04-805687fe7a1e
Lim, Eng Gee
431ad550-6a3b-4403-9a00-39ba30b97ca9
Yue, Yutao
39e0cc36-8d8e-4be2-bac2-49fb05cc962f
Xiong, Hui
3c13b2bd-05c3-4a86-b06b-e2a5bcd5a1e0


Abstract

Embodied perception is essential for intelligent vehicles and robots to understand their environments interactively. However, existing work focuses primarily on vision, with limited attention to 3D sensing, which restricts comprehensive object understanding in response to prompts containing qualitative and quantitative queries. 4D millimeter-wave radar, a promising and affordable automotive sensor, provides denser point clouds than conventional radar and perceives both semantic and physical characteristics of objects, thereby enhancing the reliability of perception systems. To foster the development of natural language-driven context understanding in radar scenes for 3D visual grounding, we construct the first dataset, Talk2Radar, which bridges these two modalities for 3D Referring Expression Comprehension (REC). Talk2Radar contains 8,682 referring prompt samples with 20,558 referred objects. Moreover, we propose a novel model, T-RadarNet, for 3D REC on point clouds, achieving state-of-the-art (SOTA) performance on the Talk2Radar dataset. Deformable-FPN and Gated Graph Fusion are meticulously designed for efficient point cloud feature modeling and for cross-modal fusion between radar and text features, respectively. Comprehensive experiments provide deep insights into radar-based 3D REC. We release our project at https://github.com/GuanRunwei/Talk2Radar.
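The abstract describes cross-modal fusion between radar point-cloud features and text features via a Gated Graph Fusion module. The paper's exact design is not given in this record, but the general idea of gated cross-modal fusion can be sketched as follows (a minimal illustration with numpy; all function names, dimensions, and weights are hypothetical, not the authors' T-RadarNet implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(radar_feat, text_feat, w_text, w_gate):
    """Fuse radar point features with a text embedding via a learned gate.

    radar_feat: (N, d_radar) per-point radar features
    text_feat:  (N, d_text)  text embeddings broadcast to the points
    w_text:     (d_text, d_radar) projection of text into the radar space
    w_gate:     (2*d_radar, d_radar) gate weights over the joint feature
    """
    text_proj = text_feat @ w_text                      # align text with radar dim
    joint = np.concatenate([radar_feat, text_proj], axis=-1)
    gate = sigmoid(joint @ w_gate)                      # per-channel gate in (0, 1)
    # Gate decides, channel by channel, how much radar vs. text to keep.
    return gate * radar_feat + (1.0 - gate) * text_proj

rng = np.random.default_rng(0)
d_radar, d_text, n_points = 8, 6, 4
radar = rng.normal(size=(n_points, d_radar))
text = rng.normal(size=(n_points, d_text))
w_text = rng.normal(size=(d_text, d_radar))
w_gate = rng.normal(size=(2 * d_radar, d_radar))

fused = gated_fusion(radar, text, w_text, w_gate)
print(fused.shape)  # fused features keep the radar feature dimension
```

The gating keeps the fused output in the radar feature space, so downstream detection heads need no change; in a trained model the weights would be learned end-to-end rather than sampled randomly as here.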

Text: 2405.12821v2 - Author's Original
Available under License Other.

More information

Published date: 21 May 2024
Additional Information: 5 figures
Keywords: cs.RO, cs.CV

Identifiers

Local EPrints ID: 498021
URI: http://eprints.soton.ac.uk/id/eprint/498021
PURE UUID: 71281a4c-3c2e-47f2-90a4-8fb72e28b6d2
ORCID for Xiaohao Cai: orcid.org/0000-0003-0924-2834

Catalogue record

Date deposited: 06 Feb 2025 17:32
Last modified: 07 Feb 2025 03:02



