University of Southampton Institutional Repository

GRASPTrack: geometry-reasoned association via segmentation and projection for multi-object tracking

arXiv
Han, Xudong; Fang, Pengcheng; Tian, Yueying; Yu, Jianhui; Cai, Xiaohao; Roggen, Daniel; Birch, Philip

Abstract

Multi-object tracking (MOT) in monocular videos is fundamentally challenged by occlusions and depth ambiguity, issues that conventional tracking-by-detection (TBD) methods struggle to resolve owing to a lack of geometric awareness. To address these limitations, we introduce GRASPTrack, a novel depth-aware MOT framework that integrates monocular depth estimation and instance segmentation into a standard TBD pipeline to generate high-fidelity 3D point clouds from 2D detections, thereby enabling explicit 3D geometric reasoning. These 3D point clouds are then voxelized to enable a precise and robust Voxel-Based 3D Intersection-over-Union (IoU) for spatial association. To further enhance tracking robustness, our approach incorporates Depth-aware Adaptive Noise Compensation, which dynamically adjusts the Kalman filter process noise based on occlusion severity for more reliable state estimation. Additionally, we propose a Depth-enhanced Observation-Centric Momentum, which extends the motion direction consistency from the image plane into 3D space to improve motion-based association cues, particularly for objects with complex trajectories. Extensive experiments on the MOT17, MOT20, and DanceTrack benchmarks demonstrate that our method achieves competitive performance, significantly improving tracking robustness in complex scenes with frequent occlusions and intricate motion patterns.
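
The abstract names three geometry-driven components: back-projection of segmented detections into 3D point clouds, a voxel-based 3D IoU for spatial association, occlusion-driven scaling of the Kalman filter process noise, and a 3D extension of motion-direction consistency. The Python sketch below illustrates the general idea behind each; it is a minimal illustration inferred from the abstract alone, and every name and parameter in it (back_project, voxelize, voxel_iou, adaptive_process_noise, direction_consistency, the pinhole intrinsics fx, fy, cx, cy, voxel_size, alpha) is a hypothetical stand-in, not the paper's implementation.

```python
# Minimal sketch of the geometric components named in the abstract.
# Everything here is inferred from the abstract alone; function names,
# parameters, and formulas are illustrative assumptions, not the paper's code.
import numpy as np

def back_project(depth, mask, fx, fy, cx, cy):
    """Lift the masked pixels of a 2D detection into a 3D point cloud
    using per-pixel depth and a pinhole camera model."""
    v, u = np.nonzero(mask)              # pixel rows/cols inside the instance mask
    z = depth[v, u]                      # depth value for each masked pixel
    x = (u - cx) * z / fx                # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # (N, 3) point cloud

def voxelize(points, voxel_size=0.05):
    """Quantize a point cloud into the set of voxel cells it occupies."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    return {tuple(cell) for cell in idx}

def voxel_iou(vox_a, vox_b):
    """Voxel-based 3D IoU: intersection over union of occupied-cell sets."""
    union = len(vox_a | vox_b)
    return len(vox_a & vox_b) / union if union else 0.0

def adaptive_process_noise(q_base, occlusion_ratio, alpha=4.0):
    """Depth-aware adaptive noise compensation, sketched as a linear rule:
    inflate the Kalman process noise as occlusion grows, so the filter
    trusts its motion model less when the target is poorly observed.
    The linear form and alpha are assumptions; the abstract only states
    that process noise is adjusted by occlusion severity."""
    return q_base * (1.0 + alpha * occlusion_ratio)

def direction_consistency(track_centers, candidate_center):
    """Depth-enhanced observation-centric momentum, sketched as the cosine
    similarity between a track's last observed 3D step and the step implied
    by a candidate detection; the paper's exact formulation may differ."""
    v_track = track_centers[-1] - track_centers[-2]
    v_cand = candidate_center - track_centers[-1]
    denom = np.linalg.norm(v_track) * np.linalg.norm(v_cand)
    return float(v_track @ v_cand / denom) if denom > 0.0 else 0.0

if __name__ == "__main__":
    # Toy demo: two overlapping instance masks on a synthetic depth map.
    rng = np.random.default_rng(0)
    depth = rng.uniform(2.0, 4.0, size=(64, 64))
    mask_a = np.zeros((64, 64), dtype=bool)
    mask_b = np.zeros((64, 64), dtype=bool)
    mask_a[10:40, 10:40] = True
    mask_b[20:50, 20:50] = True
    vox_a = voxelize(back_project(depth, mask_a, 500.0, 500.0, 32.0, 32.0))
    vox_b = voxelize(back_project(depth, mask_b, 500.0, 500.0, 32.0, 32.0))
    print("voxel 3D IoU:", voxel_iou(vox_a, vox_b))
```

Casting the overlap test as set operations on occupied voxels keeps the 3D IoU well defined for the irregular, partially occluded point clouds that monocular depth plus segmentation produce, without fitting boxes or convex hulls.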

Text: 2508.08117v1 - Author's Original (1MB)
Available under License Other.

More information

Published date: 11 August 2025
Keywords: cs.CV, cs.AI

Identifiers

Local EPrints ID: 507655
URI: http://eprints.soton.ac.uk/id/eprint/507655
PURE UUID: d55dd561-fdaa-4ae3-847d-38788e58554a
ORCID for Pengcheng Fang: orcid.org/0009-0008-6215-4335
ORCID for Xiaohao Cai: orcid.org/0000-0003-0924-2834

Catalogue record

Date deposited: 16 Dec 2025 18:14
Last modified: 18 Dec 2025 03:16

Contributors

Author: Xudong Han
Author: Pengcheng Fang
Author: Yueying Tian
Author: Jianhui Yu
Author: Xiaohao Cai
Author: Daniel Roggen
Author: Philip Birch

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton and available for everyone to use.
