The University of Southampton
University of Southampton Institutional Repository

Mogo: residual quantized hierarchical causal transformer for high-quality and real-time 3D human motion generation

arXiv

Record type: UNSPECIFIED

Abstract

Recent advances in transformer-based text-to-motion generation have led to impressive progress in synthesizing high-quality human motion. Nevertheless, jointly achieving high fidelity, streaming capability, real-time responsiveness, and scalability remains a fundamental challenge. In this paper, we propose MOGO (Motion Generation with One-pass), a novel autoregressive framework tailored for efficient and real-time 3D motion generation. MOGO comprises two key components: (1) MoSA-VQ, a motion scale-adaptive residual vector quantization module that hierarchically discretizes motion sequences with learnable scaling to produce compact yet expressive representations; and (2) RQHC-Transformer, a residual quantized hierarchical causal transformer that generates multi-layer motion tokens in a single forward pass, significantly reducing inference latency. To enhance semantic fidelity, we further introduce a text condition alignment mechanism that improves motion decoding under textual control. Extensive experiments on benchmark datasets including HumanML3D, KIT-ML, and CMP demonstrate that MOGO achieves competitive or superior generation quality compared to state-of-the-art transformer-based methods, while offering substantial improvements in real-time performance, streaming generation, and generalization under zero-shot settings.
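The abstract describes MoSA-VQ only at a high level, so the following is a minimal sketch of generic residual vector quantization, not the authors' implementation. It assumes one codebook per layer and models "learnable scaling" as a fixed per-layer scalar; the function name `residual_quantize` and the `scales` parameter are hypothetical. Each layer quantizes what the previous layers left unexplained, which is what produces the multi-layer token indices.

```python
import numpy as np

def residual_quantize(x, codebooks, scales):
    """Quantize vectors x (N, D) with a stack of codebooks, coarse to fine.

    codebooks: list of arrays, each of shape (K, D).
    scales: one float per layer (an assumed stand-in for learnable scaling).
    Returns (indices, reconstruction); indices has shape (num_layers, N).
    """
    residual = x.astype(np.float64)  # astype copies, so x is not modified
    recon = np.zeros_like(residual)
    indices = []
    for codebook, scale in zip(codebooks, scales):
        scaled = scale * codebook  # hypothetical per-layer scale
        # Squared distance from every residual vector to every scaled code.
        d2 = ((residual[:, None, :] - scaled[None, :, :]) ** 2).sum(axis=-1)
        idx = d2.argmin(axis=1)      # nearest code per vector
        chosen = scaled[idx]
        recon += chosen              # accumulate the layer's contribution
        residual -= chosen           # the next layer quantizes what is left
        indices.append(idx)
    return np.stack(indices), recon
```

In this sketch, if every codebook contains the zero vector, adding a layer can never increase the reconstruction error, because the nearest code is at least as close to the residual as zero is. A real module would learn the codebooks and scales during training rather than take them as inputs.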

Text
2506.05952v1 - Author's Original

More information

Published date: 6 June 2025
Additional Information: 9 pages, 4 figures, conference
Keywords: cs.CV, cs.AI

Identifiers

Local EPrints ID: 502990
URI: http://eprints.soton.ac.uk/id/eprint/502990
PURE UUID: d7eadd98-02a1-4d33-b982-700afe6c6a14
ORCID for Pengcheng Fang: orcid.org/0009-0008-6215-4335
ORCID for Xiaohao Cai: orcid.org/0000-0003-0924-2834
ORCID for Hansung Kim: orcid.org/0000-0003-4907-0491

Catalogue record

Date deposited: 15 Jul 2025 16:54
Last modified: 17 Jul 2025 02:27

Contributors

Author: Dongjie Fu
Author: Tengjiao Sun
Author: Pengcheng Fang
Author: Xiaohao Cai
Author: Hansung Kim
