
Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation

  • Yin Wang
  • Mu Li
  • Jiapeng Liu
  • Zhiying Leng
  • Frederick W.B. Li
  • Ziyao Zhang
  • Xiaohui Liang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

We address the challenging problem of fine-grained text-driven human motion generation. Existing works generate imprecise motions that fail to capture the relationships specified in the text, due to (1) the lack of effective text parsing for detailed semantic cues regarding body parts, and (2) insufficient modeling of the linguistic structures between words needed to understand the text comprehensively. To tackle these limitations, we propose a novel fine-grained framework, Fg-T2M++, that consists of (1) an LLM-based semantic parsing module to extract body-part descriptions and semantics from text, (2) a hyperbolic text representation module to encode relational information between text units by embedding the syntactic dependency graph into hyperbolic space, and (3) a multi-modal fusion module to hierarchically fuse text and motion features. Extensive experiments on the HumanML3D and KIT-ML datasets demonstrate that Fg-T2M++ outperforms SOTA methods, validating its ability to accurately generate motions that adhere to comprehensive text semantics.
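The abstract's hyperbolic text representation module embeds a syntactic dependency graph into hyperbolic space. The paper's exact construction is not given here, but a common building block for such embeddings is the exponential map at the origin of the Poincaré ball, which lifts Euclidean word/node features into the ball. The sketch below is an illustrative, hypothetical example of that standard map, not the authors' implementation:

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Maps Euclidean vectors v (shape [..., d]) into the open unit ball,
    a standard first step when embedding graph nodes in hyperbolic space.
    """
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

# Hypothetical node features for a small dependency graph (5 nodes, 8 dims).
rng = np.random.default_rng(0)
node_feats = rng.normal(scale=0.5, size=(5, 8))

hyp_feats = expmap0(node_feats)
# All embedded points lie strictly inside the unit ball (tanh(x) < 1).
print(np.linalg.norm(hyp_feats, axis=-1).max() < 1.0)
```

Distances between embedded nodes would then be measured with the Poincaré metric rather than the Euclidean one, which is what lets tree-like dependency structures embed with low distortion.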

Original language: English
Pages (from-to): 4277-4293
Number of pages: 17
Journal: International Journal of Computer Vision
Volume: 133
Issue number: 7
DOIs
State: Published - Jul 2025

Keywords

  • Diffusion model
  • Human motion
  • Large language model
  • Text driven motion generation

