TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Haitao Zhou, Chuang Wang, Rui Nie, Jinlin Liu, Dongdong Yu, Qian Yu*, Changhu Wang*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce TrackGo, a novel approach that leverages free-form masks and arrows for conditional video generation. This method offers users a flexible and precise mechanism for manipulating video content. We also propose the TrackAdapter for control implementation, an efficient and lightweight adapter designed to be seamlessly integrated into the temporal self-attention layers of a pretrained video generation model. This design leverages our observation that the attention maps of these layers can accurately activate regions corresponding to motion in videos. Our experimental results demonstrate that our new approach, enhanced by the TrackAdapter, achieves state-of-the-art performance on key metrics such as FVD, FID, and ObjMC scores.
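The abstract describes an adapter inserted into the temporal self-attention layers of a pretrained model. The paper's actual TrackAdapter design is not detailed here; as a hedged illustration only, the sketch below shows the generic pattern such an adapter typically follows: a lightweight residual bottleneck attached to the attention output, with a zero-initialized up-projection so the pretrained model's behavior is unchanged at the start of fine-tuning. All names and shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(x):
    # x: (frames, dim) -- attention over the time axis at one spatial location.
    # Identity Q/K/V projections are used purely for illustration.
    attn = softmax(x @ x.T / np.sqrt(x.shape[-1]))  # (frames, frames) attention map
    return attn @ x, attn

class BottleneckAdapterSketch:
    """Hypothetical lightweight adapter: down-project, ReLU, up-project, residual."""
    def __init__(self, dim, bottleneck=4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        self.w_up = np.zeros((bottleneck, dim))  # zero-init: starts as identity

    def __call__(self, h):
        # Residual bottleneck branch added on top of the attention output.
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up

frames, dim = 8, 16
x = np.random.default_rng(1).normal(size=(frames, dim))
h, attn = temporal_self_attention(x)
out = BottleneckAdapterSketch(dim)(h)
print(out.shape)            # (8, 16)
print(np.allclose(out, h))  # True: zero-initialized branch leaves the base model intact
```

The zero-initialized up-projection is a common trick for injecting new control branches into pretrained diffusion models without disturbing their initial outputs; the trained adapter would then learn to modulate attention according to the mask-and-arrow control signal.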

Original language: English
Pages (from-to): 10743-10751
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 10
DOIs
State: Published - 11 Apr 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 - 4 Mar 2025
