
STD-TR: End-to-End Spatio-Temporal Action Detection with Transformers

  • Zexian Li
  • Tian Wang*
  • Aichun Zhu
  • Kexin Liu
  • Peng Shi
  • Hichem Snoussi
  • *Corresponding author for this work
  • Beihang University
  • Nanjing University of Science and Technology
  • Nanjing Tech University
  • Fujian Normal University
  • Université de technologie de Troyes

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Spatio-temporal action detection methods locate human actions in both the spatial and temporal dimensions and usually follow a two-stage structure. In this paper, we propose STD-TR, a novel spatio-temporal action detection framework with an end-to-end transformer structure. STD-TR employs two branches to extract features from the video clip and the key frame concurrently, then sends the aggregated features to a transformer encoder-decoder. Viewing spatio-temporal action detection as a set matching and prediction problem, STD-TR uses learned object queries to model the relations within the feature context and directly outputs all predictions in a single inference pass. Our method removes all hand-designed components and can be optimized with a joint loss. In addition, the Hungarian algorithm and an upgraded linking strategy are used for bipartite set matching and action tube generation, respectively. Experimental results on a challenging dataset demonstrate the superiority of our method.
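The bipartite set matching step named in the abstract can be illustrated with a minimal sketch, assuming a DETR-style matching cost: each ground-truth box is assigned to exactly one learned object query by minimizing the negative predicted class probability plus a weighted L1 distance between boxes. The abstract does not state STD-TR's actual cost terms or weights, so `match_queries`, the `l1_weight` value of 5.0 and the (cx, cy, w, h) box format are hypothetical. The assignment itself is solved with a textbook O(n^3) potential-based Hungarian algorithm, written in plain Python so it needs no third-party packages.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized; assumed format


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment of every row to a distinct column (rows <= columns).

    Potential-based O(n^3) Hungarian algorithm; returns, for each row, the
    index of its assigned column.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many columns (queries) as rows (targets)"
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-indexed)
    v = [0.0] * (m + 1)    # column potentials; column 0 is a sentinel
    p = [0] * (m + 1)      # p[j] = row currently assigned to column j (0 = free)
    way = [0] * (m + 1)    # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero-reduced-cost edge appears.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip assignments along the augmenting path back to the sentinel.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_queries(
    pred_probs: Sequence[Sequence[float]],
    pred_boxes: Sequence[Box],
    gt_labels: Sequence[int],
    gt_boxes: Sequence[Box],
    l1_weight: float = 5.0,  # hypothetical weight, not from the STD-TR paper
) -> List[Tuple[int, int]]:
    """Return (ground-truth index, query index) pairs of a minimum-cost matching."""
    cost = []
    for label, gt_box in zip(gt_labels, gt_boxes):
        row = []
        for probs, box in zip(pred_probs, pred_boxes):
            l1 = sum(abs(a - b) for a, b in zip(box, gt_box))
            # Hypothetical DETR-style cost: reward the right class, penalize box error.
            row.append(-probs[label] + l1_weight * l1)
        cost.append(row)
    return list(enumerate(hungarian(cost)))


# Three object queries, two ground-truth actions on the key frame.
probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
boxes = [(0.5, 0.5, 0.2, 0.2), (0.2, 0.3, 0.1, 0.2), (0.8, 0.8, 0.1, 0.1)]
print(match_queries(probs, boxes, gt_labels=[0, 1],
                    gt_boxes=[(0.2, 0.3, 0.1, 0.2), (0.5, 0.5, 0.2, 0.2)]))
# [(0, 1), (1, 0)]
```

Because each target is matched to exactly one query, duplicate predictions are penalized by the loss rather than removed by post-processing, which is consistent with the abstract's claim that hand-designed components are unnecessary. Query 2 stays unmatched in the example and, in a DETR-style set loss, would be supervised as "no action". In practice a library solver such as SciPy's `linear_sum_assignment` would normally replace the hand-written routine.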

Original language: English
Title of host publication: Proceeding - 2021 China Automation Congress, CAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7615-7620
Number of pages: 6
ISBN (Electronic): 9781665426473
DOIs
State: Published - 2021
Event: 2021 China Automation Congress, CAC 2021 - Beijing, China
Duration: 22 Oct 2021 to 24 Oct 2021

Publication series

Name: Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference: 2021 China Automation Congress, CAC 2021
Country/Territory: China
City: Beijing
Period: 22/10/21 to 24/10/21

Keywords

  • Action Detection
  • End-to-End
  • Spatio-Temporal Action Detection
  • Transformers

