
STD-TR: End-to-End Spatio-Temporal Action Detection with Transformers

  • Zexian Li
  • Tian Wang*
  • Aichun Zhu
  • Kexin Liu
  • Peng Shi
  • Hichem Snoussi
  • *Corresponding author for this work
  • Beihang University
  • Nanjing University of Science and Technology
  • Nanjing Tech University
  • Fujian Normal University
  • Université de technologie de Troyes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Spatio-temporal action detection methods, which usually follow a two-stage structure, locate human actions in both the spatial and temporal dimensions. In this paper, we propose STD-TR, a novel spatio-temporal action detection framework with an end-to-end transformer structure. STD-TR employs two branches to extract features from the video clip and the key frame concurrently, then sends the aggregated features to a transformer encoder-decoder. Viewing spatio-temporal action detection as a set matching and prediction problem, STD-TR employs learned object queries to model the relations of the feature context and directly outputs all predictions in a single inference pass. Our method removes all hand-designed components and can be optimized with a joint loss. In addition, a Hungarian algorithm and an upgraded linking strategy are used for bipartite set matching and action tube generation, respectively. Convincing experimental results on a challenging dataset demonstrate the superiority of our method.
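
The bipartite set matching mentioned in the abstract can be illustrated with a minimal pure-Python sketch of the Hungarian algorithm. This is not the authors' implementation: the cost function (negative class probability plus a weighted L1 box distance, in the style of common set-prediction detectors), the names `hungarian` and `matching_cost`, the `box_weight` value, and the toy data are all assumptions made for illustration. The abstract does not specify the actual matching cost.

```python
from typing import List, Sequence


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment of every row to a distinct column.

    Requires len(rows) <= len(columns). Returns match[row] = column.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many predictions as targets"
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently assigned to column j, 0 = free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0 != 0:       # flip assignments back along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match


def matching_cost(gt_labels, gt_boxes, pred_probs, pred_boxes, box_weight=5.0):
    """Hypothetical cost: -P(true class) + box_weight * L1(box difference)."""
    cost = []
    for label, gbox in zip(gt_labels, gt_boxes):
        row = []
        for probs, pbox in zip(pred_probs, pred_boxes):
            l1 = sum(abs(a - b) for a, b in zip(gbox, pbox))
            row.append(-probs[label] + box_weight * l1)
        cost.append(row)
    return cost


# Toy data (assumed): two ground-truth actions, three query predictions.
gt_labels = [0, 1]
gt_boxes = [(0.2, 0.2, 0.4, 0.4), (0.6, 0.6, 0.8, 0.8)]
pred_probs = [[0.1, 0.8], [0.7, 0.2], [0.3, 0.3]]
pred_boxes = [(0.6, 0.6, 0.8, 0.9), (0.2, 0.2, 0.4, 0.5), (0.5, 0.5, 0.5, 0.5)]

cost = matching_cost(gt_labels, gt_boxes, pred_probs, pred_boxes)
match = hungarian(cost)
print(match)  # match[g] is the index of the prediction assigned to target g
```

In this sketch the rows are ground-truth actions and the columns are query predictions, so any prediction left unmatched would be treated as "no action" when a loss is computed.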

Original language: English
Title of host publication: Proceeding - 2021 China Automation Congress, CAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7615-7620
Number of pages: 6
ISBN (electronic): 9781665426473
Publication status: Published - 2021
Event: 2021 China Automation Congress, CAC 2021 - Beijing, China
Duration: 22 Oct 2021 to 24 Oct 2021

Publication series

Name: Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference: 2021 China Automation Congress, CAC 2021
Country/Territory: China
City: Beijing
Period: 22/10/21 to 24/10/21
