
Group Activity Representation Learning With Long-Short States Predictive Transformer

  • Longteng Kong
  • Wanting Zhou
  • Duoxuan Pei
  • Zhaofeng He*
  • Di Huang
  • *Corresponding author of this work
  • Beijing University of Posts and Telecommunications
  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

The goal of this paper is to learn group activity representations in a self-supervised fashion rather than with conventional methods that rely on manually annotated labels. It is essential for this task to better describe the complex group states and their future transitions. To this end, we propose a long-short state predictive Transformer (LSSPT), which mines the meaningful spatiotemporal features of group activities by predicting future group states from long- and short-term historical state dynamics. LSSPT consists of an encoder that models diverse spatiotemporal state representations in the observation, together with a decoder that exploits rich dynamic patterns by attending to both the short-term spatial context and long-term history state evolutions to predict future group states. Furthermore, we consider the distinguishability and consistency of the predicted states and introduce a joint learning mechanism to optimize the models, enabling LSSPT to describe more reliable state transitions. Finally, extensive experiments evaluate the learned representation on downstream tasks on the Volleyball, Collective Activity and VolleyTactic datasets, demonstrating state-of-the-art performance over existing self-supervised learning approaches.
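The decoder's core idea, attending to both a short-term spatial context and a long-term state history to predict the next group state, can be sketched in a minimal NumPy form. This is an illustrative sketch only: the function names, single-head attention, and the averaging fusion of the two readouts are assumptions for clarity, not the paper's actual architecture.

```python
import numpy as np

def attention(q, k, v):
    # Single-head scaled dot-product attention with a numerically
    # stable softmax over the key dimension.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def predict_next_state(short_ctx, long_hist, query):
    # Read out from the short-term spatial context and the long-term
    # state history separately, then fuse by averaging (hypothetical
    # fusion; the paper's decoder is more elaborate).
    short_out = attention(query, short_ctx, short_ctx)
    long_out = attention(query, long_hist, long_hist)
    return 0.5 * (short_out + long_out)

rng = np.random.default_rng(0)
d = 8                                  # hypothetical state dimension
short_ctx = rng.normal(size=(4, d))    # recent per-actor states
long_hist = rng.normal(size=(16, d))   # long-term group state history
query = rng.normal(size=(1, d))        # current group state token
pred = predict_next_state(short_ctx, long_hist, query)
print(pred.shape)  # (1, 8)
```

A self-supervised objective would then compare `pred` against the encoder's representation of the actually observed next state, e.g. with a contrastive or regression loss.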

Original language: English
Article number: 3278984
Pages (from-to): 7267-7281
Number of pages: 15
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 12
DOI
Publication status: Published - 1 Dec 2023

