
Modeling Sub-Actions for Weakly Supervised Temporal Action Localization

  • Linjiang Huang
  • Yan Huang
  • Wanli Ouyang
  • Liang Wang*
  • *Corresponding author for this work
  • CAS - Institute of Automation
  • University of Chinese Academy of Sciences
  • Chinese Academy of Sciences
  • The University of Sydney

Research output: Contribution to journal, Article, peer-reviewed

Abstract

As a challenging task in high-level video understanding, weakly supervised temporal action localization has attracted increasing attention recently. Because only video-level category labels are available, the task is usually formulated as classification, which suffers from the contradiction between classification and detection. In this paper, we describe a novel approach that alleviates this contradiction and detects more complete action instances by explicitly modeling sub-actions. Our method uses three innovations to model the latent sub-actions. First, our framework uses prototypes to represent sub-actions, which can be learned automatically in an end-to-end way. Second, we regard the relations among sub-actions as a graph and construct the correspondences between sub-actions and actions through a graph pooling operation. Doing so not only makes the sub-actions inter-dependent, which facilitates the multi-label setting, but also naturally uses the video-level labels as weak supervision. Third, we devise three complementary loss functions, namely a representation loss, a balance loss, and a relation loss, to ensure that the learned sub-actions are diverse and have clear semantic meanings. Experimental results on the THUMOS14 and ActivityNet1.3 datasets demonstrate the effectiveness of our method and its superior performance over state-of-the-art approaches.
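
The idea of representing sub-actions with prototypes and mapping them to actions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the architecture, so the cosine-similarity soft assignment, the `temperature` value, the fixed `pooling` matrix (a plain linear stand-in for the paper's learned graph pooling), the top-k temporal averaging, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sub_action_assignments(features, prototypes, temperature=0.1):
    # Soft-assign each snippet to K sub-action prototypes using
    # cosine similarity (assumed similarity measure, not from the paper).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = f @ p.T                                # (T, K)
    return softmax(sim / temperature, axis=1)    # each row sums to 1

def video_level_scores(assignments, pooling, top_k=3):
    # Pool sub-action scores into action scores per snippet, then
    # average the top-k snippets per class to get a video-level score
    # that could be trained against video-level labels.
    snippet_scores = assignments @ pooling       # (T, C)
    top = np.sort(snippet_scores, axis=0)[-top_k:]
    return top.mean(axis=0)                      # (C,)

# Toy data: T snippets, D feature dims, K sub-actions, C actions.
rng = np.random.default_rng(0)
T, D, K, C = 20, 16, 6, 2
features = rng.normal(size=(T, D))
prototypes = rng.normal(size=(K, D))   # would be learned end-to-end
# Hypothetical fixed mapping: 3 sub-actions belong to each action.
pooling = np.repeat(np.eye(C), K // C, axis=0)   # (K, C)

assignments = sub_action_assignments(features, prototypes)
scores = video_level_scores(assignments, pooling)
```

In this sketch the prototypes are random; in an end-to-end setting they would be trainable parameters, and the mapping from sub-actions to actions would be learned rather than fixed.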

Original language: English
Article number: 9430747
Pages (from-to): 5154-5167
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 30
State: Published - 2021
Externally published: Yes

Keywords

  • Weakly supervised learning
  • sub-action modeling
  • temporal action localization
