
Recurrent Temporal Sparse Autoencoder for attention-based action recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Visual context is fundamental to understanding human actions in videos. However, efficiently exploiting temporal context information remains an enormous challenge in this area. Two problems are long-standing: (1) video frames are redundant, while discriminative information is sparse; (2) a large amount of interfering information is mixed into frame sequences. These factors result in redundant computation and recognition failures. In this paper, we propose a learnable temporal attention mechanism that automatically selects important time points from action sequences. We design an unsupervised Recurrent Temporal Sparse Autoencoder (RTSAE) network, which learns to extract sparse key-frames that sharpen discriminative capability yet retain descriptive capability, while shielding interfering information. By applying this technique to a recently proposed action recognition model, the Adaptive Recurrent-convolutional Hybrid network (ARCH), we significantly improve its performance in both speed and accuracy. Experiments demonstrate that, with the help of the RTSAE, ARCH outperforms most state-of-the-art methods on the UCF101 and HMDB51 datasets.
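The core idea the abstract describes — scoring frames with a recurrent pass and keeping only a sparse subset of key-frames — can be sketched as follows. This is a minimal illustration only, not the paper's actual RTSAE architecture; the weight names, the top-k selection rule, and the toy dimensions are all assumptions for the sake of a runnable example.

```python
import numpy as np

def select_key_frames(frames, k, hidden_dim=8, seed=0):
    """Hypothetical sketch: score each frame via a simple recurrent pass,
    then keep the k highest-scoring frames as a sparse temporal mask.
    `frames` is a (T, D) array of precomputed per-frame features."""
    rng = np.random.default_rng(seed)
    t, d = frames.shape
    # Illustrative random weights; a real model would learn these.
    w_in = rng.standard_normal((d, hidden_dim)) * 0.1
    w_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
    score_w = rng.standard_normal(hidden_dim) * 0.1

    h = np.zeros(hidden_dim)
    scores = np.empty(t)
    for i in range(t):
        h = np.tanh(frames[i] @ w_in + h @ w_h)  # recurrent hidden state
        scores[i] = h @ score_w                  # per-frame relevance score

    keep = np.sort(np.argsort(scores)[-k:])      # sparse selection, time order
    mask = np.zeros(t, dtype=bool)
    mask[keep] = True
    return keep, mask

# Toy usage: 16 frames of 32-dim features, keep 4 key-frames.
frames = np.random.default_rng(1).standard_normal((16, 32))
idx, mask = select_key_frames(frames, k=4)
```

A hard top-k mask is the simplest way to make the selection sparse; the paper's autoencoder formulation instead learns the selection unsupervised, trading off reconstruction (descriptive) quality against sparsity.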

Original language: English
Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 456-463
Number of pages: 8
ISBN (Electronic): 9781509006199
DOIs
State: Published - 31 Oct 2016
Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: 24 Jul 2016 – 29 Jul 2016

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 2016-October

Conference

Conference: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Country/Territory: Canada
City: Vancouver
Period: 24/07/16 – 29/07/16
