TY - GEN
T1 - Recurrent Temporal Sparse Autoencoder for attention-based action recognition
AU - Xin, Miao
AU - Zhang, Hong
AU - Sun, Mingui
AU - Yuan, Ding
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - Visual context is fundamental to understanding human actions in videos. However, efficiently employing temporal context information presents an enormous challenge in this area. Two main problems are long-standing: (1) video frames are redundant while discriminative information is sparse; (2) a large amount of interference information is mixed into frame sequences. These factors result in redundant computation and recognition failures. In this paper, we propose a learnable temporal attention mechanism to automatically select important time points from action sequences. We design an unsupervised Recurrent Temporal Sparse Autoencoder (RTSAE) network, which learns to extract sparse key-frames that sharpen discriminative capability yet retain descriptive capability, as well as shield interference information. By applying this technique to a recently proposed action recognition model, the Adaptive Recurrent-convolutional Hybrid network (ARCH), we significantly improve its performance in both speed and accuracy. Experiments demonstrate that, with the help of the RTSAE, ARCH outperforms most state-of-the-art methods on the UCF101 and HMDB51 datasets.
AB - Visual context is fundamental to understanding human actions in videos. However, efficiently employing temporal context information presents an enormous challenge in this area. Two main problems are long-standing: (1) video frames are redundant while discriminative information is sparse; (2) a large amount of interference information is mixed into frame sequences. These factors result in redundant computation and recognition failures. In this paper, we propose a learnable temporal attention mechanism to automatically select important time points from action sequences. We design an unsupervised Recurrent Temporal Sparse Autoencoder (RTSAE) network, which learns to extract sparse key-frames that sharpen discriminative capability yet retain descriptive capability, as well as shield interference information. By applying this technique to a recently proposed action recognition model, the Adaptive Recurrent-convolutional Hybrid network (ARCH), we significantly improve its performance in both speed and accuracy. Experiments demonstrate that, with the help of the RTSAE, ARCH outperforms most state-of-the-art methods on the UCF101 and HMDB51 datasets.
UR - https://www.scopus.com/pages/publications/85007198757
U2 - 10.1109/IJCNN.2016.7727234
DO - 10.1109/IJCNN.2016.7727234
M3 - Conference contribution
AN - SCOPUS:85007198757
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 456
EP - 463
BT - 2016 International Joint Conference on Neural Networks, IJCNN 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 International Joint Conference on Neural Networks, IJCNN 2016
Y2 - 24 July 2016 through 29 July 2016
ER -