An enhanced 3DCNN-ConvLSTM for spatiotemporal multimedia data analysis

  • Tian Wang
  • Jiakun Li
  • Mengyi Zhang
  • Aichun Zhu
  • Hichem Snoussi
  • Chang Choi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Human action recognition remains a challenging and complex task in computer vision. Combining a CNN with an RNN is a common and effective network structure for this task. Specifically, we use a 3DCNN for the CNN part and a ConvLSTM for the RNN part. We divide the video evenly into multiple temporal segments and compress each segment into one feature map with a pooling layer. The novel element of our work is adding pooling, dropout, and batch normalization layers to the ConvLSTM. We test our model on the KTH, UCF-11, and HMDB51 datasets and achieve high action recognition accuracy.
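
The segment-and-pool step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which pooling operator is used or at which point in the network it is applied, so the function name `segment_pool`, the `(T, H, W, C)` feature layout, and the choice between mean and max pooling are all assumptions made for illustration.

```python
import numpy as np


def segment_pool(features, num_segments, mode="mean"):
    """Split a clip into near-equal temporal segments and pool each over time.

    features: array of shape (T, H, W, C), one feature map per frame
              (this layout is an assumption, not taken from the paper).
    Returns an array of shape (num_segments, H, W, C).
    """
    features = np.asarray(features)
    num_frames = features.shape[0]
    if not 1 <= num_segments <= num_frames:
        raise ValueError("num_segments must be between 1 and the frame count")
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")

    # array_split tolerates frame counts that are not an exact multiple
    # of num_segments; the earlier segments get one extra frame.
    chunks = np.array_split(features, num_segments, axis=0)

    # Hypothetical pooling choice: collapse the time axis of each segment.
    reduce = np.mean if mode == "mean" else np.max
    return np.stack([reduce(chunk, axis=0) for chunk in chunks], axis=0)


# Toy clip: 8 frames of 2x2 single-channel feature maps.
clip = np.arange(8 * 2 * 2 * 1, dtype=np.float32).reshape(8, 2, 2, 1)
pooled = segment_pool(clip, num_segments=4)
print(pooled.shape)
```

Under these assumptions, each of the four segments covers two consecutive frames, and the resulting sequence of four feature maps is what a recurrent stage such as a ConvLSTM would then consume.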

Original language: English
Article number: e5302
Journal: Concurrency and Computation: Practice and Experience
Volume: 33
Issue number: 2
State: Published - 25 Jan 2021

Keywords

  • 3DCNN
  • ConvLSTM
  • action recognition
