
DeepCT: A novel deep complex-valued network with learnable transform for video saliency prediction

  • University of British Columbia
  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

The past decade has witnessed the success of transformed domain methods for image saliency prediction. However, developing a transformed domain method for video saliency prediction has remained intractable, because few spatio-temporal transforms are available to choose from. In this paper, we propose learning the transform from training data, rather than relying on a predefined transform as existing methods do. Specifically, we develop a novel deep Complex-valued network with learnable Transform (DeepCT) for video saliency prediction. The architecture of DeepCT comprises the Complex-valued Transform Module (CTM), the inverse CTM (iCTM) and the Complex-valued Stacked Convolutional Long Short-Term Memory network (CS-ConvLSTM). Multi-scale pyramid structures are introduced in the CTM and iCTM, as we find that transforms at multiple receptive scales improve the accuracy of saliency prediction. To make the CTM and iCTM “invertible”, we further propose a cycle consistency loss for training DeepCT, composed of a frame reconstruction loss and a complex feature reconstruction loss. Additionally, the CS-ConvLSTM is developed to learn the temporal saliency transition across video frames. Finally, experimental results show that DeepCT outperforms 13 other state-of-the-art methods for video saliency prediction.
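The cycle consistency idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes plain complex matrices as stand-ins for the CTM and iCTM, a flattened real-valued vector as the "frame", squared-error losses, and a hypothetical `weight` parameter balancing the two terms. The function name `cycle_consistency_loss` is likewise made up for illustration.

```python
import numpy as np

def cycle_consistency_loss(frame, forward, inverse, weight=1.0):
    """Toy cycle consistency loss for a complex-valued transform pair.

    frame   : real-valued 1-D array (stand-in for a video frame)
    forward : complex matrix (stand-in for the CTM)
    inverse : complex matrix (stand-in for the iCTM)
    weight  : hypothetical balance between the two loss terms
    """
    # Frame -> complex features -> reconstructed frame.
    features = forward @ frame
    frame_rec = (inverse @ features).real
    frame_loss = np.mean((frame_rec - frame) ** 2)

    # Complex features -> frame -> complex features again.
    features_rec = forward @ frame_rec
    feature_loss = np.mean(np.abs(features_rec - features) ** 2)

    return frame_loss + weight * feature_loss, frame_loss, feature_loss

# A unitary DFT matrix and its conjugate transpose form an exactly invertible pair.
n = 8
dft = np.fft.fft(np.eye(n)) / np.sqrt(n)
idft = dft.conj().T

rng = np.random.default_rng(0)
frame = rng.standard_normal(n)

total_exact, _, _ = cycle_consistency_loss(frame, dft, idft)

# Perturbing the inverse breaks invertibility, so the loss becomes positive.
perturbed = idft + 0.1 * rng.standard_normal((n, n))
total_perturbed, _, _ = cycle_consistency_loss(frame, dft, perturbed)
```

With the exact pair both terms vanish up to floating-point error, while the perturbed inverse yields a positive loss. A learned transform pair would presumably be pushed toward the first case by minimizing such a loss during training.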

Original language: English
Article number: 107234
Journal: Pattern Recognition
Volume: 102
DOI
Publication status: Published - Jun 2020
