
Saliency Prediction on Mobile Videos: A Fixation Mapping-Based Dataset and A Transformer Approach

  • Shijie Wen
  • Li Yang
  • Mai Xu*
  • Minglang Qiao
  • Tao Xu
  • Lin Bai
  • *Corresponding author for this work
  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid spread of smart devices, mobile videos have attracted broad interest among social media users. Unlike traditional long-form videos, mobile videos are shown in a specific display mode, and human attention behavior on them remains poorly understood, which motivates research on saliency prediction for mobile videos. Unfortunately, current eye-tracking experiments are not applicable to mobile videos, since stationary eye-trackers and their fixation acquisition procedures are designed for videos presented on computers. To tackle this issue, we propose using a wearable eye-tracker to record viewers' egocentric fixations and then devise a fixation mapping technique to project the eye fixations from egocentric videos onto mobile videos. Using this technique, we establish the large-scale mobile video saliency (MVS) dataset, which includes 1,007 mobile videos and 5,935,927 fixations. Given this dataset, we thoroughly analyze the characteristics of subjects' fixations and obtain two findings. Based on the MVS dataset and these findings, we propose MVFormer, a saliency prediction approach for mobile videos built upon Video Swin Transformer, wherein long-range spatio-temporal dependency is captured to model the human attention mechanism on mobile videos. In MVFormer, we develop a selective feature fusion module to balance multi-scale features, and a progressive saliency prediction module to generate saliency maps via progressive aggregation of multi-scale features. Extensive experiments show that our MVFormer approach significantly outperforms other state-of-the-art saliency prediction approaches. Finally, we demonstrate a potential application of MVFormer in the H.265 video coding standard by embedding it into the rate control scheme, such that the perceptual quality of compressed mobile videos can be significantly improved. The dataset and code are available at https://github.com/wenshijie110/MVFormer.
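The abstract does not say how fixations are projected from the egocentric video onto the mobile video. As a minimal sketch of one plausible reading, the code below assumes the phone screen is a planar quadrilateral whose four corners have already been located in the egocentric frame, and maps a fixation through a planar homography. The homography model, the function names `estimate_homography` and `map_fixation`, and all coordinates are illustrative assumptions, not the authors' method.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate a 3x3 homography from >= 4 point pairs (src -> dst)
    using the direct linear transform (DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest
    # singular value, reshaped to 3x3.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def map_fixation(h, point):
    """Project one (x, y) fixation through homography h."""
    x, y = point
    u, v, w = h @ np.array([x, y, 1.0])
    return u / w, v / w


# Hypothetical screen corners in the egocentric frame (pixels),
# ordered top-left, top-right, bottom-right, bottom-left.
screen_corners = [(100, 50), (300, 50), (300, 450), (100, 450)]
# Corresponding corners of an assumed 1080x1920 mobile video frame.
video_corners = [(0, 0), (1080, 0), (1080, 1920), (0, 1920)]

H = estimate_homography(screen_corners, video_corners)
mapped = map_fixation(H, (200, 250))  # fixation at the screen centre
print(mapped)
```

In practice the screen corners would have to be re-estimated for every egocentric frame, since both the viewer's head and the handheld device move. How the original work handles this is not described in the abstract.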

Original language: English
Pages (from-to): 5935-5950
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 7
DOI
Publication status: Published - 2024
