
SportSal: Hypernetwork-Based Saliency Prediction for Sports Videos

  • Beihang University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Saliency prediction is crucial for improving the efficiency of sports video processing, and thereby for providing an enriched viewing experience to a wide-ranging audience. However, well-established eye-tracking datasets and learning-based approaches tailored specifically to sports videos have long been lacking. In this paper, we establish a large-scale eye-tracking dataset dubbed audio-visual sports (AVS). AVS consists of 1,000 high-quality sports videos with eye fixations from 60 participants. Through data analysis on AVS, we observe that human attention patterns vary significantly with the specific scene context of the sport. Motivated by these observations, we propose a sports-aware saliency prediction approach, named SportSal, which adaptively predicts saliency maps in a hyper manner. Specifically, a hypernetwork is introduced to learn sports-aware priors. Meanwhile, an audio-visual fusion (AVF) block is developed to effectively fuse features from the visual and audio backbones. Given the learned priors and the fused audio-visual features, we propose the hyper deformable convolutional (HDC) block and the hyper upsampling (HU) block for dynamic feature extraction and upsampling, respectively. The two blocks are connected alternately to adaptively predict saliency maps. Experimental results show that our approach outperforms 21 state-of-the-art saliency prediction approaches on three sports video eye-tracking datasets. Finally, we demonstrate the application of our SportSal approach to perceptual video compression. The dataset and code will be available at https://github.com/WeNsHiJIe-19950103/SportSal
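
For readers unfamiliar with the term, the general idea of a hypernetwork (one network generating the weights of another from a conditioning signal) can be illustrated with a minimal sketch. This is not the SportSal architecture, which the abstract does not specify in detail. Everything here is an assumption made for illustration: the function names `make_hypernetwork` and `apply_pointwise`, the single linear layer, the 3-dimensional "prior" vectors, and the use of a plain 1x1 (pointwise) convolution in place of the deformable convolution and upsampling blocks named in the abstract.

```python
import random


def make_hypernetwork(prior_dim, in_ch, out_ch, seed=0):
    """Return a function mapping a prior vector to an (out_ch x in_ch) weight matrix.

    Hypothetical toy hypernetwork: a single linear layer with fixed random
    coefficients (untrained), used only to show the data flow.
    """
    rng = random.Random(seed)
    n_weights = out_ch * in_ch
    # One row of coefficients per generated weight.
    coeffs = [[rng.uniform(-0.1, 0.1) for _ in range(prior_dim)]
              for _ in range(n_weights)]

    def generate(prior):
        flat = [sum(c * p for c, p in zip(row, prior)) for row in coeffs]
        # Reshape the flat vector into out_ch rows of in_ch weights each.
        return [flat[o * in_ch:(o + 1) * in_ch] for o in range(out_ch)]

    return generate


def apply_pointwise(weights, features):
    """Apply generated weights as a 1x1 convolution.

    `features` is a list of pixels, each a list of in_ch channel values.
    """
    return [[sum(w * x for w, x in zip(row, pixel)) for row in weights]
            for pixel in features]


hyper = make_hypernetwork(prior_dim=3, in_ch=4, out_ch=2)

# Two pixels with four channels each (stand-in for fused audio-visual features).
features = [[1.0, 0.5, -0.5, 2.0], [0.0, 1.0, 1.0, 0.0]]

prior_a = [1.0, 0.0, 0.0]  # assumed stand-in for one sport's learned prior
prior_b = [0.0, 1.0, 0.0]  # assumed stand-in for another sport's learned prior

# The same features are processed with different weights depending on the prior.
out_a = apply_pointwise(hyper(prior_a), features)
out_b = apply_pointwise(hyper(prior_b), features)
```

The point of the sketch is only that the layer applied to the features has no fixed weights of its own: they are produced on the fly from the conditioning vector, so the same input features yield different outputs under different priors.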

Original language: English
Pages (from-to): 2980-2998
Number of pages: 19
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 3
DOI
Publication status: Published - 2026
