SportSal: Hypernetwork-Based Saliency Prediction for Sports Videos

Research output: Contribution to journal › Article › peer-review

Abstract

Saliency prediction is crucial for improving the efficiency of sports video processing and thereby providing an enriched viewing experience for a wide-ranging audience. However, a well-established eye-tracking dataset and a learning-based approach tailored specifically to sports videos have long been absent. In this paper, we establish a large-scale eye-tracking dataset dubbed audio-visual sports (AVS). AVS consists of 1,000 high-quality sports videos with eye fixations from 60 participants. Through data analysis on AVS, we observe that human attention patterns vary significantly with the specific scene context of the sport. Motivated by these observations, we propose a sports-aware saliency prediction approach, named SportSal, which adaptively predicts saliency maps in a hypernetwork-based manner. Specifically, a hypernetwork is introduced to learn sports-aware priors. Meanwhile, an audio-visual fusion (AVF) block is developed to effectively fuse features from the visual and audio backbones. Given the learned priors and fused audio-visual features, we propose the hyper deformable convolutional (HDC) block and the hyper upsampling (HU) block for dynamic feature extraction and upsampling, respectively. The two blocks are connected alternately to adaptively predict saliency maps. Experimental results show that our approach outperforms 21 state-of-the-art saliency prediction approaches on three sports video eye-tracking datasets. Finally, we demonstrate the application of our SportSal approach in perceptual video compression. The dataset and code will be available at https://github.com/WeNsHiJIe-19950103/SportSal
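To make the hypernetwork idea in the abstract concrete, the following is a minimal sketch of the general technique only: a small network maps a sports-specific prior to the weights of the layer that produces the saliency map, so the same features yield different maps for different sports. It is not the SportSal architecture; the names `generate_kernel` and `predict_saliency`, the learned-embedding prior, the 1x1 convolution, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_SPORTS = 4     # hypothetical number of sport categories
EMBED_DIM = 8      # hypothetical size of the sports-aware prior
IN_CHANNELS = 16   # hypothetical channel count of the fused features

# Hypernetwork parameters (assumed): one prior vector per sport, plus a linear
# layer that maps the prior to the weights and bias of a 1x1 convolution.
sport_embeddings = rng.normal(size=(NUM_SPORTS, EMBED_DIM))
hyper_w = rng.normal(scale=0.1, size=(EMBED_DIM, IN_CHANNELS + 1))


def generate_kernel(sport_id):
    """Return (weights, bias) of a 1x1 conv generated for the given sport."""
    params = sport_embeddings[sport_id] @ hyper_w  # shape: (IN_CHANNELS + 1,)
    return params[:-1], params[-1]


def predict_saliency(features, sport_id):
    """Apply the generated 1x1 conv to features of shape (C, H, W)."""
    weights, bias = generate_kernel(sport_id)
    # Contract the channel axis: (C,) with (C, H, W) gives (H, W).
    logits = np.tensordot(weights, features, axes=([0], [0])) + bias
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid squashes to [0, 1]


# Random stand-in for fused audio-visual features of one frame.
features = rng.normal(size=(IN_CHANNELS, 6, 8))
map_a = predict_saliency(features, sport_id=0)
map_b = predict_saliency(features, sport_id=1)
```

Here `map_a` and `map_b` are computed from identical features but differ because the generated kernel depends on the sport, which is the sense in which the prediction is sports-aware in this sketch.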

Original language: English
Pages (from-to): 2980-2998
Number of pages: 19
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 3
State: Published - 2026

Keywords

  • Sports videos
  • hypernetwork
  • saliency prediction
