TY - GEN
T1 - MT-VQA
T2 - 3rd Workshop on Quality of Experience in Visual Multimedia Applications, QoEVMA 2024
AU - Wen, Shijie
AU - Qiao, Minglang
AU - Jiang, Lai
AU - Xu, Mai
AU - Deng, Xin
AU - Li, Shengxi
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/10/28
Y1 - 2024/10/28
N2 - Short-form video, now a mainstream media format on video platforms, has undergone explosive growth in recent years. Vast numbers of short-form videos are produced, processed, and distributed to users every day, inevitably introducing quality degradation. Accurate video quality assessment (VQA) is therefore critical for monitoring and optimizing users' viewing experience. However, existing short-form VQA approaches neglect the human attention patterns exhibited while viewing videos, and progress in short-form VQA is further hindered by the absence of large-scale datasets. To tackle these challenges, we first construct a large-scale short-form VQA dataset, SVQA, which comprises diverse distortion types covering the typical quality degradations that arise during the capture, encoding, and editing of short-form videos. For each short-form video in SVQA, we collect both a quality score and eye-tracking annotations. Building on this dataset, we propose a two-branch multi-task VQA approach, MT-VQA, which jointly performs VQA and video saliency prediction (VSP) for short-form videos. We further propose a saliency fusion module that guides the VQA branch to focus on quality distortions within visually salient regions. Extensive experiments show that our multi-task approach achieves superior performance on both the VQA and VSP tasks.
AB - Short-form video, now a mainstream media format on video platforms, has undergone explosive growth in recent years. Vast numbers of short-form videos are produced, processed, and distributed to users every day, inevitably introducing quality degradation. Accurate video quality assessment (VQA) is therefore critical for monitoring and optimizing users' viewing experience. However, existing short-form VQA approaches neglect the human attention patterns exhibited while viewing videos, and progress in short-form VQA is further hindered by the absence of large-scale datasets. To tackle these challenges, we first construct a large-scale short-form VQA dataset, SVQA, which comprises diverse distortion types covering the typical quality degradations that arise during the capture, encoding, and editing of short-form videos. For each short-form video in SVQA, we collect both a quality score and eye-tracking annotations. Building on this dataset, we propose a two-branch multi-task VQA approach, MT-VQA, which jointly performs VQA and video saliency prediction (VSP) for short-form videos. We further propose a saliency fusion module that guides the VQA branch to focus on quality distortions within visually salient regions. Extensive experiments show that our multi-task approach achieves superior performance on both the VQA and VSP tasks.
KW - human attention
KW - short-form video
KW - video quality assessment
UR - https://www.scopus.com/pages/publications/85210526243
U2 - 10.1145/3689093.3689181
DO - 10.1145/3689093.3689181
M3 - Conference contribution
AN - SCOPUS:85210526243
T3 - QoEVMA 2024 - Proceedings of the 3rd Workshop on Quality of Experience in Visual Multimedia Applications
SP - 30
EP - 38
BT - QoEVMA 2024 - Proceedings of the 3rd Workshop on Quality of Experience in Visual Multimedia Applications
PB - Association for Computing Machinery, Inc
Y2 - 28 October 2024 through 1 November 2024
ER -