TY - JOUR
T1 - Visual-audio correspondence and its effect on video tipping
T2 - Evidence from Bilibili vlogs
AU - Li, Bu
AU - Zhao, Jichang
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2023/5
Y1 - 2023/5
AB - Video tipping accounts for a substantial share of the income of online streaming platforms such as Bilibili. Viewers perceive specific mappings between audio and visual signals (e.g., the congruency of pitch and size), a phenomenon generally called visual-audio correspondence (VAC), which is believed to influence viewer satisfaction with video clips. However, no method yet exists for measuring VAC automatically, and its possible effect on video tipping has rarely been examined in previous work. In this study, we first build VAC-Net, a deep neural network with two sub-networks that maps visual and audio stimuli into a shared embedding space, and we use the Euclidean distance between the visual and audio representations in this space as the indicator of VAC. VAC-Net is trained using pre-trained models for both modalities and the triplet loss, and it evaluates the VAC of video clips with a test accuracy of 68.37%, outperforming alternative baselines and even exceeding human performance on a similar task. Lab experiments further show that VAC-Net's measurement of VAC is consistent with human cognition. Second, because viewers' tipping behavior (TIP) on videos is consistent with the Pay What You Want (PWYW) pricing strategy, we hypothesize that VAC indirectly influences TIP by reshaping viewer satisfaction (VS). Regression models built to test these hypotheses show that VAC can promote TIP by significantly enhancing VS. Additional tests that account for various controls and measurement errors confirm the robustness of this mechanism. Our results extend PWYW research on streaming videos by identifying VAC as a new motive for viewer tipping, and they provide streaming practitioners with an automatic tool for estimating the tips a video will receive.
KW - Consumer satisfaction
KW - Deep neural network
KW - Pay What You Want
KW - Streaming videos
KW - Tipping behavior
KW - Visual-audio correspondence
UR - https://www.scopus.com/pages/publications/85150773872
U2 - 10.1016/j.ipm.2023.103347
DO - 10.1016/j.ipm.2023.103347
M3 - Article
AN - SCOPUS:85150773872
SN - 0306-4573
VL - 60
JO - Information Processing and Management
JF - Information Processing and Management
IS - 3
M1 - 103347
ER -