
Visual-audio correspondence and its effect on video tipping: Evidence from Bilibili vlogs

  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

Video tipping accounts for a substantial share of the income of online streaming platforms such as Bilibili. Viewers can sense specific mappings between audio and visual signals (e.g., congruency between pitch and size), a phenomenon generally called visual-audio correspondence (VAC), which is believed to influence viewer satisfaction with video clips. However, there is still no way to measure VAC automatically, and its possible effect on video tipping has rarely been examined in previous work. First, a deep neural network with two sub-networks, named VAC-Net, is built to map both visual and audio stimuli into a shared embedding space, and the Euclidean distance between the visual and audio representations in this space is taken as the indicator of VAC. VAC-Net is trained using pre-trained models for both modalities together with a triplet loss. It evaluates the VAC of video clips with a test accuracy of 68.37%, outperforming alternative baselines and even exceeding human performance on a similar task. Lab experiments further show that the VAC measurements produced by VAC-Net conform to human cognition. Second, because viewers’ tipping behavior (TIP) on videos is consistent with the Pay What You Want (PWYW) pricing strategy, it is hypothesized that VAC indirectly influences TIP by reshaping viewer satisfaction (VS). Regression models built to test these hypotheses show that VAC can promote TIP by significantly enhancing VS. Additional tests that account for various controls and measurement errors demonstrate the robustness of this mechanism. Our results supplement research on PWYW in streaming videos by identifying VAC as a new motive for viewer tipping, and they provide streaming practitioners with an automatic tool to estimate the tips a video will receive.
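The distance-based VAC indicator and triplet-loss training described in the abstract can be illustrated with a minimal sketch. This sketch rests on several assumptions. Each sub-network is reduced to a single linear projection. Each triplet uses a clip's visual embedding as the anchor, audio from the same clip as the positive, and audio from a different clip as the negative. The function names, the margin value, the toy features, and the triplet-accuracy definition are all hypothetical and are not taken from VAC-Net. Only the forward computation is shown. Training the projections, and using the pre-trained encoders the paper mentions, would require an autodiff framework, so both are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, features: Sequence[float]) -> Vector:
    """Hypothetical linear 'sub-network': maps modality features into the shared space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def vac_distance(w_visual: Matrix, w_audio: Matrix,
                 visual_feat: Sequence[float], audio_feat: Sequence[float]) -> float:
    """VAC indicator: distance between the visual and audio embeddings.
    A smaller distance is read as stronger visual-audio correspondence."""
    return euclidean(project(w_visual, visual_feat), project(w_audio, audio_feat))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet loss: zero once the mismatched (negative) audio is at least
    `margin` farther from the visual anchor than the matching (positive) audio."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def triplet_accuracy(triplets: Sequence[Tuple[Sequence[float], Sequence[float], Sequence[float]]]) -> float:
    """Assumed evaluation metric: fraction of (anchor, positive, negative) triplets
    in which the matching audio lies closer to the anchor than the mismatched audio."""
    hits = sum(1 for a, p, n in triplets if euclidean(a, p) < euclidean(a, n))
    return hits / len(triplets)


if __name__ == "__main__":
    # Toy 2-D features with identity projections (illustrative values only).
    identity = [[1.0, 0.0], [0.0, 1.0]]
    visual, audio_same, audio_other = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
    anchor = project(identity, visual)
    pos = project(identity, audio_same)
    neg = project(identity, audio_other)
    print(vac_distance(identity, identity, visual, audio_same))  # 5.0
    print(triplet_loss(anchor, pos, neg))  # max(0, 5 - 10 + 1) -> 0.0
    print(triplet_loss(anchor, neg, pos))  # max(0, 10 - 5 + 1) -> 6.0
```

A smaller distance means stronger correspondence. Any downstream use of this indicator, such as the regressions on viewer satisfaction and tipping, therefore has to fix a sign convention. The sketch leaves that choice open.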

Original language: English
Article number: 103347
Journal: Information Processing and Management
Volume: 60
Issue number: 3
State: Published - May 2023

Keywords

  • Consumer satisfaction
  • Deep neural network
  • Pay What You Want
  • Streaming videos
  • Tipping behavior
  • Visual-audio correspondence
