
Bidirectional Maximum Entropy Training with Word Co-Occurrence for Video Captioning

  • Beihang University

Research output: Contribution to journal › Article › Peer-review

Abstract

Video captioning aims to generate natural-language descriptions for a given video. It is more challenging than static image captioning because it requires more diverse and exhaustive results. At the same time, the generated captions should be consistent with human language habits at a fine granularity. Unlike most recent works, which improve performance with additional data modalities or complex model designs, this work focuses on optimizing the training process of video captioning models. First, to generate more diverse video captions, we propose bidirectional maximum entropy (BME) training, which directly optimizes the probability distribution of neighboring words under a reinforcement learning (RL) framework. Second, to search for more human-like captions in the larger search space created by BME, we introduce word co-occurrence (WCO) weighting, which adaptively guides RL algorithms with co-occurrence statistics from the training corpus. Our method can be deployed on existing captioning models in a plug-and-play manner without introducing any extra parameters. Experimental results show that it improves the CIDEr score by up to 5.8% on MSVD and 7.0% on MSR-VTT.
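The abstract describes BME and WCO only at a high level. As a rough illustration of the two underlying ideas, the plain-Python sketch below computes (1) a REINFORCE-style objective with an entropy bonus, the standard maximum-entropy RL device, and (2) word-pair co-occurrence counts from a training corpus together with a simple per-word weight. This is not the paper's formulation: the bidirectional treatment of neighboring words and the exact WCO weighting are not specified here, and all function names, the coefficient `alpha`, and the weighting formula are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(probs):
    """Shannon entropy (in nats) of one next-word distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regularized_pg_objective(step_probs, sampled_ids, reward, baseline, alpha=0.1):
    """Value of a REINFORCE-style loss with an entropy bonus for one sampled caption.

    Generic maximum-entropy RL sketch, NOT the paper's bidirectional BME objective.
    step_probs:  per-step probability lists over the vocabulary.
    sampled_ids: index of the word sampled at each step.
    reward, baseline: sentence-level scores (e.g. CIDEr of the sample and of a
        baseline caption, as in self-critical training).
    alpha: hypothetical entropy coefficient.
    Minimizing this value favors high-advantage samples while keeping the
    per-step distributions spread out (higher entropy).
    """
    advantage = reward - baseline
    log_prob = sum(math.log(probs[w]) for probs, w in zip(step_probs, sampled_ids))
    total_entropy = sum(entropy(probs) for probs in step_probs)
    return -advantage * log_prob - alpha * total_entropy


def cooccurrence_counts(captions):
    """For each unordered word pair, count the captions in which both words appear."""
    counts = Counter()
    for caption in captions:
        words = sorted(set(caption.lower().split()))
        for a, b in combinations(words, 2):  # pairs come out in sorted order
            counts[(a, b)] += 1
    return counts


def cooccurrence_weight(word, caption, counts, num_captions):
    """Hypothetical per-word weight (not the paper's WCO formula): the average
    fraction of training captions in which `word` co-occurs with each other
    word of `caption`; 0.0 if the caption has no other words."""
    others = {w for w in caption.lower().split() if w != word}
    if not others:
        return 0.0
    total = sum(counts[tuple(sorted((word, o)))] for o in others)
    return total / (len(others) * num_captions)


if __name__ == "__main__":
    corpus = ["a man is playing guitar", "a man is singing", "a woman is playing piano"]
    counts = cooccurrence_counts(corpus)
    print(cooccurrence_weight("man", "a man is playing", counts, len(corpus)))  # 5 / 9
    print(entropy_regularized_pg_objective([[0.5, 0.5], [0.25, 0.75]], [0, 1], 1.0, 0.5))
```

Because the sketch uses plain floats, it only evaluates the objective; an actual training loop would express the same terms in an autodiff framework so that gradients flow into the captioning model.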

Original language: English
Pages (from-to): 4494-4507
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Volume: 25
DOI
Publication status: Published - 2023
