
TopicDVC: Dense Video Captioning with Topic Guidance

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Dense video captioning involves detecting and describing multiple events within a video coherently. Events within a video typically share a common topic, and incorporating this topic information into the model can enhance the quality and coherence of the generated captions. However, existing dense video captioning datasets lack explicit topic annotations. To address this, we design a topic generator that uses a diffusion model with strong generative capabilities to generate topic information from a given video. During training, we use the features of ground-truth captions as pseudo-topic labels. The video topics diffuse from the features of ground-truth captions to a random distribution, and the model learns to reverse this noising process conditioned on video features. During inference, the model iteratively denoises Gaussian noise into topic features conditioned on video features. In this paper, we propose TopicDVC, a novel dense video captioning framework. TopicDVC applies the topic information generated by the topic generator to guide the model in generating more coherent captions. Experiments on the ActivityNet Captions dataset demonstrate that leveraging the topics generated by the diffusion model significantly improves the performance of dense video captioning, producing more accurate and coherent captions.
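
The training and inference procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the noise schedule, the training objective, or the sampler, so the linear beta schedule, the noise-prediction loss, and the deterministic DDIM-style update below are all assumptions chosen because they are common defaults. The function names (`make_alpha_bars`, `add_noise`, `training_loss`, `sample_topic`) and the `denoiser(x, t, video_feat)` callable are hypothetical, and features are plain Python lists to keep the example dependency-free.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(topic, t, alpha_bars, rng):
    """Forward process: noise a pseudo-topic label (caption feature) to step t."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in topic]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e
             for x, e in zip(topic, noise)]
    return noisy, noise


def training_loss(denoiser, topic, video_feat, alpha_bars, rng):
    """Noise-prediction MSE at a random step (assumed objective).

    `denoiser` is a hypothetical callable that sees the noisy topic, the
    step index, and the video features it is conditioned on.
    """
    t = rng.randrange(len(alpha_bars))
    noisy, noise = add_noise(topic, t, alpha_bars, rng)
    pred = denoiser(noisy, t, video_feat)
    return sum((p - e) ** 2 for p, e in zip(pred, noise)) / len(noise)


def sample_topic(denoiser, video_feat, dim, alpha_bars, rng):
    """Inference: iteratively denoise Gaussian noise into a topic feature.

    Uses a deterministic DDIM-style update (assumed sampler).
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(alpha_bars))):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = denoiser(x, t, video_feat)
        # Estimate the clean topic feature implied by the predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
              for xi, ei in zip(x, eps)]
        # Move to the previous (less noisy) step.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x
```

In practice the denoiser would be a learned network over video features; here any callable with the same signature can be plugged in, which makes the noising and denoising loops easy to check in isolation.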

Original language: English
Title of host publication: Proceedings - 2024 IEEE 10th International Conference on Edge Computing and Scalable Cloud, EdgeCom 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 82-87
Number of pages: 6
ISBN (electronic): 9798350377132
DOI
Publication status: Published - 2024
Event: 10th IEEE International Conference on Edge Computing and Scalable Cloud, EdgeCom 2024 - Shanghai, China
Duration: 28 Jun 2024 to 30 Jun 2024

Publication series

Name: Proceedings - 2024 IEEE 10th International Conference on Edge Computing and Scalable Cloud, EdgeCom 2024

Conference

Conference: 10th IEEE International Conference on Edge Computing and Scalable Cloud, EdgeCom 2024
Country/Territory: China
City: Shanghai
Period: 28/06/24 to 30/06/24
