
Image Caption Method Combining Multi-angle with Multi-modality

  • Qilu University of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Image caption generation has attracted considerable research interest because of its wide range of practical applications. It involves two vital areas of artificial intelligence: image processing and natural language processing. Existing methods predict the next word from the image features and the words generated in previous states. However, they ignore the important role of text information. In this paper, we propose an image caption generation method that combines multi-angle with multi-modality. The model first takes the fused global and local image features as input, and a baseline encoder-decoder model generates the first descriptive sentence. This first caption is then fed into a sentence encoding network to produce a semantic feature vector for the sentence. Next, the local visual feature vector of the image and the semantic feature vector of the first sentence, which are features from two different modalities, are combined and fed into an attention-based language generation model to generate the next sentence. This allows our model to generate multi-angle descriptions in a targeted manner.
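The second stage described in the abstract, where local visual features and the semantic vector of the first sentence are combined before decoding, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fusion operator or the attention form, so the sketch assumes dot-product attention that uses the sentence vector as the query, followed by concatenation. The function names (`attend`, `fuse`), the toy feature values, and the shared feature dimension are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(local_feats, query):
    """Assumed dot-product attention: weight each region by similarity to the query."""
    weights = softmax([dot(f, query) for f in local_feats])
    dim = len(local_feats[0])
    context = [sum(w * f[i] for w, f in zip(weights, local_feats)) for i in range(dim)]
    return weights, context


def fuse(local_feats, sentence_vec):
    """Assumed fusion: attended visual context concatenated with the sentence vector."""
    weights, context = attend(local_feats, sentence_vec)
    return weights, context + sentence_vec


# Toy data (hypothetical): three image regions and one first-sentence embedding.
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_vec = [2.0, 0.0]

weights, fused = fuse(local_feats, sentence_vec)
print(weights)  # attention weight per region
print(fused)    # multimodal vector that a second-sentence decoder would consume
```

In this toy setup the regions whose features align with the sentence vector receive the larger attention weights, and the fused vector carries both the visual context and the text semantics. A real model would learn projection matrices for both modalities and feed the result into a recurrent or transformer decoder.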

Original language: English
Title of host publication: 2019 IEEE 11th International Conference on Advanced Infocomm Technology, ICAIT 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 24-30
Number of pages: 7
ISBN (Electronic): 9781728147789
DOI
Publication status: Published - Oct 2019
Externally published
Event: 11th IEEE International Conference on Advanced Infocomm Technology, ICAIT 2019 - Jinan, China
Duration: 18 Oct 2019 → 20 Oct 2019

Publication series

Name: 2019 IEEE 11th International Conference on Advanced Infocomm Technology, ICAIT 2019

Conference

Conference: 11th IEEE International Conference on Advanced Infocomm Technology, ICAIT 2019
Country/Territory: China
City: Jinan
Period: 18/10/19 → 20/10/19

