Cross-modal topic correlations for multimedia retrieval

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose a novel approach to cross-modal multimedia retrieval that jointly models the text and image components of multimedia documents. In this model, the image component is represented by local SIFT descriptors under the bag-of-features model, and the text component is represented by a topic distribution learned with latent topic models such as latent Dirichlet allocation (LDA). Latent semantic relations between texts and images are captured by correlations between the text topics and the image-feature topics, and a statistical correlation model conditioned on category information is investigated. Experimental results on a benchmark Wikipedia dataset show that the proposed approach outperforms state-of-the-art cross-modal multimedia retrieval systems.
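The core idea above can be illustrated with a minimal sketch. Assuming the text and image topic distributions have already been computed (e.g., by running LDA separately on the text words and on the image's bag of quantized SIFT "visual words" — the vectors below are synthetic stand-ins), one simple way to relate the two modalities is to estimate a cross-modal topic correlation matrix from paired training documents and rank images for a text query with the resulting bilinear score. This is an illustrative simplification, not the paper's exact category-conditioned model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed topic distributions (stand-ins for LDA output
# on text words and on image bag-of-SIFT visual words).
n_docs, n_text_topics, n_img_topics = 50, 8, 10
T = rng.dirichlet(np.ones(n_text_topics), size=n_docs)  # text topic vectors
V = rng.dirichlet(np.ones(n_img_topics), size=n_docs)   # image topic vectors

# Estimate a cross-modal correlation matrix from the paired training data:
# C[a, b] measures how text topic a co-varies with image topic b.
Tc = T - T.mean(axis=0)
Vc = V - V.mean(axis=0)
C = Tc.T @ Vc / (n_docs - 1)

def retrieve_images(text_query, top_k=5):
    """Rank images for a text topic vector by the bilinear score q^T C v."""
    scores = (text_query - T.mean(axis=0)) @ C @ Vc.T
    return np.argsort(-scores)[:top_k]

ranked = retrieve_images(T[0])
```

In practice the topic vectors would come from trained LDA models for each modality, and the paper additionally conditions the correlation model on document category information.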

Original language: English
Title of host publication: ICPR 2012 - 21st International Conference on Pattern Recognition
Pages: 246-249
Number of pages: 4
State: Published - 2012
Event: 21st International Conference on Pattern Recognition, ICPR 2012 - Tsukuba, Japan
Duration: 11 Nov 2012 - 15 Nov 2012

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651

Conference

Conference: 21st International Conference on Pattern Recognition, ICPR 2012
Country/Territory: Japan
City: Tsukuba
Period: 11/11/12 - 15/11/12
