TY - GEN
T1 - Discriminate Cross-modal Quantization for Efficient Retrieval
AU - Sun, Peng
AU - Yan, Cheng
AU - Wang, Shuai
AU - Bai, Xiao
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - Efficient cross-modal retrieval involves searching for similar items across different modalities, e.g., using an image (text) to search for texts (images). To speed up cross-modal retrieval, hashing-based methods threshold continuous embeddings into binary codes, inducing a substantial loss of retrieval accuracy. To further improve retrieval performance, several quantization-based methods quantize embeddings into real-valued codewords to maximally preserve inter-modal and intra-modal similarity relations, while the discrimination between dissimilar data is ignored. To address these challenges, we propose, for the first time, a novel discriminate cross-modal quantization (DCMQ) which nonlinearly maps different modalities into a common space where irrelevant data points are semantically separable: the points belonging to a class lie in a cluster that does not overlap with the clusters corresponding to other classes. An effective optimization algorithm is developed for the proposed method to jointly learn the modality-specific mapping functions, the shared codebooks, the unified binary codes and a linear classifier. Experimental comparison with state-of-the-art algorithms over three benchmark datasets demonstrates that DCMQ achieves significant improvement in search accuracy.
AB - Efficient cross-modal retrieval involves searching for similar items across different modalities, e.g., using an image (text) to search for texts (images). To speed up cross-modal retrieval, hashing-based methods threshold continuous embeddings into binary codes, inducing a substantial loss of retrieval accuracy. To further improve retrieval performance, several quantization-based methods quantize embeddings into real-valued codewords to maximally preserve inter-modal and intra-modal similarity relations, while the discrimination between dissimilar data is ignored. To address these challenges, we propose, for the first time, a novel discriminate cross-modal quantization (DCMQ) which nonlinearly maps different modalities into a common space where irrelevant data points are semantically separable: the points belonging to a class lie in a cluster that does not overlap with the clusters corresponding to other classes. An effective optimization algorithm is developed for the proposed method to jointly learn the modality-specific mapping functions, the shared codebooks, the unified binary codes and a linear classifier. Experimental comparison with state-of-the-art algorithms over three benchmark datasets demonstrates that DCMQ achieves significant improvement in search accuracy.
UR - https://www.scopus.com/pages/publications/85059781560
U2 - 10.1109/ICPR.2018.8545540
DO - 10.1109/ICPR.2018.8545540
M3 - Conference contribution
AN - SCOPUS:85059781560
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3328
EP - 3334
BT - 2018 24th International Conference on Pattern Recognition, ICPR 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th International Conference on Pattern Recognition, ICPR 2018
Y2 - 20 August 2018 through 24 August 2018
ER -