TY - GEN
T1 - Efficient Fine-Grained Visual-Text Search Using Adversarially-Learned Hash Codes
AU - Li, Yongzhi
AU - Mu, Yadong
AU - Zhuang, Nan
AU - Liu, Xianglong
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Cross-modal hashing for efficient visual-text search has attracted much research interest in recent years. The main argument of this work is that existing hashing methods mainly exploit a multi-label matching paradigm, ignoring various fine-grained semantics (high-order relationships, object attributes, etc.) in the multi-modal data. This paper explores cross-modal hashing from two rarely explored aspects. First, we propose an efficient two-step hashing scheme that quickly screens out irrelevant samples using global features and then generates fine-grained features, guided by high-order concepts, to rerank the surviving candidates. Second, the robustness of the cross-modal hashing model, particularly under subtle tampering of fine-grained queries, is formally investigated. We propose a rephrase and adversarial training strategy for obtaining better performance and robustness. Comprehensive experiments and ablation studies on two large public datasets (MS-COCO and Flickr30K) demonstrate the proposed method's superiority in terms of both efficiency and accuracy.
AB - Cross-modal hashing for efficient visual-text search has attracted much research interest in recent years. The main argument of this work is that existing hashing methods mainly exploit a multi-label matching paradigm, ignoring various fine-grained semantics (high-order relationships, object attributes, etc.) in the multi-modal data. This paper explores cross-modal hashing from two rarely explored aspects. First, we propose an efficient two-step hashing scheme that quickly screens out irrelevant samples using global features and then generates fine-grained features, guided by high-order concepts, to rerank the surviving candidates. Second, the robustness of the cross-modal hashing model, particularly under subtle tampering of fine-grained queries, is formally investigated. We propose a rephrase and adversarial training strategy for obtaining better performance and robustness. Comprehensive experiments and ablation studies on two large public datasets (MS-COCO and Flickr30K) demonstrate the proposed method's superiority in terms of both efficiency and accuracy.
KW - Adversarial Learning
KW - Cross-modal Retrieval
KW - Fine-grained Search
KW - Hashing
UR - https://www.scopus.com/pages/publications/85126474744
U2 - 10.1109/ICME51207.2021.9428271
DO - 10.1109/ICME51207.2021.9428271
M3 - Conference contribution
AN - SCOPUS:85126474744
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
PB - IEEE Computer Society
T2 - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Y2 - 5 July 2021 through 9 July 2021
ER -