TY - GEN
T1 - Online cross-modal scene retrieval by binary representation and semantic graph
AU - Qi, Mengshi
AU - Wang, Yunhong
AU - Li, Annan
N1 - Publisher Copyright: © 2017 ACM.
PY - 2017/10/23
Y1 - 2017/10/23
N2 - In recent years, cross-modal scene retrieval has attracted increasing attention. However, most existing approaches neglect the semantic relationships between objects in a scene and their embedded spatial layouts. Moreover, these methods mostly apply a batch learning strategy, which is not suitable for processing streaming data. To address these problems, we propose a new framework for online cross-modal scene retrieval based on binary representations and a semantic graph. Specifically, we adopt cross-modal hashing based on the quantization loss of different modalities. By introducing the semantic graph, we are able to extract rich semantics and measure their correlation across different modalities. Furthermore, we propose a two-step optimization procedure based on stochastic gradient descent for online updates. Experimental results on four datasets show the superiority of our approach over the state of the art.
AB - In recent years, cross-modal scene retrieval has attracted increasing attention. However, most existing approaches neglect the semantic relationships between objects in a scene and their embedded spatial layouts. Moreover, these methods mostly apply a batch learning strategy, which is not suitable for processing streaming data. To address these problems, we propose a new framework for online cross-modal scene retrieval based on binary representations and a semantic graph. Specifically, we adopt cross-modal hashing based on the quantization loss of different modalities. By introducing the semantic graph, we are able to extract rich semantics and measure their correlation across different modalities. Furthermore, we propose a two-step optimization procedure based on stochastic gradient descent for online updates. Experimental results on four datasets show the superiority of our approach over the state of the art.
KW - Binary representations
KW - Cross-modal hashing
KW - Online learning
KW - Scene retrieval
KW - Semantic-graph
UR - https://www.scopus.com/pages/publications/85035213639
U2 - 10.1145/3123266.3123311
DO - 10.1145/3123266.3123311
M3 - Conference contribution
AN - SCOPUS:85035213639
T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
SP - 744
EP - 752
BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 25th ACM International Conference on Multimedia, MM 2017
Y2 - 23 October 2017 through 27 October 2017
ER -