TY - GEN
T1 - Semantic Modeling of Textual Relationships in Cross-modal Retrieval
AU - Yu, Jing
AU - Yang, Chenghao
AU - Qin, Zengchang
AU - Yang, Zhuoqian
AU - Hu, Yue
AU - Shi, Zhiguo
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - Feature modeling of different modalities is a basic problem in current research on cross-modal information retrieval. Existing models typically project texts and images into one embedding space, in which semantically similar information lies closer together. Semantic modeling of textual relationships is notoriously difficult. In this paper, we propose an approach to model texts using a featured graph by integrating multi-view textual relationships, including semantic relationships, statistical co-occurrence, and prior relationships in a knowledge base. A dual-path neural network is adopted to learn multi-modal representations and the cross-modal similarity measure jointly. We use a Graph Convolutional Network (GCN) to generate relation-aware text representations, and a Convolutional Neural Network (CNN) with non-linearities for image representations. The cross-modal similarity measure is learned by distance metric learning. Experimental results show that, by leveraging the rich relational semantics in texts, our model outperforms the state-of-the-art models by 3.4% and 6.3% in accuracy on two benchmark datasets.
AB - Feature modeling of different modalities is a basic problem in current research on cross-modal information retrieval. Existing models typically project texts and images into one embedding space, in which semantically similar information lies closer together. Semantic modeling of textual relationships is notoriously difficult. In this paper, we propose an approach to model texts using a featured graph by integrating multi-view textual relationships, including semantic relationships, statistical co-occurrence, and prior relationships in a knowledge base. A dual-path neural network is adopted to learn multi-modal representations and the cross-modal similarity measure jointly. We use a Graph Convolutional Network (GCN) to generate relation-aware text representations, and a Convolutional Neural Network (CNN) with non-linearities for image representations. The cross-modal similarity measure is learned by distance metric learning. Experimental results show that, by leveraging the rich relational semantics in texts, our model outperforms the state-of-the-art models by 3.4% and 6.3% in accuracy on two benchmark datasets.
KW - Cross-modal retrieval
KW - Graph Convolutional Network
KW - Knowledge graph
KW - Relationship integration
KW - Textual relationships
UR - https://www.scopus.com/pages/publications/85081574459
U2 - 10.1007/978-3-030-29551-6_3
DO - 10.1007/978-3-030-29551-6_3
M3 - Conference contribution
AN - SCOPUS:85081574459
SN - 9783030295509
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 24
EP - 32
BT - Knowledge Science, Engineering and Management - 12th International Conference, KSEM 2019, Proceedings
A2 - Douligeris, Christos
A2 - Apostolou, Dimitris
A2 - Karagiannis, Dimitris
PB - Springer
T2 - 12th International Conference on Knowledge Science, Engineering and Management, KSEM 2019
Y2 - 28 August 2019 through 30 August 2019
ER -