TY - GEN
T1 - SSEmb
T2 - 48th European Conference on Information Retrieval, ECIR 2026
AU - Li, Ruyin
AU - Chen, Xiaoyu
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - Formula retrieval is a core topic in Mathematical Information Retrieval. We propose SSEmb, a novel embedding framework capable of capturing both structural and semantic features of formulas. Structurally, we employ Graph Contrastive Learning to encode formulas represented as Shared-substructure Operator Graphs. To enhance structural diversity while preserving mathematical validity of these formula graphs, we introduce a novel graph data augmentation approach that leverages a substitution strategy. Semantically, we utilize Sentence-BERT to encode the surrounding text of formulas. Finally, for each query and its candidates, structural and semantic similarities are calculated separately and then fused through a weighted scheme. In the ARQMath-3 Formula Retrieval Task, SSEmb outperforms existing embedding-based methods by over 5 percentage points on P′@10 and nDCG′@10. Furthermore, SSEmb enhances the performance of all runs of other methods and achieves state-of-the-art results when combined with Approach0.
KW - Formula Retrieval
KW - Graph Contrastive Learning
KW - Graph Data Augmentation
KW - Mathematical Information Retrieval
UR - https://www.scopus.com/pages/publications/105035366539
U2 - 10.1007/978-3-032-21300-6_18
DO - 10.1007/978-3-032-21300-6_18
M3 - Conference contribution
AN - SCOPUS:105035366539
SN - 9783032212993
T3 - Lecture Notes in Computer Science
SP - 282
EP - 291
BT - Advances in Information Retrieval - 48th European Conference on Information Retrieval, ECIR 2026, Proceedings
A2 - Campos, Ricardo
A2 - Jatowt, Adam
A2 - Lan, Yanyan
A2 - Aliannejadi, Mohammad
A2 - Bauer, Christine
A2 - MacAvaney, Sean
A2 - Anand, Avishek
A2 - Bai, Nan
A2 - Mansoury, Masoud
A2 - Ren, Zhaochun
A2 - Verberne, Suzan
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 29 March 2026 through 2 April 2026
ER -