SSEmb: A Joint Structural and Semantic Embedding Framework for Mathematical Formula Retrieval

Ruyin Li, Xiaoyu Chen*
*Corresponding author for this work

  • Beihang University
  • Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Formula retrieval is a core topic in Mathematical Information Retrieval. We propose SSEmb, a novel embedding framework capable of capturing both structural and semantic features of formulas. Structurally, we employ Graph Contrastive Learning to encode formulas represented as Shared-substructure Operator Graphs. To enhance structural diversity while preserving mathematical validity of these formula graphs, we introduce a novel graph data augmentation approach that leverages a substitution strategy. Semantically, we utilize Sentence-BERT to encode the surrounding text of formulas. Finally, for each query and its candidates, structural and semantic similarities are calculated separately and then fused through a weighted scheme. In the ARQMath-3 Formula Retrieval Task, SSEmb outperforms existing embedding-based methods by over 5 percentage points on P@10 and nDCG@10. Furthermore, SSEmb enhances the performance of all runs of other methods and achieves state-of-the-art results when combined with Approach0.
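The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract states only that structural and semantic similarities are computed separately and combined "through a weighted scheme"; the use of cosine similarity, the convex combination controlled by `alpha`, and the function names `cosine`, `fuse_scores`, and `rank_candidates` are all assumptions made for illustration, not details taken from the paper. The embeddings are plain lists of floats standing in for the outputs of the graph encoder and of Sentence-BERT.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse_scores(struct_sim, sem_sim, alpha=0.5):
    """Assumed fusion rule: convex combination of the two similarities.

    alpha weights the structural similarity, (1 - alpha) the semantic one.
    """
    return alpha * struct_sim + (1 - alpha) * sem_sim


def rank_candidates(query, candidates, alpha=0.5):
    """Rank candidates by fused similarity to the query, best first.

    query:      (structural_vector, semantic_vector)
    candidates: dict mapping candidate id -> (structural_vector, semantic_vector)
    """
    q_struct, q_sem = query
    scored = []
    for cand_id, (c_struct, c_sem) in candidates.items():
        struct_sim = cosine(q_struct, c_struct)  # formula-graph embedding side
        sem_sim = cosine(q_sem, c_sem)           # surrounding-text embedding side
        scored.append((cand_id, fuse_scores(struct_sim, sem_sim, alpha)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a large `alpha` the ranking follows structural similarity, and with a small `alpha` it follows the surrounding text; in practice such a weight would be tuned on held-out queries.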

Original language: English
Title of host publication: Advances in Information Retrieval - 48th European Conference on Information Retrieval, ECIR 2026, Proceedings
Editors: Ricardo Campos, Adam Jatowt, Yanyan Lan, Mohammad Aliannejadi, Christine Bauer, Sean MacAvaney, Avishek Anand, Nan Bai, Masoud Mansoury, Zhaochun Ren, Suzan Verberne
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 282-291
Number of pages: 10
ISBN (Print): 9783032212993
State: Published - 2026
Event: 48th European Conference on Information Retrieval, ECIR 2026 - Delft, Netherlands
Duration: 29 Mar 2026 to 2 Apr 2026

Publication series

Name: Lecture Notes in Computer Science
Volume: 16484 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 48th European Conference on Information Retrieval, ECIR 2026
Country/Territory: Netherlands
City: Delft
Period: 29/03/26 to 2/04/26

Keywords

  • Formula Retrieval
  • Graph Contrastive Learning
  • Graph Data Augmentation
  • Mathematical Information Retrieval
