TY - GEN
T1 - A Retrieval-Augmented Framework for Tabular Interpretation with Large Language Model
AU - Yan, Mengyi
AU - Ren, Weilong
AU - Wang, Yaoshu
AU - Li, Jianxin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Relational tables on the web hold a vast amount of knowledge, and it is critical for machine learning models to capture the semantics of these tables such that the models can achieve good performance on table interpretation tasks, such as entity linking, column type annotation and relation extraction. However, it is very challenging for ML models to process a large number of tables and/or retrieve inter-table context information from the tables. Instead, existing works usually rely on heavily engineered features, user-defined rules or pre-training corpora. In this work, we propose a unified Retrieval-Augmented Framework for tabular interpretation with Large language model (RAFL), a novel two-step framework for addressing the table interpretation task. RAFL first adopts a graph-enhanced model to obtain the inter-table context information by retrieving schema-similar and topic-relevant tables from a large corpus; RAFL then conducts tabular interpretation learning by combining a lightweight pre-ranking model with a re-ranking-based large language model. We verify the effectiveness of RAFL through extensive evaluations on three tabular interpretation tasks (entity linking, column type annotation and relation extraction), where RAFL substantially outperforms existing methods on all tasks.
UR - https://www.scopus.com/pages/publications/85218214893
U2 - 10.1007/978-981-97-5779-4_23
DO - 10.1007/978-981-97-5779-4_23
M3 - Conference contribution
AN - SCOPUS:85218214893
SN - 9789819757787
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 341
EP - 356
BT - Database Systems for Advanced Applications - 29th International Conference, DASFAA 2024, Proceedings
A2 - Onizuka, Makoto
A2 - Lee, Jae-Gil
A2 - Tong, Yongxin
A2 - Xiao, Chuan
A2 - Ishikawa, Yoshiharu
A2 - Lu, Kejing
A2 - Amer-Yahia, Sihem
A2 - Jagadish, H.V.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on Database Systems for Advanced Applications, DASFAA 2024
Y2 - 2 July 2024 through 5 July 2024
ER -