
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

  • Xianjie Wu
  • Jian Yang*
  • Linzheng Chai
  • Ge Zhang
  • Jiaheng Liu
  • Xeron Du
  • Di Liang
  • Daixin Shu
  • Xianfu Cheng
  • Tianzhen Sun
  • Tongliang Li
  • Zhoujun Li*
  • Guanglin Niu
  • *Corresponding author for this work
  • Beihang University
  • M-A-P
  • Fudan University
  • Beijing Information Science & Technology University

Research output: Contribution to journal, Conference article, peer-reviewed

Abstract

Recent advancements in large language models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges when applied in industrial scenarios, particularly because real-world tabular data demands more complex reasoning, underscoring a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark, TableBench, covering 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TABLELLM, trained on our meticulously constructed training set TableInstruct, which achieves performance comparable to GPT-3.5. Extensive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands; even the most advanced model, GPT-4, achieves only a modest score compared to humans.

Original language: English
Pages (from-to): 25497-25506
Number of pages: 10
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 24
DOI
Publication status: Published - 11 Apr 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 to 4 Mar 2025
