Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation

  • Zijie Zhong
  • Hanwen Liu
  • Xiaoya Cui
  • Xiaofan Zhang*
  • Zengchang Qin*
  • *Corresponding author for this work
  • Shanghai Artificial Intelligence Laboratory
  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Integrating information from various reference databases is a major challenge for Retrieval-Augmented Generation (RAG) systems because each knowledge source adopts a unique data structure and follows different conventions. Retrieving from multiple knowledge sources with one fixed strategy usually leads to under-exploitation of information. To mitigate this drawback, inspired by Mix-of-Expert, we introduce Mix-of-Granularity (MoG), a method that dynamically determines the optimal granularity of a knowledge source based on input queries using a router. The router is efficiently trained with a newly proposed loss function employing soft labels. We further extend MoG to MoG-Graph (MoGG), where reference documents are pre-processed as graphs, enabling the retrieval of distantly situated snippets. Experiments demonstrate that MoG and MoGG effectively predict optimal granularity levels, significantly enhancing the performance of the RAG system in downstream tasks. The code of both MoG and MoGG will be made public.
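As a rough illustration of the routing idea the abstract describes, the sketch below scores a fixed set of granularity levels for a query and measures the fit against a soft (non one-hot) target. It is not the authors' implementation: the linear router, the feature vector, the level names, and the plain cross-entropy loss are all assumptions made for this example, and the paper's own loss function may differ.

```python
import math

# Hypothetical granularity levels; the paper's actual levels may differ.
GRANULARITIES = ["sentence", "paragraph", "section"]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(query_features, weights):
    """Assumed linear router: one logit per granularity level."""
    logits = [
        sum(w * x for w, x in zip(row, query_features))
        for row in weights
    ]
    return softmax(logits)


def soft_label_loss(predicted, soft_labels):
    """Cross-entropy against a soft target distribution (an assumed loss)."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, soft_labels))


# Toy router weights: one row per granularity level, one column per feature.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = route([2.0, 0.0], weights)
best = GRANULARITIES[probs.index(max(probs))]

# A soft label spreads credit over several levels instead of naming one winner.
loss = soft_label_loss(probs, [0.7, 0.1, 0.2])
```

With these toy weights the logits are 2.0, 0.0, and 1.0, so `best` is `"sentence"`. In a trained router the weights would be learned by minimizing the loss over many queries, and the predicted distribution would decide at which granularity snippets are retrieved.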

Original language: English
Title of host publication: Main Conference
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher: Association for Computational Linguistics (ACL)
Pages: 5756-5774
Number of pages: 19
ISBN (electronic): 9798891761964
Publication status: Published - 2025
Event: 31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: 19 Jan 2025 to 24 Jan 2025

Publication series

Name: Proceedings - International Conference on Computational Linguistics, COLING
ISSN (print): 2951-2093

Conference

Conference: 31st International Conference on Computational Linguistics, COLING 2025
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 19/01/25 to 24/01/25
