QSE: Mitigating LLM Hallucinations Through Query-Adaptive Saliency-Localized Activation Editing

  • Kewei Liao
  • Tianbo Wang
  • Fengxiang Yu*
  • *Corresponding author for this work
  • Beihang University
  • Beijing Institute of Aerospace Systems Engineering

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Peer-reviewed

Abstract

Large Language Models (LLMs) exhibit significant capabilities and are extensively applied in critical domains such as advanced question-answering systems and healthcare information analysis. Nevertheless, LLMs are prone to hallucination, that is, generating outputs that are factually incorrect or unsubstantiated, which poses significant reliability risks in critical applications. Inference-time activation editing has emerged as a promising strategy to mitigate hallucination without model retraining. However, existing methods often employ generalized criteria for selecting attention heads and apply editing strengths that lack query-specific adaptability, which leads to suboptimal hallucination correction. To address these limitations, we introduce Query-adaptive Saliency-localized Activation Editing (QSE), which comprises Gradient-guided Head Saliency Localization (GSL) and Query-specific Editing Necessity Estimation (QNE), to enhance the precision and contextual adaptability of LLM activation editing. Specifically, GSL first employs a gradient-based optimization process to quantify the differential saliency of attention heads with respect to factual generation, thereby pinpointing the critical attention heads for precise activation editing. Subsequently, QNE captures the knowledge semantics of the input query, and its lightweight estimator dynamically adjusts the editing strength for each head previously identified by GSL, enabling adaptive, context-aware adjustments. Empirical evaluations on the LLaMA-3-8B-Instruct model using the TruthfulQA benchmark demonstrate that QSE achieves substantial improvements in model truthfulness, notably surpassing the baseline by 21.3% on the True*Info score.
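
The two-stage idea in the abstract (pick salient heads, then scale the edit per query) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the top-k selection rule, the logistic form of the estimator, the additive shift along a per-head direction, and every name below (`top_k_heads`, `editing_strength`, `edit_activations`, the toy numbers) are assumptions made purely for illustration.

```python
import math

def top_k_heads(saliency, k):
    """Return indices of the k heads with the largest saliency scores.
    Assumption: saliency scores were already produced by some gradient-based process."""
    return sorted(range(len(saliency)), key=lambda h: saliency[h], reverse=True)[:k]

def editing_strength(query_features, weights, bias, max_strength):
    """Hypothetical lightweight estimator: a logistic unit scaled to (0, max_strength)."""
    z = sum(w * x for w, x in zip(weights, query_features)) + bias
    return max_strength / (1.0 + math.exp(-z))

def edit_activations(activations, directions, selected, strengths):
    """Shift each selected head's activation along its direction; other heads are unchanged."""
    edited = [list(a) for a in activations]
    for h in selected:
        edited[h] = [a + strengths[h] * d for a, d in zip(activations[h], directions[h])]
    return edited

# Toy data: 4 heads with 2-dimensional activations (all values are made up).
saliency = [0.1, 0.9, 0.3, 0.7]
activations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

selected = top_k_heads(saliency, k=2)

# One hypothetical estimator per selected head, all fed the same query features.
query_features = [0.0, 0.0]
head_weights = {h: [1.0, -1.0] for h in selected}
strengths = {
    h: editing_strength(query_features, head_weights[h], bias=0.0, max_strength=2.0)
    for h in selected
}

edited = edit_activations(activations, directions, selected, strengths)
```

In this sketch only the heads chosen by saliency are modified, and the size of the shift depends on the query features, which mirrors the division of labor the abstract attributes to GSL and QNE. In a real model the edit would be applied to attention-head outputs during the forward pass rather than to plain Python lists.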

Original language: English
Title of host publication: Generalizing from Limited Resources in the Open World - 3rd International Workshop, GLOW 2025, Held in Conjunction with IJCAI 2025, Proceedings
Editors: Yuqing Ma, Jinyang Guo, Xianglong Liu, Xiaowei Zhao, Ruihao Gong, Ning Liu, Xuefei Ning
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 56-73
Number of pages: 18
ISBN (Print): 9789819509874
DOI
Publication status: Published - 2025
Event: 3rd International Workshop on Generalizing from Limited Resources in the Open World, GLOW 2025, Held in Conjunction with the International Joint Conference on Artificial Intelligence, IJCAI 2025 - Montreal, Canada
Duration: 16 Aug 2025 to 22 Aug 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2640 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 3rd International Workshop on Generalizing from Limited Resources in the Open World, GLOW 2025, Held in Conjunction with the International Joint Conference on Artificial Intelligence, IJCAI 2025
Country/Territory: Canada
City: Montreal
Period: 16/08/25 to 22/08/25
