TY - GEN
T1 - QSE
T2 - 3rd International Workshop on Generalizing from Limited Resources in the Open World, GLOW 2025, Held in Conjunction with the International Joint Conference on Artificial Intelligence, IJCAI 2025
AU - Liao, Kewei
AU - Wang, Tianbo
AU - Yu, Fengxiang
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Large Language Models (LLMs) are highly capable and widely applied in critical domains such as advanced question-answering systems and healthcare information analysis. Nevertheless, LLMs are prone to hallucination, that is, generating outputs that are factually incorrect or unsubstantiated, which poses significant reliability risks in these applications. Inference-time activation editing has emerged as a promising strategy to mitigate hallucination without retraining the model. However, existing methods often select attention heads using generalized criteria and apply editing strengths that do not adapt to the query, leading to suboptimal hallucination correction. To address these limitations, we introduce Query-adaptive Saliency-localized Activation Editing (QSE), which comprises Gradient-guided Head Saliency Localization (GSL) and Query-specific Editing Necessity Estimation (QNE), to enhance the precision and contextual adaptability of LLM activation editing. Specifically, GSL first employs a gradient-based optimization process to quantify the differential saliency of attention heads with respect to factual generation, thereby pinpointing the critical heads to edit. Subsequently, QNE captures the knowledge semantics of the input query, and its lightweight estimator dynamically adjusts the editing strength for each head identified by GSL, enabling adaptive, context-aware adjustments. Empirical evaluations on the LLaMA-3-8B-Instruct model using the TruthfulQA benchmark demonstrate that QSE achieves substantial improvements in model truthfulness, notably surpassing the baseline by 21.3% on the True*Info score.
KW - Adaptive Activation Editing
KW - Hallucination Mitigation
KW - Large Language Models
UR - https://www.scopus.com/pages/publications/105014477687
U2 - 10.1007/978-981-95-0988-1_5
DO - 10.1007/978-981-95-0988-1_5
M3 - Conference contribution
AN - SCOPUS:105014477687
SN - 9789819509874
T3 - Communications in Computer and Information Science
SP - 56
EP - 73
BT - Generalizing from Limited Resources in the Open World - 3rd International Workshop, GLOW 2025, Held in Conjunction with IJCAI 2025, Proceedings
A2 - Ma, Yuqing
A2 - Guo, Jinyang
A2 - Liu, Xianglong
A2 - Zhao, Xiaowei
A2 - Gong, Ruihao
A2 - Liu, Ning
A2 - Ning, Xuefei
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 16 August 2025 through 22 August 2025
ER -