TY - GEN
T1 - Query-Routed Activation Editing with Truth-hierarchical Preference Optimization
AU - Liao, Kewei
AU - Wang, Tianbo
AU - Ma, Yuqing
AU - Zhang, Zhange
AU - Geng, Zhicheng
AU - Zhao, Xiaowei
AU - Wang, Jiakai
AU - Liu, Xianglong
N1 - Publisher Copyright:
© 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2026
Y1 - 2026
N2 - Hallucination has emerged as a pivotal challenge of Large Language Models (LLMs) that generate plausible yet non-factual content, significantly impeding the trustworthy AI applications in real-world scenarios like medical diagnosis and autonomous driving. Editing the internal activations of LLMs during inference has shown promising effectiveness in mitigating hallucinations with minimal cost. However, previous editing approaches neglect the query-specific inference pathways that require tailored truthful steering vectors, resulting in suboptimal hallucination mitigation. To address these issues, we propose the Query-Routed Activation Editing (QRAE) framework, which comprises Divergence-sensitive Head Routing (DHR) and Truth-hierarchical Preference Steering (TPS), to fully leverage query-specific semantics for adaptive activation editing. Specifically, DHR is proposed to establish a query-aware head selection criterion, thereby dynamically routing to truth-critical attention heads. Subsequently, TPS introduces a query-specific steering vector calibration policy with the guidance of progressive truth-preferred optimization, enabling precise and adaptive editing for each distinct query. Extensive experiments on the widely recognized TruthfulQA benchmark demonstrate that QRAE outperforms SOTA methods by up to 13.2% in MC1. Meanwhile, QRAE demonstrates strong generalization to out-of-distribution TriviaQA and Natural Questions benchmarks.
AB - Hallucination has emerged as a pivotal challenge of Large Language Models (LLMs) that generate plausible yet non-factual content, significantly impeding the trustworthy AI applications in real-world scenarios like medical diagnosis and autonomous driving. Editing the internal activations of LLMs during inference has shown promising effectiveness in mitigating hallucinations with minimal cost. However, previous editing approaches neglect the query-specific inference pathways that require tailored truthful steering vectors, resulting in suboptimal hallucination mitigation. To address these issues, we propose the Query-Routed Activation Editing (QRAE) framework, which comprises Divergence-sensitive Head Routing (DHR) and Truth-hierarchical Preference Steering (TPS), to fully leverage query-specific semantics for adaptive activation editing. Specifically, DHR is proposed to establish a query-aware head selection criterion, thereby dynamically routing to truth-critical attention heads. Subsequently, TPS introduces a query-specific steering vector calibration policy with the guidance of progressive truth-preferred optimization, enabling precise and adaptive editing for each distinct query. Extensive experiments on the widely recognized TruthfulQA benchmark demonstrate that QRAE outperforms SOTA methods by up to 13.2% in MC1. Meanwhile, QRAE demonstrates strong generalization to out-of-distribution TriviaQA and Natural Questions benchmarks.
UR - https://www.scopus.com/pages/publications/105034605662
U2 - 10.1609/aaai.v40i38.40468
DO - 10.1609/aaai.v40i38.40468
M3 - 会议稿件
AN - SCOPUS:105034605662
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 31979
EP - 31987
BT - Proceedings of the AAAI Conference on Artificial Intelligence
A2 - Koenig, Sven
A2 - Jenkins, Chad
A2 - Taylor, Matthew E.
PB - Association for the Advancement of Artificial Intelligence
T2 - 40th AAAI Conference on Artificial Intelligence, AAAI 2026
Y2 - 20 January 2026 through 27 January 2026
ER -