Skip to main navigation Skip to search Skip to main content

Query-Routed Activation Editing with Truth-hierarchical Preference Optimization

  • Beihang University
  • State Key Laboratory of Complex & Critical Software Environment
  • Zhongguancun Laboratory

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Hallucination has emerged as a pivotal challenge of Large Language Models (LLMs) that generate plausible yet non-factual content, significantly impeding the trustworthy AI applications in real-world scenarios like medical diagnosis and autonomous driving. Editing the internal activations of LLMs during inference has shown promising effectiveness in mitigating hallucinations with minimal cost. However, previous editing approaches neglect the query-specific inference pathways that require tailored truthful steering vectors, resulting in suboptimal hallucination mitigation. To address these issues, we propose the Query-Routed Activation Editing (QRAE) framework, which comprises Divergence-sensitive Head Routing (DHR) and Truth-hierarchical Preference Steering (TPS), to fully leverage query-specific semantics for adaptive activation editing. Specifically, DHR is proposed to establish a query-aware head selection criterion, thereby dynamically routing to truth-critical attention heads. Subsequently, TPS introduces a query-specific steering vector calibration policy with the guidance of progressive truth-preferred optimization, enabling precise and adaptive editing for each distinct query. Extensive experiments on the widely recognized TruthfulQA benchmark demonstrate that QRAE outperforms SOTA methods by up to 13.2% in MC1. Meanwhile, QRAE demonstrates strong generalization to out-of-distribution TriviaQA and Natural Questions benchmarks.

Original languageEnglish
Title of host publicationProceedings of the AAAI Conference on Artificial Intelligence
EditorsSven Koenig, Chad Jenkins, Matthew E. Taylor
PublisherAssociation for the Advancement of Artificial Intelligence
Pages31979-31987
Number of pages9
Edition38
ISBN (Print)9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067, 9781577359067
DOIs
StatePublished - 2026
Event40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 202627 Jan 2026

Publication series

NameProceedings of the AAAI Conference on Artificial Intelligence
Number38
Volume40
ISSN (Print)2159-5399
ISSN (Electronic)2374-3468

Conference

Conference40th AAAI Conference on Artificial Intelligence, AAAI 2026
Country/TerritorySingapore
CitySingapore
Period20/01/2627/01/26

Fingerprint

Dive into the research topics of 'Query-Routed Activation Editing with Truth-hierarchical Preference Optimization'. Together they form a unique fingerprint.

Cite this