
Factorizing value function with hierarchical residual Q-network in multi-agent reinforcement learning

  • Fang Gao
  • Yunxiang Cai
  • Haotian Yao
  • Shaodong Li
  • Qing Gao
  • Linfei Yin*
  • *Corresponding author for this work
  • Guangxi University

Research output: Contribution to journal › Article › peer-review

Abstract

Value function decomposition has achieved notable success in Multi-Agent Reinforcement Learning (MARL) under the centralized training with decentralized execution paradigm. Traditional value function decomposition methods typically employ monotonic mixing networks to decompose the optimal joint action-value function, ensuring consistency between joint and local action selections. However, these networks often have limited representational capacity and sample efficiency, making it difficult to fit the reward function accurately and to converge stably, which leads to suboptimal results. To address these challenges, we propose a novel MARL framework called the Hierarchical Residual Q-network (HRQ). HRQ adheres to the Individual-Global-Max principle while imposing more relaxed constraints. It features an Outer Residual Network (ORN) that adjusts the joint action-value function to enhance the representational capacity of the mixing network. Additionally, HRQ incorporates an Inner Residual Entropy Auxiliary Network (IREAN) that refines individual action-value functions, addressing the credit-assignment and value-overestimation problems arising from task diversity and agent independence in MARL. Our approach improves exploration efficiency, sample efficiency, and convergence stability. Extensive experiments on multi-agent cooperative benchmarks, including predator-prey and StarCraft, demonstrate that HRQ outperforms existing methods in convergence speed, stability, and adaptability. Compared with the best baseline, HRQ achieves an overall performance improvement of 10%–20%.
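The monotonic mixing constraint the abstract refers to can be illustrated with a minimal NumPy sketch of a QMIX-style mixer: state-conditioned hypernetworks produce the mixing weights, and taking their absolute value guarantees ∂Q_tot/∂Q_i ≥ 0, so the joint greedy action agrees with each agent's local greedy action (the Individual-Global-Max property). This is an illustrative sketch of the baseline constraint HRQ relaxes, not the HRQ architecture itself; `make_monotonic_mixer` and its random weights are hypothetical stand-ins for trained hypernetworks.

```python
import numpy as np

def make_monotonic_mixer(n_agents, state, hidden=8, seed=0):
    """Build a QMIX-style monotonic mixer (illustrative only).

    Random weights stand in for trained hypernetworks that would map
    the global state to mixing-layer parameters; abs() enforces
    non-negative weights, which makes Q_tot monotone in each agent's Q.
    """
    rng = np.random.default_rng(seed)
    scale = np.abs(state).mean() + 1e-6  # crude state conditioning
    w1 = np.abs(rng.normal(size=(n_agents, hidden))) * scale
    b1 = rng.normal(size=hidden)
    w2 = np.abs(rng.normal(size=(hidden, 1))) * scale
    b2 = rng.normal(size=1)

    def mix(agent_qs):
        # ReLU is monotone non-decreasing, and all weights are >= 0,
        # so dQ_tot/dQ_i >= 0 for every agent i.
        h = np.maximum(agent_qs @ w1 + b1, 0.0)
        return float((h @ w2 + b2)[0])

    return mix
```

Because of the non-negative weights, raising any single agent's Q-value can never lower the mixed joint value, which is exactly the representational restriction the abstract argues limits monotonic mixing networks.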

Original language: English
Article number: 131340
Journal: Neurocomputing
Volume: 655
State: Published - 28 Nov 2025

Keywords

  • Centralized training with decentralized execution
  • Deep learning
  • Multi-agent reinforcement learning
  • Value function factorization

