Abstract
Value function decomposition has achieved notable success in Multi-Agent Reinforcement Learning (MARL) under the centralized training with decentralized execution paradigm. Traditional value function decomposition methods typically employ monotonic mixing networks to decompose the optimal joint action-value function, ensuring consistency between joint and local action selections. However, these networks often suffer from limited representational capacity and sample efficiency, making it difficult to accurately fit the reward function and achieve stable convergence, which leads to suboptimal results. To address these challenges, we propose a novel MARL framework called Hierarchical Residual Q-network (HRQ). The HRQ framework adheres to the Individual-Global-Max principle while applying more relaxed constraints. It features an Outer Residual Network (ORN) that adjusts the joint action-value function to enhance the representational capacity of the mixing network. Additionally, HRQ incorporates an Inner Residual Entropy Auxiliary Network (IREAN) to refine individual action-value functions, addressing the credit assignment and value overestimation problems that arise from task diversity and agent independence in MARL. Our approach improves exploration efficiency, sample efficiency, and convergence stability. Extensive experiments on multi-agent cooperative benchmarks, including predator-prey and StarCraft, demonstrate that HRQ outperforms existing methods in convergence speed, stability, and adaptability. Compared with the strongest baseline, HRQ achieves an overall performance improvement of 10%–20%.
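To make the decomposition setting concrete, the sketch below illustrates the standard QMIX-style monotonic mixing that the abstract contrasts HRQ against, not HRQ itself: per-agent Q-values are combined with non-negative weights, so the joint value is monotonic in each individual value and the Individual-Global-Max (IGM) property holds. All shapes, weights, and the bias term here are hypothetical illustration values.

```python
import numpy as np
from itertools import product

# Hypothetical setup: 3 agents, 4 discrete actions each.
rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4

# Per-agent action values Q_i(a_i), one row per agent.
q_values = rng.normal(size=(n_agents, n_actions))

# Non-negative mixing weights (in QMIX these come from a hypernetwork whose
# outputs are passed through abs()); non-negativity makes Q_tot monotonic
# in every Q_i, which is what enforces the IGM property.
w = np.abs(rng.normal(size=n_agents)) + 1e-6
b = 0.5  # stands in for the state-dependent bias of the real mixer

def q_tot(joint_action):
    """Monotonic mix of the Q-values of the chosen joint action."""
    chosen = q_values[np.arange(n_agents), joint_action]
    return float(w @ chosen + b)

# IGM check: the greedy joint action found by exhaustive search equals the
# tuple of each agent's own greedy action, so decentralized execution is
# consistent with the centralized value.
best_joint = max(product(range(n_actions), repeat=n_agents), key=q_tot)
decentralized = tuple(int(a) for a in q_values.argmax(axis=1))
assert best_joint == decentralized
```

The abstract's point is that this non-negativity constraint, while guaranteeing IGM, restricts which joint value functions the mixer can represent; HRQ's residual networks are described as relaxing that constraint while preserving IGM.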
| Original language | English |
|---|---|
| Article number | 131340 |
| Journal | Neurocomputing |
| Volume | 655 |
| DOIs | |
| State | Published - 28 Nov 2025 |
Keywords
- Centralized training with decentralized execution
- Deep learning
- Multi-agent reinforcement learning
- Value function factorization
Fingerprint
Dive into the research topics of 'Factorizing value function with hierarchical residual Q-network in multi-agent reinforcement learning'.