Multi-Agent Reinforcement Learning for Solving Stackelberg Equilibrium in Dynamic Satellite Resource Allocation

  • Yifan Bo
  • Dongyu Xu
  • Shuo Zhang
  • Jianyuan Wang
  • Biao Leng*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-beam satellites, with their long-distance and all-weather coverage, are recognized as a crucial component of future communication systems. However, time-varying traffic demand and complicated satellite-ground links pose new challenges to the effective utilization of multi-beam beam-hopping satellite resources. Moreover, current beam resource allocation approaches overlook the naturally asynchronous character of beam decision-making and assume that beams act simultaneously, resulting in a lack of cooperation between beam decisions. To tackle these challenges, we construct a satellite beam resource allocation model that takes into account illuminated cells, frequency bandwidth, power usage, and co-frequency interference. We propose a Traffic-Aware Multi-Step (TAMS) reinforcement learning algorithm for satellite resource allocation, which treats each beam as an agent and models the problem as a Markov Stackelberg game. Considering the complicated communication channel, a traffic-aware attention-based encoder is proposed to capture correlated features from the arrival traffic, traffic buffer, and channel status. In addition, we introduce a sequential multi-step action decoder that approximates a Stackelberg equilibrium between leader and follower beams. Simulation results demonstrate that the proposed approach efficiently allocates beam, frequency, and power resources while adapting to time-varying traffic demands. It outperforms baseline algorithms by enhancing throughput, reducing task delay, and ensuring delay fairness. Ablation experiments demonstrate the effectiveness of each component, and further experiments validate the approach's generalization across different beam numbers.
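
As a rough illustration of the leader-follower idea mentioned in the abstract, the sketch below computes a pure-strategy Stackelberg solution of a tiny two-player matrix game by backward induction. It is not the TAMS algorithm and does not come from the paper: the function name `stackelberg_pure` and the payoff matrices are hypothetical, chosen only to show how a leader that moves first can do better than it would under simultaneous play.

```python
import numpy as np

def stackelberg_pure(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg solution of a two-player matrix game.

    Rows index the leader's actions, columns the follower's actions.
    Returns (leader_action, follower_action). Ties are broken by the
    lowest index (numpy argmax behaviour), which is an assumption of
    this sketch, not a property of any particular paper.
    """
    leader_payoff = np.asarray(leader_payoff, dtype=float)
    follower_payoff = np.asarray(follower_payoff, dtype=float)

    # Step 1: for each leader action (row), the follower observes it
    # and picks the column that maximizes the follower's own payoff.
    best_reply = follower_payoff.argmax(axis=1)

    # Step 2: the leader anticipates those replies and picks the row
    # whose induced outcome maximizes the leader's payoff.
    rows = np.arange(leader_payoff.shape[0])
    leader_values = leader_payoff[rows, best_reply]
    leader_action = int(leader_values.argmax())

    return leader_action, int(best_reply[leader_action])

# Hypothetical payoffs for two beams, each choosing one of two options.
leader_payoff = [[2.0, 4.0],
                 [1.0, 3.0]]
follower_payoff = [[1.0, 0.0],
                   [0.0, 2.0]]

solution = stackelberg_pure(leader_payoff, follower_payoff)
print(solution)
```

In this toy game, row 0 strictly dominates for the leader, so simultaneous play ends at (row 0, column 0) with a leader payoff of 2. Moving first, the leader instead commits to row 1, the follower replies with column 1, and the leader obtains 3. This ordering effect is what a sequential decision scheme can exploit, under the assumptions stated above.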

Original language: English
Pages (from-to): 4734-4746
Number of pages: 13
Journal: IEEE Transactions on Vehicular Technology
Volume: 75
Issue number: 3
State: Published - Mar 2026

Keywords

  • Multi-beam satellite
  • Stackelberg equilibrium
  • multi-agent reinforcement learning
  • resource allocation
