Abstract
Safe and efficient navigation is crucial for autonomous underwater vehicles (AUVs) performing marine monitoring tasks. Because the underwater environment is complex and unknown and the sensing ability of AUVs is limited, traditional model-based methods that rely on large amounts of input information are often impractical, and reinforcement learning (RL) has been widely discussed as one of the most promising alternatives. Among the many RL algorithms, proximal policy optimization (PPO), which builds on trust region optimization theory, not only improves sampling efficiency but also reduces deployment complexity by constraining the update between the current and previous policies within an alternate trust region. Nevertheless, PPO's performance is sensitive to its fixed clipping bounds, and the algorithm lacks adaptability. To optimize the clipping bounds dynamically, we propose the adaptive PPO (APPO) algorithm for AUV navigation tasks. APPO explores and exploits clipping bounds during online training using a bandit that maximizes the upper confidence bound value over the candidate bounds, guiding PPO to use different clipping bounds at different stages of online training and thereby improving training efficiency and stability. Extensive simulation experiments demonstrate that APPO is better suited to AUV navigation tasks than the baseline algorithms, showing superior robustness, stability, and adaptability.
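
The idea of selecting a clipping bound with an upper-confidence-bound bandit can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the candidate set, the bandit's reward signal, or the exploration constant, so the values `(0.1, 0.2, 0.3)`, the constant `c`, the class `ClipBoundBandit`, and the toy rewards in `run_toy` are all assumptions made for illustration. The bandit follows the standard UCB1 rule, and `clipped_surrogate` is the standard per-sample PPO clipped objective.

```python
import math


class ClipBoundBandit:
    """UCB1 bandit over a hypothetical set of candidate PPO clipping bounds."""

    def __init__(self, candidates=(0.1, 0.2, 0.3), c=1.0):
        self.candidates = list(candidates)  # assumed candidate bounds
        self.c = c                          # assumed exploration constant
        self.counts = [0] * len(self.candidates)
        self.means = [0.0] * len(self.candidates)
        self.total = 0

    def select(self):
        """Return the index of the arm with the highest UCB value."""
        # Pull every arm once before applying the UCB formula.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        ucb = [
            mean + self.c * math.sqrt(2.0 * math.log(self.total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        """Update the running mean reward of the chosen arm."""
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped objective for one sample with clip bound eps."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def run_toy(rounds=500, arm_rewards=(0.2, 0.8, 0.5)):
    """Toy loop: each arm yields a fixed, made-up reward (not real AUV data)."""
    bandit = ClipBoundBandit()
    for _ in range(rounds):
        arm = bandit.select()
        # In training, the PPO update would run here with
        # eps = bandit.candidates[arm], and the reward would be some
        # measure of training progress (an assumption in this sketch).
        bandit.update(arm, arm_rewards[arm])
    return bandit
```

In a training loop, `select` would be called before each policy update to pick the clipping bound, and `update` afterwards with whatever progress measure is chosen as the bandit reward; how APPO defines that reward is not stated in the abstract.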
| Original language | English |
|---|---|
| Title of host publication | Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings |
| Editors | Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 149-164 |
| Number of pages | 16 |
| ISBN (Print) | 9789819666058 |
| DOIs | |
| State | Published - 2025 |
| Event | 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand. Duration: 2 Dec 2024 → 6 Dec 2024 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15296 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 31st International Conference on Neural Information Processing, ICONIP 2024 |
|---|---|
| Country/Territory | New Zealand |
| City | Auckland |
| Period | 2 Dec 2024 → 6 Dec 2024 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 14 Life Below Water
Keywords
- Adaptive proximal policy optimization
- Autonomous underwater vehicles
- Efficient navigation
- Multi-armed bandit
Fingerprint
Dive into the research topics of 'AUV Efficient Navigation Relying on Adaptive Proximal Policy Optimization'. Together they form a unique fingerprint.