
AUV Efficient Navigation Relying on Adaptive Proximal Policy Optimization

  • Jingzehua Xu
  • Yongming Zeng
  • Jintao Zhang
  • Xuanchen Li
  • Lingru Meng
  • Haocai Huang
  • Jingjing Wang*
  • Yong Ren
  • *Corresponding author for this work
  • Tsinghua University
  • Zhejiang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Safe and efficient navigation is crucial for autonomous underwater vehicles (AUVs) to perform various marine monitoring tasks. Considering the complex and unknown underwater environment and limited sensing ability of AUVs, the traditional methods based on models and relying on large amounts of input information are not practical enough, and reinforcement learning (RL) has been widely discussed as one of the most promising schemes. Among many RL algorithms, the proximal policy optimization (PPO) based on trust region optimization theory not only improves sampling efficiency but also reduces deployment complexity by constraining updates of current and previous policies within an alternate trust region. Nevertheless, the performance of PPO is easily influenced by fixed clipping bounds and lacks adaptability. In order to dynamically optimize clipping bounds, we propose the adaptive PPO (APPO) algorithm for AUV navigation tasks. APPO dynamically explores and exploits clipping bounds during online training using a bandit to maximize the value of the upper confidence bound of each candidate boundary, guiding PPO to use different clipping bounds at different stages of online training to improve training efficiency and stability. Extensive simulation experiments demonstrate that APPO is more suitable for AUV navigation tasks compared to other baseline algorithms, showing superior performance in terms of robustness, stability, and adaptability.
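
The bandit-driven choice of clipping bound described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the candidate set, the bandit's reward signal, or the exploration constant, so the names `ClipBoundBandit`, `candidates`, and `c`, and the use of the plain UCB1 rule, are all assumptions made for illustration. The sketch pairs the standard PPO clipped surrogate objective with a UCB bandit whose arms are candidate clipping bounds.

```python
import math
import numpy as np


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


class ClipBoundBandit:
    """UCB1 bandit over a hypothetical set of candidate clipping bounds."""

    def __init__(self, candidates, c=2.0):
        self.candidates = list(candidates)  # assumed candidate bounds
        self.c = c                          # assumed exploration constant
        self.counts = [0] * len(self.candidates)
        self.means = [0.0] * len(self.candidates)
        self.t = 0                          # total number of updates so far

    def select(self):
        """Return the index of the arm with the largest upper confidence bound."""
        # Try every candidate once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        ucb = [
            m + math.sqrt(self.c * math.log(self.t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        """Record the reward observed after training with the chosen bound."""
        self.t += 1
        self.counts[arm] += 1
        # Incremental running mean of the rewards seen for this arm.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

In a training loop, one would call `select()` before each policy update, pass `bandit.candidates[arm]` as `eps` to the surrogate, and then call `update()` with some measure of how well that update went. Using the change in episodic return as that reward is an assumption here, not something the abstract states.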

Original language: English
Title of host publication: Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings
Editors: Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 149-164
Number of pages: 16
ISBN (Print): 9789819666058
DOIs
State: Published - 2025
Event: 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand
Duration: 2 Dec 2024 to 6 Dec 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15296 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Neural Information Processing, ICONIP 2024
Country/Territory: New Zealand
City: Auckland
Period: 2/12/24 to 6/12/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 14 - Life Below Water

Keywords

  • Adaptive proximal policy optimization
  • Autonomous underwater vehicles
  • Efficient navigation
  • Multi-armed bandit
