TY - GEN
T1 - Proximal Policy Optimization for Same-Day Delivery with Drones and Vehicles
AU - Li, Meng
AU - Cai, Kaiquan
AU - Zhao, Peng
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
N2 - With surging demand for instant gratification in online shopping, offering same-day delivery with heterogeneous fleets of drones and vehicles provides new insights for decision makers. However, real-time decisions involving the assignment and routing of vehicles and drones suffer from the “curse of dimensionality” due to stochastic and dynamic orders, huge state spaces, and interrelated and diverse decisions. In this paper, a deep reinforcement learning (DRL) based approach is presented to handle this dynamic decision problem. First, a route-based Markov decision process is formulated to model the problem. Then, a DRL-based algorithm combining proximal policy optimization and heuristics (PPOh) is developed to decide whether to accept customer requests, how to assign orders, and how to plan fleet routes. Extensive computational experiments show that PPOh outperforms existing methods and clearly improves the service rates of fleets under the same workload.
AB - With surging demand for instant gratification in online shopping, offering same-day delivery with heterogeneous fleets of drones and vehicles provides new insights for decision makers. However, real-time decisions involving the assignment and routing of vehicles and drones suffer from the “curse of dimensionality” due to stochastic and dynamic orders, huge state spaces, and interrelated and diverse decisions. In this paper, a deep reinforcement learning (DRL) based approach is presented to handle this dynamic decision problem. First, a route-based Markov decision process is formulated to model the problem. Then, a DRL-based algorithm combining proximal policy optimization and heuristics (PPOh) is developed to decide whether to accept customer requests, how to assign orders, and how to plan fleet routes. Extensive computational experiments show that PPOh outperforms existing methods and clearly improves the service rates of fleets under the same workload.
KW - Deep Reinforcement Learning
KW - Proximal Policy Optimization
KW - Route-based MDP
KW - Same-day Delivery
UR - https://www.scopus.com/pages/publications/85187662871
U2 - 10.1007/978-981-97-0837-6_15
DO - 10.1007/978-981-97-0837-6_15
M3 - Conference contribution
AN - SCOPUS:85187662871
SN - 9789819708369
T3 - Communications in Computer and Information Science
SP - 211
EP - 224
BT - Data Mining and Big Data - 8th International Conference, DMBD 2023, Proceedings
A2 - Tan, Ying
A2 - Shi, Yuhui
PB - Springer Science and Business Media Deutschland GmbH
T2 - 8th International Conference on Data Mining and Big Data, DMBD 2023
Y2 - 9 December 2023 through 12 December 2023
ER -