TY - JOUR
T1 - Hastening Stream Offloading of Inference via Multi-Exit DNNs in Mobile Edge Computing
AU - Liu, Zhicheng
AU - Song, Jinduo
AU - Qiu, Chao
AU - Wang, Xiaofei
AU - Chen, Xu
AU - He, Qiang
AU - Sheng, Hao
N1 - Publisher Copyright: © 2002-2012 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - As the primary driver of intelligent mobile applications, deep neural networks (DNNs) have gradually been deployed to millions of mobile devices, producing massive numbers of latency-sensitive and computation-intensive tasks daily. Mobile edge computing facilitates the deployment of computing resources at the edge, which enables fine-grained offloading of DNN inference tasks from mobile devices to edge nodes. However, most existing studies have not systematically considered three crucial performance aspects: scheduling multiple streams of DNN inference tasks, leveraging multi-exit models to hasten task processing, and partitioning inference models for partial offloading. To this end, this paper proposes an adaptive inference framework in mobile edge computing, which can dynamically select the exit point and partition point for multiple inference task streams. We design a dynamic programming algorithm to obtain an efficient solution under the ideal condition that task arrival information is known. Further, we design a learning-based algorithm for online scheduling, whose training efficiency is improved by historical experience initialization and priority experience replay. Experimental results show that, compared with the Greedy algorithm, the online algorithm improves the performance on two environmental parameters by an average of 5.9% and 32%, respectively.
KW - DNN inference
KW - Task offloading
KW - deep reinforcement learning
KW - edge computing
KW - model partition
UR - https://www.scopus.com/pages/publications/85141594090
U2 - 10.1109/TMC.2022.3218724
DO - 10.1109/TMC.2022.3218724
M3 - Article
AN - SCOPUS:85141594090
SN - 1536-1233
VL - 23
SP - 535
EP - 548
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
IS - 1
ER -