TY - GEN
T1 - PrecisionProbe
T2 - 15th IEEE International Conference on Joint Cloud Computing, JCC 2024
AU - Peng, Weiyu
AU - Wang, Jinghao
AU - Wo, Tianyu
AU - Yang, Renyu
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Deep learning recommendation models (DLRMs) exploit user behaviors such as clicks, browsing footprints, and preferences to improve personalized experiences. However, as user data grows exponentially, such models require ever more GPU resources, which are costly and often insufficient in a computing cluster. To improve GPU utilization and facilitate advances in GPU scheduling algorithms, we present PrecisionProbe, a non-intrusive monitoring and analysis tool that runs on Kubernetes and performs sophisticated analytics of GPU resource utilization without altering the existing training code. PrecisionProbe captures fine-grained GPU metrics at the level of individual model layers, and exploring these detailed metrics gives a precise understanding of resource consumption patterns. This mechanism is crucial for devising effective GPU scheduling algorithms, particularly algorithms tailored to DLRM training jobs that depend on such consumption patterns. Experimental results show that recommendation models, as opposed to CV and NLP models, use less FP32 processing but have higher memory interaction frequencies. These findings highlight the distinct resource needs of recommendation systems and motivate performance analysis with PrecisionProbe.
KW - Cloud Computing
KW - Deep Recommendation Training
KW - Kubernetes
KW - Performance Analysis
UR - https://www.scopus.com/pages/publications/85206796390
U2 - 10.1109/JCC62314.2024.00010
DO - 10.1109/JCC62314.2024.00010
M3 - Conference contribution
AN - SCOPUS:85206796390
T3 - Proceedings - 2024 IEEE International Conference on Joint Cloud Computing, JCC 2024
SP - 17
EP - 20
BT - Proceedings - 2024 IEEE International Conference on Joint Cloud Computing, JCC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 July 2024 through 18 July 2024
ER -