TY - GEN
T1 - Nondeterministic Impact of CPU Multithreading on Training Deep Learning Systems
AU - Xiao, Guanping
AU - Liu, Jun
AU - Zheng, Zheng
AU - Sui, Yulei
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - With the wide deployment of deep learning (DL) systems, research in reliable and robust DL is not an option but a priority, especially for safety-critical applications. Unfortunately, DL systems are usually nondeterministic. Due to software-level (e.g., randomness) and hardware-level (e.g., GPUs or CPUs) factors, multiple training runs can generate inconsistent models and yield different evaluation results, even with identical settings and training data on the same implementation framework and hardware platform. Existing studies focus on analyzing software-level nondeterminism factors and the nondeterminism introduced by GPUs. However, the nondeterministic impact of CPU multithreading on training DL systems has rarely been studied. To fill this knowledge gap, we present the first study of the variance and robustness of DL systems impacted by CPU multithreading. Our major contributions are fourfold: 1) an experimental framework based on VirtualBox for analyzing the impact of CPU multithreading on training DL systems; 2) six findings obtained from our experiments and our examination of GitHub DL projects; 3) five implications for DL researchers and practitioners based on our findings; 4) release of the research data (https://github.com/DeterministicDeepLearning).
KW - CPU multithreading
KW - Deep learning systems
KW - Empirical study
KW - Nondeterminism factors
KW - Training variance
UR - https://www.scopus.com/pages/publications/85126394811
U2 - 10.1109/ISSRE52982.2021.00063
DO - 10.1109/ISSRE52982.2021.00063
M3 - Conference contribution
AN - SCOPUS:85126394811
T3 - Proceedings - International Symposium on Software Reliability Engineering, ISSRE
SP - 557
EP - 568
BT - Proceedings - 2021 IEEE 32nd International Symposium on Software Reliability Engineering, ISSRE 2021
A2 - Jin, Zhi
A2 - Li, Xuandong
A2 - Xiang, Jianwen
A2 - Mariani, Leonardo
A2 - Liu, Ting
A2 - Yu, Xiao
A2 - Ivaki, Naghmeh
PB - IEEE Computer Society
T2 - 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021
Y2 - 25 October 2021 through 28 October 2021
ER -