Nondeterministic Impact of CPU Multithreading on Training Deep Learning Systems

  • Guanping Xiao*
  • Jun Liu
  • Zheng Zheng
  • Yulei Sui

*Corresponding author for this work

  • Nanjing University of Aeronautics and Astronautics
  • Nanjing University
  • University of Technology Sydney

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

With the wide deployment of deep learning (DL) systems, research on reliable and robust DL is not an option but a priority, especially for safety-critical applications. Unfortunately, DL systems are usually nondeterministic. Due to software-level (e.g., randomness) and hardware-level (e.g., GPUs or CPUs) factors, multiple training runs can generate inconsistent models and yield different evaluation results, even with identical settings and training data on the same implementation framework and hardware platform. Existing studies focus on analyzing software-level nondeterminism factors and the nondeterminism introduced by GPUs. However, the nondeterministic impact of CPU multithreading on training DL systems has rarely been studied. To fill this knowledge gap, we present the first study of the variance and robustness of DL systems impacted by CPU multithreading. Our major contributions are fourfold: 1) an experimental framework based on VirtualBox for analyzing the impact of CPU multithreading on training DL systems; 2) six findings obtained from our experiments and our examination of GitHub DL projects; 3) five implications for DL researchers and practitioners, drawn from our findings; 4) publicly released research data (https://github.com/DeterministicDeepLearning).

Original language: English
Title of host publication: Proceedings - 2021 IEEE 32nd International Symposium on Software Reliability Engineering, ISSRE 2021
Editors: Zhi Jin, Xuandong Li, Jianwen Xiang, Leonardo Mariani, Ting Liu, Xiao Yu, Nahgmeh Ivaki
Publisher: IEEE Computer Society
Pages: 557-568
Number of pages: 12
ISBN (Electronic): 9781665425872
State: Published - 2021
Event: 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021 - Wuhan, China
Duration: 25 Oct 2021 to 28 Oct 2021

Publication series

Name: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
Volume: 2021-October
ISSN (Print): 1071-9458

Conference

Conference: 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021
Country/Territory: China
City: Wuhan
Period: 25/10/21 to 28/10/21

Keywords

  • CPU multithreading
  • Deep learning systems
  • Empirical study
  • Nondeterminism factors
  • Training variance
