
Restoring reproducibility of Jupyter notebooks

  • Jiawei Wang
  • Tzu Yang Kuo
  • Li Li
  • Andreas Zeller

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Jupyter notebooks (documents that contain live code, equations, visualizations, and narrative text) are now among the most popular means to compute, present, discuss, and disseminate scientific findings. In principle, Jupyter notebooks should make it easy to reproduce and extend scientific computations and their findings; in practice, this is not the case. The individual code cells in Jupyter notebooks can be executed in any order, with identifier usages preceding their definitions and results preceding their computations. In a sample of 936 published notebooks that would be executable in principle, we found that 73% of them would not be reproducible with straightforward approaches, requiring humans to infer (and often guess) the order in which the authors created the cells. In this paper, we present an approach to (1) automatically satisfy dependencies between code cells to reconstruct possible execution orders of the cells; and (2) instrument code cells to mitigate the impact of non-reproducible statements (i.e., random functions) in Jupyter notebooks. Our Osiris prototype takes a notebook as input and outputs the possible execution schemes that reproduce the exact notebook results. In our sample, Osiris was able to reconstruct such schemes for 82.23% of all executable notebooks, which is more than three times better than the state of the art; the resulting reordered code is valid program code and thus available for further testing and analysis.
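The two steps named in the abstract can be illustrated with a minimal sketch. It is not the Osiris implementation, whose internals the abstract does not describe. The helper names (`defs_and_uses`, `candidate_order`, `with_fixed_seed`) are hypothetical, and the analysis is deliberately simplified: it looks only at names, ignores scoping, and treats a name that a cell both defines and uses as local to that cell.

```python
import ast
import builtins


def defs_and_uses(source):
    """Return (defined, used) names for one cell (simplified, scope-unaware)."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.alias):  # names bound by import statements
            defined.add((node.asname or node.name).split(".")[0])
    # Assumption: names defined in the same cell, and builtins, need no other cell.
    return defined, used - defined - set(dir(builtins))


def candidate_order(cells):
    """Greedily pick one execution order in which every use follows a definition."""
    info = [defs_and_uses(src) for src in cells]
    remaining = list(range(len(cells)))
    available, order = set(), []
    while remaining:
        ready = [i for i in remaining if info[i][1] <= available]
        if not ready:
            return None  # some name is never defined: no valid order exists
        nxt = ready[0]  # prefer the cell that appears earliest in the notebook
        order.append(nxt)
        available |= info[nxt][0]
        remaining.remove(nxt)
    return order


def with_fixed_seed(source, seed=0):
    """Prepend a fixed seed so calls into the `random` module repeat across runs."""
    return f"import random\nrandom.seed({seed})\n{source}"


# Cells stored out of order: the use of `total` precedes its definition.
cells = ["print(total)", "total = sum(values)", "values = [1, 2, 3]"]
order = candidate_order(cells)  # indices into `cells`, definitions first
```

A greedy pass like this yields only one candidate order, whereas the abstract speaks of reconstructing the possible execution schemes and checking them against the stored notebook results; enumerating and validating alternatives is outside this sketch.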

Original language: English
Title of host publication: Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering
Subtitle of host publication: Companion Proceedings, ICSE-Companion 2020
Publisher: IEEE Computer Society
Pages: 288-289
Number of pages: 2
ISBN (Electronic): 9781450371223
State: Published - 27 Jun 2020
Externally published: Yes
Event: 42nd ACM/IEEE International Conference on Software Engineering, ICSE-Companion 2020 - Virtual, Online, Korea, Republic of
Duration: 27 Jun 2020 - 19 Jul 2020

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 42nd ACM/IEEE International Conference on Software Engineering, ICSE-Companion 2020
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 27/06/20 - 19/07/20

Keywords

  • Jupyter Notebooks
  • Osiris
  • Python
  • Reproducibility
