A critical review on the evaluation of automated program repair systems

  • Kui Liu
  • Li Li*
  • Anil Koyuncu
  • Dongsun Kim
  • Zhe Liu
  • Jacques Klein
  • Tegawendé F. Bissyandé
  • *Corresponding author for this work
  • Nanjing University of Aeronautics and Astronautics
  • Monash University
  • University of Luxembourg
  • Kyungpook National University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Automated Program Repair (APR) has attracted significant attention from software engineering research and practice communities in the last decade. Several teams have recorded promising performance in fixing real bugs and there is a race in the literature to fix as many bugs as possible from established benchmarks. Gradually, repair performance of APR tools in the literature has gone from being evaluated with a metric on the number of generated plausible patches to the number of correct patches. This evolution became necessary after a study highlighted the overfitting issue in test suite-based automatic patch generation. Simultaneously, some researchers are also insisting on providing time cost in the repair scenario as a metric for comparing state-of-the-art systems. In this paper, we discuss how the latest evaluation metrics of APR systems could be biased. Since design decisions (both in approach and evaluation setup) are not always fully disclosed, the impact on repair performance is unknown and computed metrics are often misleading. To reduce notable biases of design decisions in program repair approaches, we conduct a critical review on the evaluation of patch generation systems and propose eight evaluation metrics for fairly assessing the performance of APR tools. Eventually, we show with experimental data on 11 baseline program repair systems that the proposed metrics make it possible to highlight some caveats in the literature. We expect wide adoption of these metrics in the community to contribute to boosting the development of practical and reliable program repair tools.

Original language: English
Article number: 110817
Journal: Journal of Systems and Software
Volume: 171
DOI
Publication status: Published - Jan 2021
Externally published
