TY - GEN
T1 - Does the failing test execute a single or multiple faults? An approach to classifying failing tests
AU - Yu, Zhongxing
AU - Bai, Chenggang
AU - Cai, Kai Yuan
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/8/12
Y1 - 2015/8/12
AB - Debugging is an indispensable yet frustrating activity in software development and maintenance, and numerous techniques have been proposed to aid it. Despite the demonstrated effectiveness and future potential of these techniques, many of them rely on the unrealistic single-fault failure assumption. To alleviate this problem, we propose a technique that distinguishes failing tests that executed a single fault from those that executed multiple faults. To achieve this goal, the technique combines information from (i) a set of fault localization ranked lists, each produced for a certain failing test, and (ii) the distance between a failing test and the passing test that most resembles it. The technique was evaluated in an experiment on 5 real-life medium-sized programs with 18,920 multiple-fault versions, each containing between 2 and 8 faults. The results indicate that the technique's performance, measured by precision, recall, and F-measure, is promising. In addition, the technique can properly cluster the failing tests that it identifies as having executed a single fault.
KW - Binary classification
KW - Debugging
KW - Distance calculation
KW - Fault localization
UR - https://www.scopus.com/pages/publications/84951844730
U2 - 10.1109/ICSE.2015.102
DO - 10.1109/ICSE.2015.102
M3 - Conference contribution
AN - SCOPUS:84951844730
T3 - Proceedings - International Conference on Software Engineering
SP - 924
EP - 935
BT - Proceedings - 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, ICSE 2015
PB - IEEE Computer Society
T2 - 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015
Y2 - 16 May 2015 through 24 May 2015
ER -