TY - GEN
T1 - Fuzz testing based data augmentation to improve robustness of deep neural networks
AU - Gao, Xiang
AU - Saha, Ripon K.
AU - Prasad, Mukul R.
AU - Roychoudhury, Abhik
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/6/27
Y1 - 2020/6/27
N2 - Deep neural networks (DNN) have been shown to be notoriously brittle to small perturbations in their input data. This problem is analogous to the over-fitting problem in test-based program synthesis and automatic program repair, which is a consequence of the incomplete specification, i.e., the limited tests or training examples, that the program synthesis or repair algorithm has to learn from. Recently, test generation techniques have been successfully employed to augment existing specifications of intended program behavior, to improve the generalizability of program synthesis and repair. Inspired by these approaches, in this paper, we propose a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Our technique casts the DNN data augmentation problem as an optimization problem. It uses genetic search to generate the most suitable variant of an input data to use for training the DNN, while simultaneously identifying opportunities to accelerate training by skipping augmentation in many instances. We instantiate this technique in two tools, Sensei and Sensei-SA, and evaluate them on 15 DNN models spanning 5 popular image data-sets. Our evaluation shows that Sensei can improve the robust accuracy of the DNN, compared to the state of the art, on each of the 15 models, by upto 11.9% and 5.5% on average. Further, Sensei-SA can reduce the average DNN training time by 25%, while still improving robust accuracy.
AB - Deep neural networks (DNN) have been shown to be notoriously brittle to small perturbations in their input data. This problem is analogous to the over-fitting problem in test-based program synthesis and automatic program repair, which is a consequence of the incomplete specification, i.e., the limited tests or training examples, that the program synthesis or repair algorithm has to learn from. Recently, test generation techniques have been successfully employed to augment existing specifications of intended program behavior, to improve the generalizability of program synthesis and repair. Inspired by these approaches, in this paper, we propose a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Our technique casts the DNN data augmentation problem as an optimization problem. It uses genetic search to generate the most suitable variant of an input data to use for training the DNN, while simultaneously identifying opportunities to accelerate training by skipping augmentation in many instances. We instantiate this technique in two tools, Sensei and Sensei-SA, and evaluate them on 15 DNN models spanning 5 popular image data-sets. Our evaluation shows that Sensei can improve the robust accuracy of the DNN, compared to the state of the art, on each of the 15 models, by upto 11.9% and 5.5% on average. Further, Sensei-SA can reduce the average DNN training time by 25%, while still improving robust accuracy.
KW - Data augmentation
KW - Dnn
KW - Genetic algorithm
KW - Robustness
UR - https://www.scopus.com/pages/publications/85094315315
U2 - 10.1145/3377811.3380415
DO - 10.1145/3377811.3380415
M3 - 会议稿件
AN - SCOPUS:85094315315
T3 - Proceedings - International Conference on Software Engineering
SP - 1147
EP - 1158
BT - Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering, ICSE 2020
PB - IEEE Computer Society
T2 - 42nd ACM/IEEE International Conference on Software Engineering, ICSE 2020
Y2 - 27 June 2020 through 19 July 2020
ER -