TY - GEN
T1 - A Method for Generating Adversarial Examples Based on Interpretable Information
AU - Gao, Yuntian
AU - Li, Jinlun
AU - Liu, Chang
AU - Yang, Ce
AU - Run, Yuxuan
AU - Zhao, Changdi
AU - Sun, Yangyang
AU - Li, Xiaobin
AU - Yang, Dezhen
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In recent years, machine learning (ML), particularly deep neural networks (DNNs), has developed rapidly and been widely applied across many fields owing to its strong performance. However, systems based on deep learning are vulnerable to adversarial attacks, in which images with added adversarial perturbations cause deep learning models to produce incorrect predictions. Such attacks undermine the stability of neural network systems and serve the attacker's malicious goals. Adversarial examples are also a crucial means of evaluating the robustness of deep neural networks and revealing their potential security vulnerabilities. This paper addresses the poor interpretability of adversarial example generation methods by proposing an adversarial attack method based on interpretable information. The method extracts interpretable information from training samples to generate feature heatmaps that visually represent the importance of different regions of the target. It then constructs a heatmap-guided mechanism to generate adversarial patches, which are directed at critical positions on the target to improve attack precision, yielding the final adversarial examples. Experimental results demonstrate that the proposed method generates adversarial examples that outperform mainstream methods in both attack effectiveness and robustness.
KW - Adversarial Attack
KW - Adversarial Examples
KW - Deep Learning
KW - Explainability
UR - https://www.scopus.com/pages/publications/105030334680
U2 - 10.1109/ICRMS63553.2024.00171
DO - 10.1109/ICRMS63553.2024.00171
M3 - Conference contribution
AN - SCOPUS:105030334680
T3 - Proceedings - 2024 15th International Conference on Reliability, Maintenance and Safety, ICRMS 2024
SP - 1074
EP - 1080
BT - Proceedings - 2024 15th International Conference on Reliability, Maintenance and Safety, ICRMS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th International Conference on Reliability, Maintenance and Safety, ICRMS 2024
Y2 - 31 July 2024 through 2 August 2024
ER -