TY - JOUR
T1 - Imperceptible Adversarial Attack with Multigranular Spatiotemporal Attention for Video Action Recognition
AU - Wu, Guoming
AU - Xu, Yangfan
AU - Li, Jun
AU - Shi, Zhiping
AU - Liu, Xianglong
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2023/10/15
Y1 - 2023/10/15
N2 - In recent years, the application of video Internet of Things (IoT) in cities and public places has brought unprecedented opportunities to the security field and achieved great success. However, recent research shows that video recognition models are vulnerable to adversarial examples, while adversarial examples based on physical attacks are easily detected by humans and therefore rarely pass human review. To address this problem, we propose a novel multigranular spatiotemporal attention network (MSANet) that attacks video action recognition models imperceptibly. Specifically, to exploit video motion information more effectively and to reduce the detectability of attack perturbations, we design a multiplexed spatiotemporal attention module that selects and enhances spatial regions and temporal frames at coarse-grained and fine-grained levels, respectively, thus maintaining a certain degree of smoothness while reducing the perturbation size and avoiding attack overfitting. In addition, MSANet achieves imperceptible perturbations to video sequences through alternating iterative optimization combined with the projected gradient descent (PGD) attack mechanism. Extended experimental results on two models (TDN and TSM) and two widely used data sets [HMDB-51 (Kuehne et al., 2011) and UCF-101 (Soomro et al., 2012)], in comparison with the state-of-the-art model, demonstrate the effectiveness of the proposed video action recognition attack approach.
AB - In recent years, the application of video Internet of Things (IoT) in cities and public places has brought unprecedented opportunities to the security field and achieved great success. However, recent research shows that video recognition models are vulnerable to adversarial examples, while adversarial examples based on physical attacks are easily detected by humans and therefore rarely pass human review. To address this problem, we propose a novel multigranular spatiotemporal attention network (MSANet) that attacks video action recognition models imperceptibly. Specifically, to exploit video motion information more effectively and to reduce the detectability of attack perturbations, we design a multiplexed spatiotemporal attention module that selects and enhances spatial regions and temporal frames at coarse-grained and fine-grained levels, respectively, thus maintaining a certain degree of smoothness while reducing the perturbation size and avoiding attack overfitting. In addition, MSANet achieves imperceptible perturbations to video sequences through alternating iterative optimization combined with the projected gradient descent (PGD) attack mechanism. Extended experimental results on two models (TDN and TSM) and two widely used data sets [HMDB-51 (Kuehne et al., 2011) and UCF-101 (Soomro et al., 2012)], in comparison with the state-of-the-art model, demonstrate the effectiveness of the proposed video action recognition attack approach.
KW - Imperceptible adversarial attack
KW - spatial attention
KW - spatiotemporal attention
KW - temporal attention
KW - video action recognition
UR - https://www.scopus.com/pages/publications/85161045710
U2 - 10.1109/JIOT.2023.3280737
DO - 10.1109/JIOT.2023.3280737
M3 - Article
AN - SCOPUS:85161045710
SN - 2327-4662
VL - 10
SP - 17785
EP - 17796
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 20
ER -