Turning Swords into Shields: Defense Against Adversarial Examples by Using Trojan Attacks

  • Chengbin Sun
  • Hailong Sun*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep neural networks (DNNs) are inherently vulnerable to adversarial examples, which pose significant challenges for their reliable deployment in safety-critical applications. Although several defense strategies have been proposed to mitigate this vulnerability, their effectiveness is frequently undermined by white-box attacks, in which adversaries exploit detailed knowledge of the underlying defense mechanisms. In this paper, we propose SNARE, a novel defense mechanism designed to mitigate the impact of white-box adversaries. Instead of merely masking the inherent vulnerabilities of DNNs, SNARE deliberately introduces vulnerabilities of its own, thereby guiding attackers toward predictable attack patterns that can be detected. By embedding a plug-and-play module into the target model, SNARE engineers vulnerabilities that serve as decoys, directing adversaries to produce adversarial examples with specific characteristics. These characteristics form discernible patterns that are consistently detectable, enabling accurate identification of adversarial examples. The module can be integrated into different model architectures without altering their essential functionality. Experimental results demonstrate that SNARE surpasses existing state-of-the-art defense techniques across numerous benchmarks.

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
Editors: Josef Kittler, Hongkai Xiong, Weiyao Lin, Jian Yang, Xilin Chen, Jiwen Lu, Jingyi Yu, Weishi Zheng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 136-151
Number of pages: 16
ISBN (Print): 9789819556984
State: Published - 2026
Event: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 - Shanghai, China
Duration: 15 Oct 2025 to 18 Oct 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16275 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
Country/Territory: China
City: Shanghai
Period: 15/10/25 to 18/10/25

Keywords

  • Adversarial examples
  • Deep Learning Security
  • Trojan Attack
