TY - GEN
T1 - ELBA-Bench
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Liu, Xuxu
AU - Liang, Siyuan
AU - Han, Mengya
AU - Luo, Yong
AU - Liu, Aishan
AU - Cai, Xiantao
AU - He, Zheng
AU - Tao, Dacheng
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Generative large language models are crucial in natural language processing, but they are vulnerable to backdoor attacks, where subtle triggers compromise their behavior. Although backdoor attacks against LLMs are constantly emerging, existing benchmarks remain limited in attack coverage, metric system integrity, and backdoor attack alignment. Moreover, existing pre-trained backdoor attacks are idealized in practice due to resource access constraints. Therefore, we establish ELBA-Bench, a comprehensive and unified framework that allows attackers to inject backdoors through parameter-efficient fine-tuning (PEFT; e.g., LoRA) or through techniques that require no fine-tuning (e.g., in-context learning). ELBA-Bench provides over 1,300 experiments encompassing implementations of 12 attack methods, 18 datasets, and 12 LLMs. Extensive experiments provide new insights into the strengths and limitations of various attack strategies. For instance, PEFT attacks consistently outperform approaches without fine-tuning in classification tasks while showing strong cross-dataset generalization, with optimized triggers boosting robustness; task-relevant backdoor optimization techniques, or attack prompts combined with clean and adversarial demonstrations, can enhance backdoor attack success while preserving model performance on clean samples. Additionally, we introduce a universal toolbox designed for standardized backdoor attack research at https://github.com/NWPUliuxx/ELBA_Bench, with the goal of propelling further progress in this vital area.
UR - https://www.scopus.com/pages/publications/105021022545
U2 - 10.18653/v1/2025.acl-long.877
DO - 10.18653/v1/2025.acl-long.877
M3 - Conference contribution
AN - SCOPUS:105021022545
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 17928
EP - 17947
BT - Long Papers
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
Y2 - 27 July 2025 through 1 August 2025
ER -