Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu


Abstract
The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting. However, we find that this learning paradigm inherits the vulnerability of the pre-training stage, where model predictions can be misled by inserting certain triggers into the text. In this paper, we explore this universal vulnerability by either injecting backdoor triggers or searching for adversarial triggers on pre-trained language models using only plain text. In both scenarios, we demonstrate that our triggers can completely control or severely degrade the performance of prompt-based models fine-tuned on arbitrary downstream tasks, reflecting the universal vulnerability of the prompt-based learning paradigm. Further experiments show that adversarial triggers transfer well among language models. We also find that conventionally fine-tuned models are not vulnerable to adversarial triggers constructed from pre-trained language models. We conclude by proposing a potential solution to mitigate our attack methods. Code and data are publicly available.
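The core attack idea described above, inserting a fixed trigger token into input text and checking whether the model's prediction flips, can be illustrated with a minimal sketch. The `classify` function, trigger string, and evaluation loop here are hypothetical stand-ins (the paper finds its triggers on pre-trained language models; this only shows how trigger insertion and attack success rate could be measured):

```python
def insert_trigger(text: str, trigger: str, position: int = 0) -> str:
    """Insert a trigger token at a given word position in the text."""
    words = text.split()
    words.insert(position, trigger)
    return " ".join(words)


def attack_success_rate(texts, labels, classify, trigger) -> float:
    """Fraction of correctly classified inputs whose prediction flips
    once the trigger is inserted (a common attack-success metric)."""
    flipped = total = 0
    for text, label in zip(texts, labels):
        if classify(text) != label:
            continue  # only count inputs the victim model gets right
        total += 1
        if classify(insert_trigger(text, trigger)) != label:
            flipped += 1
    return flipped / total if total else 0.0
```

In practice, `classify` would wrap a prompt-based model (e.g. a masked language model with a cloze-style template), and the trigger would be one found by backdoor injection or adversarial search rather than chosen by hand.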
Anthology ID:
2022.findings-naacl.137
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1799–1810
URL:
https://aclanthology.org/2022.findings-naacl.137
DOI:
10.18653/v1/2022.findings-naacl.137
Cite (ACL):
Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, and Zhiyuan Liu. 2022. Exploring the Universal Vulnerability of Prompt-based Learning Paradigm. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1799–1810, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Exploring the Universal Vulnerability of Prompt-based Learning Paradigm (Xu et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.137.pdf
Video:
https://aclanthology.org/2022.findings-naacl.137.mp4
Code:
leix28/prompt-universal-vulnerability
Data:
IMDb Movie Reviews