RMLM: A Flexible Defense Framework for Proactively Mitigating Word-level Adversarial Attacks

Zhaoyang Wang, Zhiyue Liu, Xiaopeng Zheng, Qinliang Su, Jiahai Wang


Abstract
Adversarial attacks on deep neural networks continue to raise security concerns in natural language processing research. Existing defenses focus on improving the robustness of the victim model during the training stage. However, they often neglect to proactively mitigate adversarial attacks during inference. Addressing this overlooked aspect, we propose a defense framework that aims to mitigate attacks by confusing attackers and correcting adversarial contexts caused by malicious perturbations. Our framework comprises three components: (1) a synonym-based transformation to randomly corrupt adversarial contexts at the word level, (2) a BERT-based defender to correct abnormal contexts at the representation level, and (3) a simple detection method to filter out adversarial examples, any of which can be flexibly combined. Additionally, our framework helps improve the robustness of the victim model during training. Extensive experiments demonstrate the effectiveness of our framework in defending against word-level adversarial attacks.
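To illustrate the idea behind component (1), the sketch below shows a minimal synonym-based random corruption: replacing words with synonyms at random can disturb the carefully chosen perturbations an attacker relies on. This is only a hedged illustration, not the paper's implementation; the synonym table, swap probability, and function names here are assumptions (the paper's actual synonym source and sampling procedure may differ).

```python
import random

# Hypothetical toy synonym table for illustration only; the paper's
# actual synonym source (e.g., embedding-space neighbors) may differ.
SYNONYMS = {
    "good": ["fine", "great"],
    "movie": ["film", "picture"],
    "terrible": ["awful", "dreadful"],
}

def randomly_corrupt(tokens, swap_prob=0.3, rng=None):
    """Randomly swap words for synonyms, corrupting any word-level
    adversarial perturbation with some probability."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok.lower())
        if candidates and rng.random() < swap_prob:
            out.append(rng.choice(candidates))  # corrupt this position
        else:
            out.append(tok)  # keep the original word
    return out

print(randomly_corrupt("a good movie overall".split()))
```

Because the substitution is random at inference time, an attacker cannot reliably predict which of its perturbed words will survive, which is what makes the defense proactive rather than purely training-time.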
Anthology ID:
2023.acl-long.155
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2757–2774
URL:
https://aclanthology.org/2023.acl-long.155
DOI:
10.18653/v1/2023.acl-long.155
Cite (ACL):
Zhaoyang Wang, Zhiyue Liu, Xiaopeng Zheng, Qinliang Su, and Jiahai Wang. 2023. RMLM: A Flexible Defense Framework for Proactively Mitigating Word-level Adversarial Attacks. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2757–2774, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
RMLM: A Flexible Defense Framework for Proactively Mitigating Word-level Adversarial Attacks (Wang et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.155.pdf
Video:
https://aclanthology.org/2023.acl-long.155.mp4