CERT-ED: Certifiably Robust Text Classification for Edit Distance

Zhuoqun Huang, Neil G Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein


Abstract
With the growing integration of AI in daily life, ensuring the robustness of systems to inference-time attacks is crucial. Among the approaches for certifying robustness to such adversarial examples, randomized smoothing has emerged as highly promising due to its nature as a wrapper around arbitrary black-box models. Previous work on randomized smoothing in natural language processing has primarily focused on specific subsets of edit distance operations, such as synonym substitution or word insertion, without exploring the certification of all edit operations. In this paper, we adapt Randomized Deletion (Huang et al., 2023) and propose CERTified Edit Distance defense (CERT-ED) for natural language classification. Through comprehensive experiments, we demonstrate that CERT-ED outperforms the existing Hamming distance method RanMASK (Zeng et al., 2023) in 4 out of 5 datasets in terms of both accuracy and the cardinality of the certificate. By covering various threat models, including 5 direct and 5 transfer attacks, our method improves empirical robustness in 38 out of 50 settings.
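As a rough illustration of the "wrapper around arbitrary black-box models" idea mentioned in the abstract, the following minimal sketch shows deletion-based smoothing by majority vote. It is not the authors' implementation and omits the certification step entirely. The function names (`random_deletion`, `smoothed_predict`, `toy_classifier`), the deletion probability `p_del`, and the sample count `n_samples` are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def random_deletion(tokens: Sequence[str], p_del: float, rng: random.Random) -> List[str]:
    """Independently drop each token with probability p_del (hypothetical helper)."""
    return [t for t in tokens if rng.random() >= p_del]


def smoothed_predict(
    base_classifier: Callable[[List[str]], str],
    tokens: Sequence[str],
    p_del: float = 0.9,
    n_samples: int = 100,
    seed: int = 0,
) -> str:
    """Majority vote of a black-box classifier over randomly deleted copies of the input.

    This only sketches the prediction side of smoothing; a certified defense
    would additionally bound how much the vote can change under edits.
    """
    rng = random.Random(seed)
    votes = Counter(
        base_classifier(random_deletion(tokens, p_del, rng)) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]


def toy_classifier(tokens: List[str]) -> str:
    """Stand-in black-box model (assumption): keyword-based sentiment."""
    return "positive" if "good" in tokens else "negative"


if __name__ == "__main__":
    sentence = "a good movie with a good cast and a good plot".split()
    print(smoothed_predict(toy_classifier, sentence, p_del=0.5, n_samples=200))
```

The base classifier is only ever called on perturbed inputs, which is why the wrapper needs no access to model internals.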
Anthology ID:
2024.findings-emnlp.635
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10813–10835
URL:
https://aclanthology.org/2024.findings-emnlp.635/
DOI:
10.18653/v1/2024.findings-emnlp.635
Cite (ACL):
Zhuoqun Huang, Neil G Marchant, Olga Ohrimenko, and Benjamin I. P. Rubinstein. 2024. CERT-ED: Certifiably Robust Text Classification for Edit Distance. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10813–10835, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CERT-ED: Certifiably Robust Text Classification for Edit Distance (Huang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.635.pdf
Software:
 2024.findings-emnlp.635.software.zip