RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining

Hui Su, Weiwei Shi, Xiaoyu Shen, Zhou Xiao, Tuo Ji, Jiarui Fang, Jie Zhou


Abstract
Large-scale pretrained language models have achieved state-of-the-art (SOTA) results on NLP tasks. However, they have been shown to be vulnerable to adversarial attacks, especially in logographic languages such as Chinese. In this work, we propose RoCBert: a pretrained Chinese Bert that is robust to various forms of adversarial attacks such as word perturbation, synonyms and typos. It is pretrained with a contrastive learning objective that maximizes label consistency across different synthesized adversarial examples. The model takes as input multimodal information, including semantic, phonetic and visual features. We show that all of these features are important to model robustness, since attacks can be performed in all three forms. Across 5 Chinese NLU tasks, RoCBert outperforms strong baselines under three black-box adversarial algorithms without sacrificing performance on the clean test set. It also performs best on the toxic content detection task under human-made attacks.
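The abstract does not spell out the exact form of the contrastive objective, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss, not the authors' formulation. It assumes that the representation of a clean sentence is the anchor, the representation of a synthesized adversarial variant of it (for example a typo or phonetic substitution) is the positive, and representations of unrelated sentences are negatives. The function names `cosine` and `info_nce`, the `temperature` value and the toy embedding vectors are all hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Generic InfoNCE loss (hypothetical sketch, not RoCBert's exact objective).

    The loss is small when the anchor is much more similar to the positive
    (e.g. its adversarial variant) than to any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denom - logits[0]


# Toy, made-up embeddings for illustration only.
clean = [0.9, 0.1, 0.2]           # clean sentence
adversarial = [0.85, 0.15, 0.25]  # hypothetical perturbed version of it
others = [[0.1, 0.9, 0.3], [0.2, 0.1, 0.95]]  # unrelated sentences

loss_close = info_nce(clean, adversarial, others)
loss_far = info_nce(clean, others[0], [adversarial, others[1]])
print(loss_close < loss_far)
```

Minimizing a loss of this shape pulls the representation of an input and its perturbed variants together while pushing other inputs away, which is one common way to encourage a model to assign consistent labels under perturbation.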
Anthology ID: 2022.acl-long.65
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 921–931
URL: https://aclanthology.org/2022.acl-long.65
DOI: 10.18653/v1/2022.acl-long.65
Cite (ACL): Hui Su, Weiwei Shi, Xiaoyu Shen, Zhou Xiao, Tuo Ji, Jiarui Fang, and Jie Zhou. 2022. RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 921–931, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining (Su et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.65.pdf
Software: 2022.acl-long.65.software.zip
Data: CMNLI