XDetox: Text Detoxification with Token-Level Toxicity Explanations

Beomseok Lee, Hyunwoo Kim, Keon Kim, Yong Suk Choi


Abstract
Methods that mitigate toxic content through masking and infilling often overlook the decision-making process, leading to either insufficient or excessive modification of toxic tokens. To address this challenge, we propose XDetox, a novel method that integrates token-level toxicity explanations into the masking and infilling detoxification process. We use these explanations in two ways to improve detoxification performance: first, to identify toxic tokens and thereby improve the quality of masking; second, to re-rank the regenerated candidate sentences and select the least toxic one. Our experimental results show state-of-the-art performance across four datasets compared to existing detoxification methods. Furthermore, human evaluations indicate that our method outperforms baselines in both fluency and toxicity reduction. These results demonstrate the effectiveness of our method in text detoxification.
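The two strategies named in the abstract, masking tokens flagged as toxic and re-ranking regenerated candidates by toxicity, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the 0.5 threshold, the `<mask>` string, the function names, and the hand-written candidate list are all assumptions made for illustration. A real system would obtain token scores from an explanation method over a toxicity classifier and candidates from an infilling model.

```python
from typing import Callable, List

MASK = "<mask>"  # assumed mask symbol; the real one depends on the infilling model

# Hypothetical stand-in for token-level toxicity explanations.
TOY_LEXICON = {"stupid": 0.9, "idiot": 0.95, "dumb": 0.8}


def toy_token_scores(tokens: List[str]) -> List[float]:
    """Assign each token a toy toxicity score (0.0 if not in the lexicon)."""
    return [TOY_LEXICON.get(tok.lower(), 0.0) for tok in tokens]


def toy_sentence_toxicity(sentence: str) -> float:
    """Score a sentence by its most toxic token (toy scorer, assumption)."""
    return max(toy_token_scores(sentence.split()), default=0.0)


def mask_toxic_tokens(
    tokens: List[str], scores: List[float], threshold: float = 0.5
) -> List[str]:
    """Replace every token whose score reaches the threshold with MASK."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [MASK if s >= threshold else t for t, s in zip(tokens, scores)]


def rerank_least_toxic(
    candidates: List[str], toxicity_fn: Callable[[str], float]
) -> str:
    """Return the candidate with the lowest toxicity score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=toxicity_fn)


tokens = "you are a stupid person".split()
masked = mask_toxic_tokens(tokens, toy_token_scores(tokens))

# Hand-written stand-ins for what an infilling model might generate.
candidates = ["you are a dumb person", "you are a mistaken person"]
best = rerank_least_toxic(candidates, toy_sentence_toxicity)

print(" ".join(masked))
print(best)
```

Keeping the scorer as a function argument means the same re-ranking step works with any toxicity classifier that maps a sentence to a number.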
Anthology ID:
2024.emnlp-main.848
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15215–15226
URL:
https://aclanthology.org/2024.emnlp-main.848/
DOI:
10.18653/v1/2024.emnlp-main.848
Cite (ACL):
Beomseok Lee, Hyunwoo Kim, Keon Kim, and Yong Suk Choi. 2024. XDetox: Text Detoxification with Token-Level Toxicity Explanations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15215–15226, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
XDetox: Text Detoxification with Token-Level Toxicity Explanations (Lee et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.848.pdf