Two Issues with Chinese Spelling Correction and A Refinement Solution

Changxuan Sun, Linlin She, Xuesong Lu


Abstract
The Chinese Spelling Correction (CSC) task aims to detect and correct misspelled characters in Chinese text, and has received considerable attention in the past few years. Most recent studies adopt a Transformer-based model and leverage different features of characters, such as pronunciation, glyph and contextual information, to enhance the model’s ability to complete the task. Despite their state-of-the-art performance, we observe two issues that should be addressed to further advance the CSC task. First, the widely used benchmark datasets SIGHAN13, SIGHAN14 and SIGHAN15 contain many mistakes. Hence the reported performance of existing models is not accurate and should be re-evaluated. Second, existing models seem to have reached a performance bottleneck, where the improvements on the SIGHAN test sets are increasingly small and unstable. To deal with the two issues, we make two contributions: (1) we manually fix the SIGHAN datasets and re-evaluate four representative CSC models using the fixed datasets; (2) we analyze the new results to identify the spelling errors that none of the four models successfully corrects, based on which we propose a simple yet effective refinement solution. Experimental results show that our solution improves the four models in all metrics by notable margins.
Anthology ID:
2024.acl-short.19
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
196–204
URL:
https://aclanthology.org/2024.acl-short.19
DOI:
10.18653/v1/2024.acl-short.19
Cite (ACL):
Changxuan Sun, Linlin She, and Xuesong Lu. 2024. Two Issues with Chinese Spelling Correction and A Refinement Solution. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 196–204, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Two Issues with Chinese Spelling Correction and A Refinement Solution (Sun et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-short.19.pdf