A Low-Resource Approach to the Grammatical Error Correction of Ukrainian

Frank Palma Gomez, Alla Rozovskaya, Dan Roth


Abstract
We present our system that participated in the shared task on the grammatical error correction of Ukrainian. We implemented two approaches that make use of large pre-trained language models and synthetic data, techniques that have been used for error correction of English as well as low-resource languages. The first approach fine-tunes a large multilingual language model (mT5) in two stages: first on synthetic data, and then on gold data. The second approach uses a (smaller) seq2seq Transformer model that is pre-trained on synthetic data and fine-tuned on gold data. Our mT5-based model scored first in the “GEC only” track and a very close second in the “GEC+Fluency” track. Our two key innovations are (1) fine-tuning in stages, first on synthetic and then on gold data; and (2) a high-quality corruption method based on round-trip machine translation to complement existing noisification approaches.
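The two ideas named in the abstract, round-trip corruption and staged fine-tuning, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `round_trip_pairs` and `staged_schedule`, the translator callables `to_pivot` and `from_pivot`, and the choice to drop sentences that the round trip leaves unchanged are all assumptions made for illustration. A real setup would plug in an actual machine translation system and a model trainer.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (noisy source, clean target)


def round_trip_pairs(
    clean_sentences: Iterable[str],
    to_pivot: Callable[[str], str],    # hypothetical: source language -> pivot language
    from_pivot: Callable[[str], str],  # hypothetical: pivot language -> source language
) -> List[Pair]:
    """Corrupt clean sentences by translating to a pivot language and back."""
    pairs: List[Pair] = []
    for clean in clean_sentences:
        noisy = from_pivot(to_pivot(clean))
        # Assumption: keep only pairs where the round trip changed the sentence.
        if noisy != clean:
            pairs.append((noisy, clean))
    return pairs


def staged_schedule(
    synthetic: List[Pair], gold: List[Pair]
) -> List[Tuple[str, List[Pair]]]:
    """Order the training stages: synthetic data first, gold data second."""
    return [("synthetic", synthetic), ("gold", gold)]
```

With toy stand-ins for the translators (for example `str.upper` and `str.lower`), `round_trip_pairs` yields (noisy, clean) pairs, and a training loop would iterate over `staged_schedule(...)` in order, running one fine-tuning pass per stage.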
Anthology ID:
2023.unlp-1.14
Volume:
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
114–120
URL:
https://aclanthology.org/2023.unlp-1.14
DOI:
10.18653/v1/2023.unlp-1.14
Cite (ACL):
Frank Palma Gomez, Alla Rozovskaya, and Dan Roth. 2023. A Low-Resource Approach to the Grammatical Error Correction of Ukrainian. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 114–120, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
A Low-Resource Approach to the Grammatical Error Correction of Ukrainian (Palma Gomez et al., UNLP 2023)
PDF:
https://aclanthology.org/2023.unlp-1.14.pdf
Video:
https://aclanthology.org/2023.unlp-1.14.mp4