EPOQUE: An English-Persian Quality Estimation Dataset

Mohammed Hossein Jafari Harandi, Fatemeh Azadi, Mohammad Javad Dousti, Heshaam Faili


Abstract
Translation quality estimation (QE) is an important component in real-world machine translation applications. Unfortunately, human-labeled QE datasets, which play an important role in developing and assessing QE models, are available for only a limited number of language pairs. In this paper, we present the first English-Persian QE dataset, called EPOQUE, which has manually annotated direct assessment labels. EPOQUE contains 1000 sentences translated from English to Persian and annotated by three human annotators. It is publicly available, and thus can be used as a zero-shot test set or for other scenarios in future work. We also evaluate and report the performance of two state-of-the-art QE models, i.e., Transquest and CometKiwi, as baselines on our dataset. Furthermore, our experiments show that fine-tuning Transquest on a small subset of the proposed dataset containing 300 sentences can improve its performance by more than 8% in terms of the Pearson correlation on a held-out test set.
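As an illustration of the evaluation metric named in the abstract, the following minimal sketch computes the Pearson correlation between QE model predictions and human direct assessment scores. The toy scores, the variable names, and the choice to average the three annotators' scores into a single gold label are all assumptions made for illustration; they are not taken from the paper or from the EPOQUE data.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one row per sentence, one score per annotator.
annotator_scores = [
    [80, 75, 85],
    [40, 50, 45],
    [60, 65, 55],
    [20, 30, 25],
]

# Assumption: the gold label is the mean of the three annotators' scores.
gold = [mean(scores) for scores in annotator_scores]

# Hypothetical QE model outputs for the same four sentences.
predicted = [0.9, 0.4, 0.7, 0.1]

r = pearson(predicted, gold)
print(f"Pearson r = {r:.3f}")
```

Pearson correlation is invariant to linear rescaling, so the predictions and the gold scores do not need to share a scale for this comparison.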
Anthology ID:
2024.lrec-main.550
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
6228–6235
URL:
https://aclanthology.org/2024.lrec-main.550
Cite (ACL):
Mohammed Hossein Jafari Harandi, Fatemeh Azadi, Mohammad Javad Dousti, and Heshaam Faili. 2024. EPOQUE: An English-Persian Quality Estimation Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6228–6235, Torino, Italia. ELRA and ICCL.
Cite (Informal):
EPOQUE: An English-Persian Quality Estimation Dataset (Jafari Harandi et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.550.pdf