Does the Language Matter? Curriculum Learning over Neo-Latin Languages

Giulia Pucci, Leonardo Ranaldi


Abstract
Curriculum Learning (CL) is emerging as a relevant technique to reduce the cost of pre-training Large Language Models (LLMs). The idea, so far tested on English, is to train LLMs by organizing training examples from the simplest to the most complex. Since complexity measures may depend on the specific language, this paper investigates whether CL and its complexity measures can be easily exported to other languages. To this end, we present a set of linguistically motivated measures, previously used for English, to determine the complexity of examples: these measures are based on text length, rarity, and comprehensibility. We then test the approach on two Romance languages: Italian and French. Our results show that the technique can be easily exported to languages other than English without adaptation.
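The curriculum idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three measure families (length, rarity, comprehensibility) come from the abstract, but the concrete formulas here (token count, mean negative log unigram probability, mean word length as a comprehensibility proxy) and the equal-weight combination are assumptions for illustration only.

```python
import math
from collections import Counter


def complexity_scores(corpus):
    """Score each sentence by length, word rarity, and a readability proxy.

    The measure families follow the abstract; the exact formulas are
    illustrative assumptions, not the authors' definitions.
    """
    # Unigram frequencies over the whole corpus, used for the rarity measure.
    freq = Counter(w for s in corpus for w in s.lower().split())
    total = sum(freq.values())

    scores = []
    for s in corpus:
        words = s.lower().split()
        # Length measure: longer sentences are considered harder.
        length = len(words)
        # Rarity measure: mean negative log unigram probability
        # (sentences with rarer words score higher).
        rarity = sum(-math.log(freq[w] / total) for w in words) / max(length, 1)
        # Comprehensibility proxy: mean word length in characters.
        word_len = sum(len(w) for w in words) / max(length, 1)
        # Equal-weight combination (an assumption made for this sketch).
        scores.append(length + rarity + word_len)
    return scores


def curriculum_order(corpus):
    """Return the corpus sorted from simplest to most complex example."""
    scores = complexity_scores(corpus)
    return [s for _, s in sorted(zip(scores, corpus))]
```

A training pipeline would then feed `curriculum_order(corpus)` to the model in that order, e.g. a short frequent-word sentence like "the cat sat" is scheduled before a long sentence of rare morphological terminology. Because all three measures operate on surface text only, the same code runs unchanged on Italian or French input, which mirrors the paper's question of whether the measures transfer across languages.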
Anthology ID:
2024.lrec-main.464
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
5212–5220
URL:
https://aclanthology.org/2024.lrec-main.464
Cite (ACL):
Giulia Pucci and Leonardo Ranaldi. 2024. Does the Language Matter? Curriculum Learning over Neo-Latin Languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5212–5220, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Does the Language Matter? Curriculum Learning over Neo-Latin Languages (Pucci & Ranaldi, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.464.pdf