Combining Expert Knowledge with Frequency Information to Infer CEFR Levels for Words

Alice Pintard, Thomas François


Abstract
Traditional approaches to setting goals in second language (L2) vocabulary acquisition relied either on word lists obtained from large L1 corpora or on the collective knowledge and experience of L2 experts, teachers, and examiners. Both approaches are known to offer some advantages, but also to have some limitations. In this paper, we try to combine both sources of information, namely the official reference level description (RLD) for the French language and the FLElex lexical database. Our aim is to train a statistical model on the French RLD that would be able to turn the distributional information from FLElex into one of the six levels of the Common European Framework of Reference for Languages (CEFR). We show that such an approach yields a gain of 29% in accuracy compared to the method currently used in the CEFRLex project. Our experiments also offer deeper insights into the advantages and shortcomings of the two traditional sources of information (frequency vs. expert knowledge).
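The abstract does not specify the model, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each word is described by six per-level frequencies (A1 to C2), as in FLElex-style resources, and that expert levels come from an RLD-like word list. It trains a nearest-centroid classifier on log-scaled frequencies (a simple stand-in chosen for illustration, not necessarily the statistical model used in the paper). It contrasts that classifier with a "first occurrence" rule, which assigns the first level with a non-zero frequency. That rule is an assumption about how a frequency-only method could assign levels, not a confirmed description of the CEFRLex procedure. All function names and frequency values are hypothetical toy data, not FLElex figures.

```python
import math
from collections import defaultdict

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def first_occurrence_level(freqs):
    """Hypothetical frequency-only rule: first level with a non-zero frequency."""
    for level, f in zip(LEVELS, freqs):
        if f > 0:
            return level
    return None  # the word never occurs at any level

def features(freqs):
    """Log-scale frequencies so that very frequent words do not dominate distances."""
    return [math.log1p(f) for f in freqs]

def train_centroids(examples):
    """Nearest-centroid training: mean feature vector per expert-assigned level."""
    sums = defaultdict(lambda: [0.0] * len(LEVELS))
    counts = defaultdict(int)
    for freqs, level in examples:
        for i, x in enumerate(features(freqs)):
            sums[level][i] += x
        counts[level] += 1
    return {lvl: [s / counts[lvl] for s in vec] for lvl, vec in sums.items()}

def predict(centroids, freqs):
    """Return the expert level whose centroid is closest (squared Euclidean distance)."""
    x = features(freqs)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lvl: dist(centroids[lvl]))

# Invented toy data: (frequencies at A1..C2, level assigned by an expert list).
training = [
    ([900, 850, 800, 780, 760, 750], "A1"),
    ([500, 480, 470, 460, 455, 450], "A1"),
    ([0, 40, 60, 70, 75, 80], "A2"),
    ([0, 30, 50, 55, 60, 62], "A2"),
    ([0, 0, 20, 35, 40, 45], "B1"),
    ([0, 0, 15, 30, 33, 36], "B1"),
]
centroids = train_centroids(training)

# A word that appears once in A1 material by chance, then regularly from B1 on.
word = [1, 0, 18, 32, 36, 40]
print(first_occurrence_level(word))  # → A1
print(predict(centroids, word))      # → B1
```

In this toy case, a single stray occurrence is enough to push the first-occurrence rule down to A1. The classifier instead compares the whole distribution with what expert-labelled words look like, which illustrates why combining frequency information with expert knowledge might help.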
Anthology ID:
2020.readi-1.13
Volume:
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Núria Gala, Rodrigo Wilkens
Venue:
READI
Publisher:
European Language Resources Association
Pages:
85–92
Language:
English
URL:
https://aclanthology.org/2020.readi-1.13
Cite (ACL):
Alice Pintard and Thomas François. 2020. Combining Expert Knowledge with Frequency Information to Infer CEFR Levels for Words. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI), pages 85–92, Marseille, France. European Language Resources Association.
Cite (Informal):
Combining Expert Knowledge with Frequency Information to Infer CEFR Levels for Words (Pintard & François, READI 2020)
PDF:
https://aclanthology.org/2020.readi-1.13.pdf