Constructing a Dependency Treebank for Second Language Learners of Korean

Hakyung Sung, Gyu-Ho Shin


Abstract
We introduce a manually annotated syntactic treebank based on Universal Dependencies, derived from written texts produced by second language (L2) learners of Korean. In developing this new dataset, we critically evaluated previous work and revised the annotation guidelines to better reflect the linguistic properties of Korean and the characteristics of L2 learners. The L2 Korean treebank comprises 7,530 sentences (66,982 words; 129,333 morphemes) and is publicly available at https://github.com/NLPxL2Korean/L2KW-corpus.
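Universal Dependencies treebanks are conventionally distributed in the CoNLL-U format. The following is a minimal sketch, assuming the released files are CoNLL-U (check the repository to confirm) and that, as in existing Korean UD treebanks, the XPOS column joins morpheme-level tags with `+`. Under those assumptions it shows how sentence, word, and morpheme totals like the ones above could be recomputed. The `count_conllu` function, the `TreebankStats` class, and the sample sentence are hypothetical illustrations, not part of the corpus or its tooling.

```python
from dataclasses import dataclass


@dataclass
class TreebankStats:
    sentences: int = 0
    words: int = 0
    morphemes: int = 0


def count_conllu(text: str) -> TreebankStats:
    """Count sentences, syntactic words, and (assumed) morphemes in CoNLL-U text."""
    stats = TreebankStats()
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence block.
            if in_sentence:
                stats.sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# sent_id" or "# text".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        token_id, xpos = cols[0], cols[4]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        in_sentence = True
        stats.words += 1
        # ASSUMPTION: XPOS joins morpheme tags with "+" (e.g. "np+jxt").
        # If XPOS is missing ("_"), fall back to counting the word as one morpheme.
        stats.morphemes += len(xpos.split("+")) if xpos != "_" else 1
    if in_sentence:
        # Tolerate a final sentence without a trailing blank line.
        stats.sentences += 1
    return stats


# Hypothetical one-sentence example; not taken from the L2 Korean treebank.
SAMPLE = "\n".join([
    "# text = 나는 학교에 간다.",
    "1\t나는\t나+는\tPRON\tnp+jxt\t_\t3\tnsubj\t_\t_",
    "2\t학교에\t학교+에\tNOUN\tncn+jca\t_\t3\tobl\t_\t_",
    "3\t간다\t가+ㄴ다\tVERB\tpvg+ef\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\tsf\t_\t3\tpunct\t_\t_",
    "",
    "",
])

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
    # TreebankStats(sentences=1, words=4, morphemes=7)
```

Running `count_conllu` over the full text of each treebank file would give totals to compare with the published figures. Words are counted once per syntactic-word line, skipping multiword-token ranges and empty nodes, since CoNLL-U distinguishes these from syntactic words; the morpheme count is only meaningful if the `+`-joined XPOS assumption holds for this corpus.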
Anthology ID: 2024.lrec-main.332
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 3747–3758
URL: https://aclanthology.org/2024.lrec-main.332
Cite (ACL): Hakyung Sung and Gyu-Ho Shin. 2024. Constructing a Dependency Treebank for Second Language Learners of Korean. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3747–3758, Torino, Italia. ELRA and ICCL.
Cite (Informal): Constructing a Dependency Treebank for Second Language Learners of Korean (Sung & Shin, LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.332.pdf