Microsyntactic Unit Detection Using Word Embedding Models: Experiments on Slavic Languages

Iuliia Zaitova, Irina Stenger, Tania Avgustinova


Abstract
Microsyntactic units have been defined as language-specific transitional entities between lexicon and grammar, whose idiomatic properties are closely tied to syntax. These units are typically described construction by construction, which makes it difficult to understand them comprehensively as a class. This study proposes a novel approach to detecting microsyntactic units using Word Embedding Models (WEMs) trained on six Slavic languages, namely Belarusian, Bulgarian, Czech, Polish, Russian, and Ukrainian, and evaluates how well these models capture the nuances of syntactic non-compositionality. To evaluate the models, we develop a cross-lingual inventory of microsyntactic units using the lists of microsyntactic units available in the Russian National Corpus. Our results demonstrate the effectiveness of WEMs in capturing microsyntactic units across all six Slavic languages under analysis. Additionally, we find that WEMs tailored for syntax-based tasks consistently outperform other WEMs on this task. Our findings contribute to the theory of microsyntax by providing insights into the detection of microsyntactic units and their cross-linguistic properties.
Anthology ID:
2023.ranlp-1.134
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1265–1273
URL:
https://aclanthology.org/2023.ranlp-1.134
Cite (ACL):
Iuliia Zaitova, Irina Stenger, and Tania Avgustinova. 2023. Microsyntactic Unit Detection Using Word Embedding Models: Experiments on Slavic Languages. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 1265–1273, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Microsyntactic Unit Detection Using Word Embedding Models: Experiments on Slavic Languages (Zaitova et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.134.pdf