Automatic Speech Recognition and Query By Example for Creole Languages Documentation

Cécile Macaire, Didier Schwab, Benjamin Lecouteux, Emmanuel Schang


Abstract
We investigate the exploitation of self-supervised models for two Creole languages with few resources: Gwadloupéyen and Morisien. Automatic language processing tools are almost non-existent for these two languages. We propose to use about one hour of annotated data to design an automatic speech recognition system for each language. We evaluate how much data is needed to obtain a query-by-example system that is usable by linguists. Moreover, our experiments show that multilingual self-supervised models are not necessarily the most efficient for Creole languages.
Anthology ID:
2022.findings-acl.197
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2512–2520
URL:
https://aclanthology.org/2022.findings-acl.197
DOI:
10.18653/v1/2022.findings-acl.197
Cite (ACL):
Cécile Macaire, Didier Schwab, Benjamin Lecouteux, and Emmanuel Schang. 2022. Automatic Speech Recognition and Query By Example for Creole Languages Documentation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2512–2520, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Automatic Speech Recognition and Query By Example for Creole Languages Documentation (Macaire et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-acl.197.pdf
Video:
https://aclanthology.org/2022.findings-acl.197.mp4
Code:
macairececile/asr-qbe-creole