@inproceedings{coto-solano-etal-2022-development,
title = "Development of Automatic Speech Recognition for the Documentation of {C}ook {I}slands {M}{\=a}ori",
author = "Coto-Solano, Rolando and
Nicholas, Sally Akevai and
Datta, Samiha and
Quint, Victoria and
Wills, Piripi and
Powell, Emma Ngakuravaru and
Koka{'}ua, Liam and
Tanveer, Syed and
Feldman, Isaac",
editor = "Calzolari, Nicoletta and
B{\'e}chet, Fr{\'e}d{\'e}ric and
Blache, Philippe and
Choukri, Khalid and
Cieri, Christopher and
Declerck, Thierry and
Goggi, Sara and
Isahara, Hitoshi and
Maegaard, Bente and
Mariani, Joseph and
Mazo, H{\'e}l{\`e}ne and
Odijk, Jan and
Piperidis, Stelios",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.lrec-1.412",
pages = "3872--3882",
abstract = "This paper describes the process of data processing and training of an automatic speech recognition (ASR) system for Cook Islands M{\=a}ori (CIM), an Indigenous language spoken by approximately 22,000 people in the South Pacific. We transcribed four hours of speech from adults and elderly speakers of the language and prepared two experiments. First, we trained three ASR systems: one statistical, Kaldi; and two based on Deep Learning, DeepSpeech and XLSR-Wav2Vec2. Wav2Vec2 tied with Kaldi for lowest character error rate (CER=6{\mbox{$\pm$}}1) and was slightly behind in word error rate (WER=23{\mbox{$\pm$}}2 versus WER=18{\mbox{$\pm$}}2 for Kaldi). This provides evidence that Deep Learning ASR systems are reaching the performance of statistical methods on small datasets, and that they can work effectively with extremely low-resource Indigenous languages like CIM. In the second experiment we used Wav2Vec2 to train models with held-out speakers. While the performance decreased (CER=15{\mbox{$\pm$}}7, WER=46{\mbox{$\pm$}}16), the system still showed considerable learning. We intend to use ASR to accelerate the documentation of CIM, using newly transcribed texts to improve the ASR and also generate teaching and language revitalization materials. The trained model is available under a license based on the Kaitiakitanga License, which provides for non-commercial use while retaining control of the model by the Indigenous community.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="coto-solano-etal-2022-development">
<titleInfo>
<title>Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori</title>
</titleInfo>
<name type="personal">
<namePart type="given">Rolando</namePart>
<namePart type="family">Coto-Solano</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sally</namePart>
<namePart type="given">Akevai</namePart>
<namePart type="family">Nicholas</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Samiha</namePart>
<namePart type="family">Datta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Victoria</namePart>
<namePart type="family">Quint</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Piripi</namePart>
<namePart type="family">Wills</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Emma</namePart>
<namePart type="given">Ngakuravaru</namePart>
<namePart type="family">Powell</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Liam</namePart>
<namePart type="family">Koka’ua</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Syed</namePart>
<namePart type="family">Tanveer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Isaac</namePart>
<namePart type="family">Feldman</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Thirteenth Language Resources and Evaluation Conference</title>
</titleInfo>
<name type="personal">
<namePart type="given">Nicoletta</namePart>
<namePart type="family">Calzolari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Frédéric</namePart>
<namePart type="family">Béchet</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Philippe</namePart>
<namePart type="family">Blache</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Khalid</namePart>
<namePart type="family">Choukri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christopher</namePart>
<namePart type="family">Cieri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Thierry</namePart>
<namePart type="family">Declerck</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sara</namePart>
<namePart type="family">Goggi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hitoshi</namePart>
<namePart type="family">Isahara</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bente</namePart>
<namePart type="family">Maegaard</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joseph</namePart>
<namePart type="family">Mariani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hélène</namePart>
<namePart type="family">Mazo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jan</namePart>
<namePart type="family">Odijk</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Stelios</namePart>
<namePart type="family">Piperidis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>European Language Resources Association</publisher>
<place>
<placeTerm type="text">Marseille, France</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>This paper describes the process of data processing and training of an automatic speech recognition (ASR) system for Cook Islands Māori (CIM), an Indigenous language spoken by approximately 22,000 people in the South Pacific. We transcribed four hours of speech from adults and elderly speakers of the language and prepared two experiments. First, we trained three ASR systems: one statistical, Kaldi; and two based on Deep Learning, DeepSpeech and XLSR-Wav2Vec2. Wav2Vec2 tied with Kaldi for lowest character error rate (CER=6\pm1) and was slightly behind in word error rate (WER=23\pm2 versus WER=18\pm2 for Kaldi). This provides evidence that Deep Learning ASR systems are reaching the performance of statistical methods on small datasets, and that they can work effectively with extremely low-resource Indigenous languages like CIM. In the second experiment we used Wav2Vec2 to train models with held-out speakers. While the performance decreased (CER=15\pm7, WER=46\pm16), the system still showed considerable learning. We intend to use ASR to accelerate the documentation of CIM, using newly transcribed texts to improve the ASR and also generate teaching and language revitalization materials. The trained model is available under a license based on the Kaitiakitanga License, which provides for non-commercial use while retaining control of the model by the Indigenous community.</abstract>
<identifier type="citekey">coto-solano-etal-2022-development</identifier>
<location>
<url>https://aclanthology.org/2022.lrec-1.412</url>
</location>
<part>
<date>2022-06</date>
<extent unit="page">
<start>3872</start>
<end>3882</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori
%A Coto-Solano, Rolando
%A Nicholas, Sally Akevai
%A Datta, Samiha
%A Quint, Victoria
%A Wills, Piripi
%A Powell, Emma Ngakuravaru
%A Koka’ua, Liam
%A Tanveer, Syed
%A Feldman, Isaac
%Y Calzolari, Nicoletta
%Y Béchet, Frédéric
%Y Blache, Philippe
%Y Choukri, Khalid
%Y Cieri, Christopher
%Y Declerck, Thierry
%Y Goggi, Sara
%Y Isahara, Hitoshi
%Y Maegaard, Bente
%Y Mariani, Joseph
%Y Mazo, Hélène
%Y Odijk, Jan
%Y Piperidis, Stelios
%S Proceedings of the Thirteenth Language Resources and Evaluation Conference
%D 2022
%8 June
%I European Language Resources Association
%C Marseille, France
%F coto-solano-etal-2022-development
%X This paper describes the process of data processing and training of an automatic speech recognition (ASR) system for Cook Islands Māori (CIM), an Indigenous language spoken by approximately 22,000 people in the South Pacific. We transcribed four hours of speech from adults and elderly speakers of the language and prepared two experiments. First, we trained three ASR systems: one statistical, Kaldi; and two based on Deep Learning, DeepSpeech and XLSR-Wav2Vec2. Wav2Vec2 tied with Kaldi for lowest character error rate (CER=6\pm1) and was slightly behind in word error rate (WER=23\pm2 versus WER=18\pm2 for Kaldi). This provides evidence that Deep Learning ASR systems are reaching the performance of statistical methods on small datasets, and that they can work effectively with extremely low-resource Indigenous languages like CIM. In the second experiment we used Wav2Vec2 to train models with held-out speakers. While the performance decreased (CER=15\pm7, WER=46\pm16), the system still showed considerable learning. We intend to use ASR to accelerate the documentation of CIM, using newly transcribed texts to improve the ASR and also generate teaching and language revitalization materials. The trained model is available under a license based on the Kaitiakitanga License, which provides for non-commercial use while retaining control of the model by the Indigenous community.
%U https://aclanthology.org/2022.lrec-1.412
%P 3872-3882
Markdown (Informal)
[Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori](https://aclanthology.org/2022.lrec-1.412) (Coto-Solano et al., LREC 2022)
ACL
- Rolando Coto-Solano, Sally Akevai Nicholas, Samiha Datta, Victoria Quint, Piripi Wills, Emma Ngakuravaru Powell, Liam Koka’ua, Syed Tanveer, and Isaac Feldman. 2022. Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3872–3882, Marseille, France. European Language Resources Association.