@inproceedings{finn-etal-2022-developing,
title = "Developing a Part-Of-Speech tagger for te reo {M}{\=a}ori",
author = "Finn, Aoife and
Jones, Peter-Lucas and
Mahelona, Keoni and
Duncan, Suzanne and
Leoni, Gianna",
editor = "Moeller, Sarah and
Anastasopoulos, Antonios and
Arppe, Antti and
Chaudhary, Aditi and
Harrigan, Atticus and
Holden, Josh and
Lachler, Jordan and
Palmer, Alexis and
Rijhwani, Shruti and
Schwartz, Lane",
booktitle = "Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.computel-1.12",
doi = "10.18653/v1/2022.computel-1.12",
pages = "93--98",
abstract = "This paper discusses the development of a Part-of-Speech tagger for te reo M{\=a}ori, which is the Indigenous language of Aotearoa, also known as New Zealand, see Morrison. Henceforth, Part-of-Speech will be referred to as POS throughout this paper and te reo M{\=a}ori will be referred to as M{\=a}ori, while Universal Dependencies will be referred to as UD. Prior to the development of this tagger, there was no POS tagger for M{\=a}ori from Aotearoa. POS taggers tag words according to their syntactic or grammatical category. However, many traditional syntactic categories, and by consequence POS labels, do not {``}work for{''} M{\=a}ori. By this we mean that, for some of the traditional categories, the definition of, or guidelines for, an existing category is not suitable for M{\=a}ori; there is no existing category for certain word classes of M{\=a}ori; and the categories do not reflect a M{\=a}ori worldview of the M{\=a}ori language. We wanted a tagset that is usable with industry-wide tools, but we also needed a tagset that would meet the needs of M{\=a}ori. Therefore, we based our tagset and guidelines on the UD tagset and tagging conventions; however, the categorization of words has been significantly altered to be appropriate for M{\=a}ori. This is because, at the time of development of our POS tagger, the UD conventions had still not been used to tag a Polynesian language such as M{\=a}ori, nor did they provide any guidelines about how to tag such languages. To that end, we worked with highly proficient, specially selected M{\=a}ori speakers and linguists who are specialists in M{\=a}ori. This has ensured that our POS labels and guideline conventions faithfully reflect a M{\=a}ori speaker{'}s conceptualization of their language.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="finn-etal-2022-developing">
<titleInfo>
<title>Developing a Part-Of-Speech tagger for te reo Māori</title>
</titleInfo>
<name type="personal">
<namePart type="given">Aoife</namePart>
<namePart type="family">Finn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Peter-Lucas</namePart>
<namePart type="family">Jones</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Keoni</namePart>
<namePart type="family">Mahelona</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Suzanne</namePart>
<namePart type="family">Duncan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gianna</namePart>
<namePart type="family">Leoni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-05</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sarah</namePart>
<namePart type="family">Moeller</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Antonios</namePart>
<namePart type="family">Anastasopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Antti</namePart>
<namePart type="family">Arppe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aditi</namePart>
<namePart type="family">Chaudhary</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Atticus</namePart>
<namePart type="family">Harrigan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Josh</namePart>
<namePart type="family">Holden</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jordan</namePart>
<namePart type="family">Lachler</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alexis</namePart>
<namePart type="family">Palmer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shruti</namePart>
<namePart type="family">Rijhwani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lane</namePart>
<namePart type="family">Schwartz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Dublin, Ireland</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>This paper discusses the development of a Part-of-Speech tagger for te reo Māori, which is the Indigenous language of Aotearoa, also known as New Zealand, see Morrison. Henceforth, Part-of-Speech will be referred to as POS throughout this paper and te reo Māori will be referred to as Māori, while Universal Dependencies will be referred to as UD. Prior to the development of this tagger, there was no POS tagger for Māori from Aotearoa. POS taggers tag words according to their syntactic or grammatical category. However, many traditional syntactic categories, and by consequence POS labels, do not “work for” Māori. By this we mean that, for some of the traditional categories, the definition of, or guidelines for, an existing category is not suitable for Māori; there is no existing category for certain word classes of Māori; and the categories do not reflect a Māori worldview of the Māori language. We wanted a tagset that is usable with industry-wide tools, but we also needed a tagset that would meet the needs of Māori. Therefore, we based our tagset and guidelines on the UD tagset and tagging conventions; however, the categorization of words has been significantly altered to be appropriate for Māori. This is because, at the time of development of our POS tagger, the UD conventions had still not been used to tag a Polynesian language such as Māori, nor did they provide any guidelines about how to tag such languages. To that end, we worked with highly proficient, specially selected Māori speakers and linguists who are specialists in Māori. This has ensured that our POS labels and guideline conventions faithfully reflect a Māori speaker’s conceptualization of their language.</abstract>
<identifier type="citekey">finn-etal-2022-developing</identifier>
<identifier type="doi">10.18653/v1/2022.computel-1.12</identifier>
<location>
<url>https://aclanthology.org/2022.computel-1.12</url>
</location>
<part>
<date>2022-05</date>
<extent unit="page">
<start>93</start>
<end>98</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Developing a Part-Of-Speech tagger for te reo Māori
%A Finn, Aoife
%A Jones, Peter-Lucas
%A Mahelona, Keoni
%A Duncan, Suzanne
%A Leoni, Gianna
%Y Moeller, Sarah
%Y Anastasopoulos, Antonios
%Y Arppe, Antti
%Y Chaudhary, Aditi
%Y Harrigan, Atticus
%Y Holden, Josh
%Y Lachler, Jordan
%Y Palmer, Alexis
%Y Rijhwani, Shruti
%Y Schwartz, Lane
%S Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F finn-etal-2022-developing
%X This paper discusses the development of a Part-of-Speech tagger for te reo Māori, which is the Indigenous language of Aotearoa, also known as New Zealand, see Morrison. Henceforth, Part-of-Speech will be referred to as POS throughout this paper and te reo Māori will be referred to as Māori, while Universal Dependencies will be referred to as UD. Prior to the development of this tagger, there was no POS tagger for Māori from Aotearoa. POS taggers tag words according to their syntactic or grammatical category. However, many traditional syntactic categories, and by consequence POS labels, do not “work for” Māori. By this we mean that, for some of the traditional categories, the definition of, or guidelines for, an existing category is not suitable for Māori; there is no existing category for certain word classes of Māori; and the categories do not reflect a Māori worldview of the Māori language. We wanted a tagset that is usable with industry-wide tools, but we also needed a tagset that would meet the needs of Māori. Therefore, we based our tagset and guidelines on the UD tagset and tagging conventions; however, the categorization of words has been significantly altered to be appropriate for Māori. This is because, at the time of development of our POS tagger, the UD conventions had still not been used to tag a Polynesian language such as Māori, nor did they provide any guidelines about how to tag such languages. To that end, we worked with highly proficient, specially selected Māori speakers and linguists who are specialists in Māori. This has ensured that our POS labels and guideline conventions faithfully reflect a Māori speaker’s conceptualization of their language.
%R 10.18653/v1/2022.computel-1.12
%U https://aclanthology.org/2022.computel-1.12
%U https://doi.org/10.18653/v1/2022.computel-1.12
%P 93-98
Markdown (Informal)
[Developing a Part-Of-Speech tagger for te reo Māori](https://aclanthology.org/2022.computel-1.12) (Finn et al., ComputEL 2022)
ACL
- Aoife Finn, Peter-Lucas Jones, Keoni Mahelona, Suzanne Duncan, and Gianna Leoni. 2022. Developing a Part-Of-Speech tagger for te reo Māori. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 93–98, Dublin, Ireland. Association for Computational Linguistics.