Quality Fit for Purpose: Building Business Critical Errors Test Suites
Mariana Cabeça, Marianna Buchicchio, Madalena Gonçalves, Christine Maroti, João Godinho, Pedro Coelho, Helena Moniz, Alon Lavie
Abstract
This paper illustrates a new methodology based on Test Suites (Avramidis et al., 2018) with focus on Business Critical Errors (BCEs) (Stewart et al., 2022) to evaluate the output of Machine Translation (MT) and Quality Estimation (QE) systems. We demonstrate the value of relying on semi-automatic evaluation done through scalable BCE-focused Test Suites to monitor both MT and QE systems’ performance for 8 language pairs (LPs) and a total of 4 error categories. This approach allows us to not only track the impact of new features and implementations in a real business environment, but also to identify strengths and weaknesses in models regarding different error types, and subsequently know what to improve henceforth.
- Anthology ID:
- 2023.eamt-1.44
- Volume:
- Proceedings of the 24th Annual Conference of the European Association for Machine Translation
- Month:
- June
- Year:
- 2023
- Address:
- Tampere, Finland
- Editors:
- Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 451–460
- URL:
- https://aclanthology.org/2023.eamt-1.44/
- Bibkey:
- cabeca-etal-2023-quality
- Cite (ACL):
- Mariana Cabeça, Marianna Buchicchio, Madalena Gonçalves, Christine Maroti, João Godinho, Pedro Coelho, Helena Moniz, and Alon Lavie. 2023. Quality Fit for Purpose: Building Business Critical Errors Test Suites. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 451–460, Tampere, Finland. European Association for Machine Translation.
- Cite (Informal):
- Quality Fit for Purpose: Building Business Critical Errors Test Suites (Cabeça et al., EAMT 2023)
- PDF:
- https://aclanthology.org/2023.eamt-1.44.pdf
Export citation
@inproceedings{cabeca-etal-2023-quality,
    title = "Quality Fit for Purpose: Building Business Critical Errors Test Suites",
    author = "Cabe{\c{c}}a, Mariana and
      Buchicchio, Marianna and
      Gon{\c{c}}alves, Madalena and
      Maroti, Christine and
      Godinho, Jo{\~a}o and
      Coelho, Pedro and
      Moniz, Helena and
      Lavie, Alon",
    editor = "Nurminen, Mary and
      Brenner, Judith and
      Koponen, Maarit and
      Latomaa, Sirkku and
      Mikhailov, Mikhail and
      Schierl, Frederike and
      Ranasinghe, Tharindu and
      Vanmassenhove, Eva and
      Vidal, Sergi Alvarez and
      Aranberri, Nora and
      Nunziatini, Mara and
      Escart{\'i}n, Carla Parra and
      Forcada, Mikel and
      Popovic, Maja and
      Scarton, Carolina and
      Moniz, Helena",
    booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
    month = jun,
    year = "2023",
    address = "Tampere, Finland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2023.eamt-1.44/",
    pages = "451--460",
    abstract = "This paper illustrates a new methodology based on Test Suites (Avramidis et al., 2018) with focus on Business Critical Errors (BCEs) (Stewart et al., 2022) to evaluate the output of Machine Translation (MT) and Quality Estimation (QE) systems. We demonstrate the value of relying on semi-automatic evaluation done through scalable BCE-focused Test Suites to monitor both MT and QE systems' performance for 8 language pairs (LPs) and a total of 4 error categories. This approach allows us to not only track the impact of new features and implementations in a real business environment, but also to identify strengths and weaknesses in models regarding different error types, and subsequently know what to improve henceforth."
}