2022
Transfer Learning Methods for Domain Adaptation in Technical Logbook Datasets
Farhad Akhbardeh | Marcos Zampieri | Cecilia Ovesdotter Alm | Travis Desell
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Event identification in technical logbooks poses challenges given the limited logbook data available in specific technical domains, the large set of possible classes, and logbook entries typically being in short form and non-standard technical language. Technical logbook data typically has both a domain, the field it comes from (e.g., automotive), and an application, what it is used for (e.g., maintenance). In order to better handle the problem of data scarcity, using a variety of technical logbook datasets, this paper investigates the benefits of using transfer learning from sources within the same domain (but different applications), from within the same application (but different domains) and from all available data. Results show that performing transfer learning within a domain provides statistically significant improvements, and in all cases but one the best performance. Interestingly, transfer learning from within the application or across the global dataset degrades results in all cases but one, which benefited from adding as much data as possible. A further analysis of the dataset similarities shows that the datasets with higher similarity scores performed better in transfer learning tasks, suggesting that this can be utilized to determine the effectiveness of adding a dataset in a transfer learning task for technical logbooks.
2021
Findings of the 2021 Conference on Machine Translation (WMT21)
Farhad Akhbardeh | Arkady Arkhangorodsky | Magdalena Biesialska | Ondřej Bojar | Rajen Chatterjee | Vishrav Chaudhary | Marta R. Costa-jussa | Cristina España-Bonet | Angela Fan | Christian Federmann | Markus Freitag | Yvette Graham | Roman Grundkiewicz | Barry Haddow | Leonie Harter | Kenneth Heafield | Christopher Homan | Matthias Huck | Kwabena Amponsah-Kaakyire | Jungo Kasai | Daniel Khashabi | Kevin Knight | Tom Kocmi | Philipp Koehn | Nicholas Lourie | Christof Monz | Makoto Morishita | Masaaki Nagata | Ajay Nagesh | Toshiaki Nakazawa | Matteo Negri | Santanu Pal | Allahsera Auguste Tapo | Marco Turchi | Valentin Vydrin | Marcos Zampieri
Proceedings of the Sixth Conference on Machine Translation
This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories. The task was also opened up to additional test suites to probe specific aspects of translation.
Handling Extreme Class Imbalance in Technical Logbook Datasets
Farhad Akhbardeh | Cecilia Ovesdotter Alm | Marcos Zampieri | Travis Desell
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Technical logbooks are a challenging and under-explored text type in automated event identification. These texts are typically short and written in non-standard yet technical language, posing challenges to off-the-shelf NLP pipelines. The granularity of issue types described in these datasets additionally leads to class imbalance, making it challenging for models to accurately predict which issue each logbook entry describes. In this paper we focus on the problem of technical issue classification by considering logbook datasets from the automotive, aviation, and facilities maintenance domains. We adapt a feedback strategy from computer vision for handling extreme class imbalance, which resamples the training data based on its error in the prediction process. Our experiments show that with statistical significance this feedback strategy provides the best results for four different neural network models trained across a suite of seven different technical logbook datasets from distinct technical domains. The feedback strategy is also generic and could be applied to any learning problem with substantial class imbalances.
2020
NLP Tools for Predictive Maintenance Records in MaintNet
Farhad Akhbardeh | Travis Desell | Marcos Zampieri
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations
Processing maintenance logbook records is an important step in the development of predictive maintenance systems. Logbooks often include free text fields with domain-specific terms, abbreviations, and non-standard spelling, posing challenges to off-the-shelf NLP pipelines trained on standard contemporary corpora. Despite the importance of this data type, processing predictive maintenance data is still an under-explored topic in NLP. With the goal of providing more datasets and resources to the community, in this paper we present a number of new resources available in MaintNet, a collaborative open-source library and data repository of predictive maintenance language datasets. We describe novel annotated datasets from multiple domains, such as aviation, automotive, and facility maintenance, and new tools for segmentation, spell checking, POS tagging, clustering, and classification.
MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources
Farhad Akhbardeh | Travis Desell | Marcos Zampieri
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations
Maintenance record logbooks are an emerging text type in NLP. An important part of them typically consists of free text with many domain-specific technical terms, abbreviations, and non-standard spelling and grammar. This poses difficulties for NLP pipelines trained on standard corpora. Analyzing and annotating such documents is of particular importance in the development of predictive maintenance systems, which aim to improve operational efficiency, reduce costs, prevent accidents, and save lives. In order to facilitate and encourage research in this area, we have developed MaintNet, a collaborative open-source library of technical and domain-specific language resources. MaintNet provides novel logbook data from the aviation, automotive, and facility maintenance domains along with tools to aid in their (pre-)processing and clustering. Furthermore, it provides a way to encourage discussion on and sharing of new datasets and tools for logbook data analysis.