2024
Version Control for Speech Corpora
Vlad Dumitru | Matthias Boehm | Martin Hagmüller | Barbara Schuppler
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)
Towards Improving ASR Outputs of Spontaneous Speech with LLMs
Karner Manuel | Julian Linke | Mark Kröll | Barbara Schuppler | Bernhard C. Geiger
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)
2022
Conversational Speech Recognition Needs Data? Experiments with Austrian German
Julian Linke | Philip N. Garner | Gernot Kubin | Barbara Schuppler
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Conversational speech represents one of the most complex automatic speech recognition (ASR) tasks owing to the high inter-speaker variation in both pronunciation and conversational dynamics. Such complexity is particularly sensitive to low-resourced (LR) scenarios. Recent developments in self-supervision have allowed such scenarios to take advantage of large amounts of otherwise unrelated data. In this study, we characterise an LR Austrian German conversational task. We begin with a non-pre-trained baseline and show that fine-tuning of a model pre-trained using self-supervision leads to improvements consistent with those in the literature; this extends to cases where a lexicon and language model are included. We also show that the advantage of pre-training indeed arises from the larger database rather than the self-supervision. Further, by use of a leave-one-conversation-out technique, we demonstrate that robustness problems remain with respect to inter-speaker and inter-conversation variation. This serves to guide where future research might best be focused in light of the current state-of-the-art.
To laugh or not to laugh? The use of laughter to mark discourse structure
Bogdan Ludusan | Barbara Schuppler
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
A number of cues, both linguistic and non-linguistic, have been found to mark discourse structure in conversation. This paper investigates the role of laughter, one of the most frequently encountered non-verbal vocalizations in human communication, in the signalling of turn boundaries. We employ a corpus of informal dyadic conversations to determine the likelihood of laughter at the end of speaker turns and to establish the potential role of laughter in discourse organization. Our results show that, on average, about 10% of the turns are marked by laughter, but also that the marking is subject to individual variation, as well as effects of other factors, such as the type of relationship between speakers. More importantly, we find that turn ends are twice as likely as transition relevance places to be marked by laughter, suggesting that, indeed, laughter plays a role in marking discourse structure.
2021
Developing an Annotation System for Communicative Functions for a Cross-Layer ASR System
Barbara Schuppler | Anneliese Kelterer
Proceedings of the First Workshop on Integrating Perspectives on Discourse Annotation
2020
Towards Building an Automatic Transcription System for Language Documentation: Experiences from Muyu
Alexander Zahrer | Andrej Zgank | Barbara Schuppler
Proceedings of the Twelfth Language Resources and Evaluation Conference
Since at least half of the world’s 6000-plus languages will vanish during the 21st century, language documentation has become a rapidly growing field in linguistics. A fundamental challenge for language documentation is the “transcription bottleneck”. Speech technology may deliver the decisive breakthrough for overcoming this bottleneck. This paper presents first experiments from the development of ASR4LD, a new automatic speech recognition (ASR) based tool for language documentation (LD). The experiments are based on recordings from an ongoing documentation project for the endangered Muyu language in New Guinea. We compare phoneme recognition experiments with American English, Austrian German and Slovenian as source languages and Muyu as the target language. The Slovenian acoustic models achieve by far the best performance (43.71% PER), compared with 57.14% PER for American English and 89.49% PER for Austrian German. Whereas part of the errors can be explained by phonetic variation, the recording mismatch poses a major problem. In the long term, ASR4LD will not only be an integral part of the ongoing documentation project of Muyu, but will also be further developed in order to facilitate the language documentation process for other language groups.
2014
GRASS: the Graz corpus of Read And Spontaneous Speech
Barbara Schuppler | Martin Hagmueller | Juan A. Morales-Cordovilla | Hannes Pessentheiner
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper provides a description of the preparation, the speakers, the recordings, and the creation of the orthographic transcriptions of the first large-scale speech database for Austrian German. It contains approximately 1900 minutes of (read and spontaneous) speech produced by 38 speakers. The corpus consists of three components. First, the Conversation Speech (CS) component contains free conversations of one hour in length between friends, colleagues, couples, or family members. Second, the Commands Component (CC) contains commands and keywords which were either read or elicited by pictures. Third, the Read Speech (RS) component contains phonetically balanced sentences and digits. The speech of all components has been recorded at super-wideband quality in a soundproof recording studio with head-mounted microphones, large-diaphragm microphones, a laryngograph, and a video camera. The orthographic transcriptions, which have been created and subsequently corrected manually, contain approximately 290 000 word tokens from 15 000 different word types.