2023
E2E Spoken Entity Extraction for Virtual Agents
Karan Singla | Yeon-Jun Kim | Srinivas Bangalore
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
In human-computer conversations, extracting entities such as names, street addresses and email addresses from speech is a challenging task. In this paper, we study the impact of fine-tuning pre-trained speech encoders on extracting spoken entities in human-readable form directly from speech, without the need for text transcription. We illustrate that such a direct approach optimizes the encoder to transcribe only the entity-relevant portions of speech, ignoring superfluous portions such as carrier phrases or spelled-out name entities. In the context of dialog from an enterprise virtual agent, we demonstrate that the 1-step approach outperforms the typical 2-step approach, which first generates lexical transcriptions and then applies text-based entity extraction to identify spoken entities.
1-step Speech Understanding and Transcription Using CTC Loss
Karan Singla | Shahab Jalalv | Yeon-Jun Kim | Andrej Ljolje | Antonio Moreno Daniel | Srinivas Bangalore | Benjamin Stern
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Recent studies have made some progress in refining end-to-end (E2E) speech recognition encoders by applying Connectionist Temporal Classification (CTC) loss to enhance named entity recognition within transcriptions. However, these methods have been constrained by their exclusive use of the ASCII character set, allowing only a limited array of semantic labels. We propose 1SPU, a 1-step Speech Processing Unit which can recognize speech events (e.g., speaker change) or NL events (intent, emotion) while also transcribing vocal content. It extends the E2E automatic speech recognition (ASR) system’s vocabulary by adding a set of unused placeholder symbols, conceptually akin to the <pad> tokens used in sequence modeling. These placeholders are then assigned to represent semantic events (in the form of tags) and are integrated into the transcription process as distinct tokens. We demonstrate notable improvements on the SLUE benchmark and obtain results that are on par with those for the SLURP dataset. Additionally, we provide a visual analysis of the system’s proficiency in accurately pinpointing meaningful tokens over time, illustrating how supplementary semantic tags enhance transcription quality.
Combining Pre trained Speech and Text Encoders for Continuous Spoken Language Processing
Karan Singla | Mahnoosh Mehrabani | Daniel Pressel | Ryan Price | Bhargav Srinivas Chinnari | Yeon-Jun Kim | Srinivas Bangalore
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
2020
Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations
Karan Singla | Zhuohao Chen | David Atkins | Shrikanth Narayanan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Spoken language understanding tasks usually rely on pipelines involving complex processing blocks such as voice activity detection, speaker diarization and automatic speech recognition (ASR). We propose a novel framework for predicting utterance-level labels directly from speech features, thus removing the dependency on first generating transcripts and enabling transcription-free behavioral coding. Our classifier uses a pretrained Speech-2-Vector encoder as a bottleneck to generate word-level representations from speech features. This pretrained encoder learns to encode speech features for a word using an objective similar to Word2Vec. Our proposed approach uses only speech features and word segmentation information to predict spoken utterance-level target labels. We show that our model achieves results competitive with other state-of-the-art approaches that use transcribed text for the task of predicting psychotherapy-relevant behavior codes.
2018
A Multi-task Approach to Learning Multilingual Representations
Karan Singla | Dogan Can | Shrikanth Narayanan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
We present a novel multi-task modeling approach to learning multilingual distributed representations of text. Our system learns word and sentence embeddings jointly by training a multilingual skip-gram model together with a cross-lingual sentence similarity model. Our architecture can transparently use both monolingual and sentence-aligned bilingual corpora to learn multilingual embeddings, thus covering a vocabulary significantly larger than the vocabulary of the bilingual corpora alone. Our model shows competitive performance in a standard cross-lingual document classification task. We also show the effectiveness of our method in a limited-resource scenario.
2017
Automatic Community Creation for Abstractive Spoken Conversations Summarization
Karan Singla | Evgeny Stepanov | Ali Orkan Bayer | Giuseppe Carenini | Giuseppe Riccardi
Proceedings of the Workshop on New Frontiers in Summarization
Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment with automatic community creation using cosine similarity on different levels of representation: raw text, WordNet SynSet IDs, and word embeddings. We show that the abstractive summarization systems with automatic communities significantly outperform previously published results on both English and Italian corpora.
SHIHbot: A Facebook chatbot for Sexual Health Information on HIV/AIDS
Jacqueline Brixey | Rens Hoegen | Wei Lan | Joshua Rusow | Karan Singla | Xusen Yin | Ron Artstein | Anton Leuski
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
We present the implementation of an autonomous chatbot, SHIHbot, deployed on Facebook, which answers a wide variety of sexual health questions on HIV/AIDS. The chatbot’s response database is compiled from professional medical and public health resources in order to provide reliable information to users. The system’s backend is NPCEditor, a response selection platform trained on linked questions and answers; to our knowledge this is the first retrieval-based chatbot deployed on a large public social network.
Linguistic analysis of differences in portrayal of movie characters
Anil Ramakrishna | Victor R. Martínez | Nikolaos Malandrakis | Karan Singla | Shrikanth Narayanan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We examine differences in the portrayal of characters in movies using psycholinguistic and graph-theoretic measures computed directly from screenplays. Differences are examined with respect to characters’ gender, race, age and other metadata. Psycholinguistic metrics are extrapolated to dialogues in movies using a linear regression model built on a set of manually annotated seed words. Interesting patterns are revealed about relationships between the genders of the production team and the gender ratio of characters. Several correlations are noted between the gender, race and age of characters and the linguistic metrics.
2014
Predicting post-editor profiles from the translation process
Karan Singla | David Orrego-Carmona | Ashleigh Rhea Gonzales | Michael Carl | Srinivas Bangalore
Workshop on interactive and adaptive machine translation
The purpose of the current investigation is to predict post-editor profiles based on user behaviour and demographics using machine learning techniques to gain a better understanding of post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database from the CRITT Translation Process Research Database (TPR-DB). The analysis has two main research goals: We create n-gram models based on user activity and part-of-speech sequences to automatically cluster post-editors, and we use discriminative classifier models to characterize post-editors based on a diverse range of translation process features. The classification and clustering of participants resulting from our study suggest this type of exploration could be used as a tool to develop new translation tool features or customization possibilities.
Reducing the Impact of Data Sparsity in Statistical Machine Translation
Karan Singla | Kunal Sachdeva | Srinivas Bangalore | Dipti Misra Sharma | Diksha Yadav
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
Exploring System Combination approaches for Indo-Aryan MT Systems
Karan Singla | Anupam Singh | Nishkarsh Shastri | Megha Jhunjhunwala | Srinivas Bangalore | Dipti Misra Sharma
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants
SEECAT: ASR & Eye-tracking enabled computer-assisted translation
Mercedes García-Martínez | Karan Singla | Aniruddha Tammewar | Bartolomé Mesa-Lao | Ankita Thakur | Anusuya M.A. | Srinivas Bangalore | Michael Carl
Proceedings of the 17th Annual Conference of the European Association for Machine Translation
2012
Two-stage Approach for Hindi Dependency Parsing Using MaltParser
Naman Jain | Karan Singla | Aniruddha Tammewar | Sambhav Jain
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages