Chandresh Maurya


2024

pdf bib
FINDINGS OF THE IWSLT 2024 EVALUATION CAMPAIGN
Ibrahim Said Ahmad | Antonios Anastasopoulos | Ondřej Bojar | Claudia Borg | Marine Carpuat | Roldano Cattoni | Mauro Cettolo | William Chen | Qianqian Dong | Marcello Federico | Barry Haddow | Dávid Javorský | Mateusz Krubiński | Tsz Kim Lam | Xutai Ma | Prashant Mathur | Evgeny Matusov | Chandresh Maurya | John McCrae | Kenton Murray | Satoshi Nakamura | Matteo Negri | Jan Niehues | Xing Niu | Atul Kr. Ojha | John Ortega | Sara Papi | Peter Polák | Adam Pospíšil | Pavel Pecina | Elizabeth Salesky | Nivedita Sethiya | Balaram Sarkar | Jiatong Shi | Claytone Sikasote | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Brian Thompson | Alex Waibel | Shinji Watanabe | Patrick Wilken | Petr Zemánek | Rodolfo Zevallos
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 17 teams whose submissions are documented in 27 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

pdf bib
Indic-TEDST: Datasets and Baselines for Low-Resource Speech to Text Translation
Nivedita Sethiya | Saanvi Nair | Chandresh Maurya
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Speech-to-text (ST) task is the translation of speech in a language to text in a different language. It has use cases in subtitling, dubbing, etc. Traditionally, ST task has been solved by cascading automatic speech recognition (ASR) and machine translation (MT) models which leads to error propagation, high latency, and training time. To minimize such issues, end-to-end models have been proposed recently. However, we find that only a few works have reported results of ST models on a limited number of low-resource languages. To take a step further in this direction, we release datasets and baselines for low-resource ST tasks. Concretely, our dataset has 9 language pairs and benchmarking has been done against SOTA ST models. The low performance of SOTA ST models on Indic-TEDST data indicates the necessity of the development of ST models specifically designed for low-resource languages.

2021

pdf bib
How effective is incongruity? Implications for code-mixed sarcasm detection
Aditya Shah | Chandresh Maurya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

The presence of sarcasm in conversational systems and social media like chatbots, Facebook, Twitter, etc. poses several challenges for downstream NLP tasks. This is attributed to the fact that the intended meaning of a sarcastic text is contrary to what is expressed. Further, the use of code-mix language to express sarcasm is increasing day by day. Current NLP techniques for code-mix data have limited success due to the use of different lexicon, syntax, and scarcity of labeled corpora. To solve the joint problem of code-mixing and sarcasm detection, we propose the idea of capturing incongruity through sub-word level embeddings learned via fastText. Empirical results show that our proposed model achieves an F1-score on code-mix Hinglish dataset comparable to pretrained multilingual models while training 10x faster and using a lower memory footprint.