2023
Incorporating Human Translator Style into English-Turkish Literary Machine Translation
Zeynep Yirmibeşoğlu | Olgun Dursun | Harun Dalli | Mehmet Şahin | Ena Hodzik | Sabri Gürses | Tunga Güngör
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Although machine translation systems are mostly designed to serve in the general domain, there is a growing tendency to adapt them to other domains, such as literary translation. In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. We fine-tune a pre-trained machine translation model on the manually aligned works of a particular translator. We present a detailed analysis of the effects of manual and automatic alignment, data augmentation methods, and corpus size on the translations. We propose an approach based on stylistic features to evaluate the style of a translator in the output translations. We show that the human translator's style can be recreated to a high degree in the target machine translations by adapting the models to the style of the translator.
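As a rough illustration of the adaptation step described in the abstract, the sketch below fine-tunes a pretrained English-Turkish translation model on a single translator's aligned sentence pairs using Hugging Face transformers and PyTorch. The checkpoint name, the tab-separated corpus format, and all hyperparameters are assumptions made for the example, not details taken from the paper.

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Helsinki-NLP/opus-mt-tc-big-en-tr"   # assumed en->tr checkpoint; any en-tr seq2seq model would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def load_pairs(path):
    # Hypothetical input format: one "english<TAB>turkish" sentence pair per line.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t", 1) for line in f if "\t" in line]

def collate(batch):
    src = [en for en, tr in batch]
    tgt = [tr for en, tr in batch]
    # Tokenize source and target together; the target token ids become the labels.
    return tokenizer(src, text_target=tgt, padding=True, truncation=True,
                     max_length=128, return_tensors="pt")

pairs = load_pairs("translator_aligned.tsv")        # hypothetical aligned-corpus file
loader = DataLoader(pairs, batch_size=8, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):                              # small epoch count, chosen only for the illustration
    for batch in loader:
        loss = model(**batch).loss                  # cross-entropy over target tokens
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("opus-mt-en-tr-translator-style")
tokenizer.save_pretrained("opus-mt-en-tr-translator-style")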
2022
BOUN-TABI@SMM4H’22: Text-to-Text Adverse Drug Event Extraction with Data Balancing and Prompting
Gökçe Uludoğan | Zeynep Yirmibeşoğlu
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
This paper describes models developed for the Social Media Mining for Health 2022 Shared Task. We participated in two subtasks: classification of English tweets reporting adverse drug events (ADE) (Task 1a) and extraction of ADE spans in such tweets (Task 1b). We developed two separate systems based on the T5 model, viewing these tasks as sequence-to-sequence problems. To address class imbalance, we applied data balancing via over- and undersampling in both tasks. For the ADE extraction task, we explored prompting to further benefit from the T5 model and its text-to-text formulation. Additionally, we built an ensemble model utilizing both balanced and prompted models. The proposed models outperformed the current state-of-the-art, with an F1 score of 0.655 on ADE classification and a Partial F1 score of 0.527 on ADE extraction.
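A minimal sketch of the text-to-text framing follows, assuming a plain t5-base checkpoint from Hugging Face transformers. The task prefixes and label strings are illustrative assumptions, and the model would of course need to be fine-tuned on the (balanced) shared-task data before the generated labels become meaningful.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def classify_ade(tweet: str) -> str:
    # Assumed task prefix and label vocabulary ("ADE" / "noADE"); after fine-tuning,
    # the model is expected to generate the label as plain text.
    prompt = "ade classification: " + tweet
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)
    output_ids = model.generate(**inputs, max_new_tokens=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def extract_ade_span(tweet: str) -> str:
    # The extraction subtask can reuse the same model with a different prefix,
    # asking T5 to generate the ADE span verbatim (or "none").
    prompt = "ade extraction: " + tweet
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)
    output_ids = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Data balancing (the paper's over-/undersampling) would be applied to the training set
# before fine-tuning, e.g. by duplicating ADE tweets and subsampling non-ADE ones.
print(classify_ade("this med gave me a pounding headache all night"))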
2020
ERMI at PARSEME Shared Task 2020: Embedding-Rich Multiword Expression Identification
Zeynep Yirmibeşoğlu | Tunga Güngör
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
This paper describes the ERMI system submitted to the closed track of the PARSEME shared task 2020 on automatic identification of verbal multiword expressions (VMWEs). ERMI is an embedding-rich bidirectional LSTM-CRF model that takes into account the embeddings of the word, its POS tag, its dependency relation, and its head word. Results are reported for 14 languages; the system ranks first in the general cross-lingual ranking of the closed-track systems according to the Unseen MWE-based F1 score.
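A minimal sketch of such an embedding-rich sequence tagger is given below, assuming PyTorch plus the third-party pytorch-crf package for the CRF layer. Vocabulary sizes, embedding dimensions, and the tag inventory are placeholders rather than the values used in the shared-task system.

import torch
import torch.nn as nn
from torchcrf import CRF            # pip install pytorch-crf (assumed choice of CRF layer)

class EmbeddingRichBiLSTMCRF(nn.Module):
    def __init__(self, n_words, n_pos, n_deprel, n_tags,
                 word_dim=100, feat_dim=25, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.pos_emb = nn.Embedding(n_pos, feat_dim)
        self.deprel_emb = nn.Embedding(n_deprel, feat_dim)
        self.head_emb = nn.Embedding(n_words, word_dim)   # head word shares the word vocabulary
        input_dim = 2 * word_dim + 2 * feat_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden_dim, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def _encode(self, words, pos, deprel, heads):
        # Concatenate the four token-level embeddings before the BiLSTM.
        x = torch.cat([self.word_emb(words), self.pos_emb(pos),
                       self.deprel_emb(deprel), self.head_emb(heads)], dim=-1)
        h, _ = self.lstm(x)
        return self.emissions(h)

    def loss(self, words, pos, deprel, heads, tags, mask):
        # Negative log-likelihood of the gold VMWE tag sequence under the CRF.
        return -self.crf(self._encode(words, pos, deprel, heads), tags, mask=mask)

    def predict(self, words, pos, deprel, heads, mask):
        # Viterbi decoding of the most likely tag sequence per sentence.
        return self.crf.decode(self._encode(words, pos, deprel, heads), mask=mask)

model = EmbeddingRichBiLSTMCRF(n_words=10000, n_pos=20, n_deprel=50, n_tags=5)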
2018
Detecting Code-Switching between Turkish-English Language Pair
Zeynep Yirmibeşoğlu | Gülşen Eryiğit
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Code-switching (the alternating use of different languages within a single conversation) is an increasingly common phenomenon in social media and colloquial usage that poses distinct challenges for natural language processing. This paper introduces the first study on the detection of Turkish-English code-switching, together with a small test data set collected from social media, in order to pave the way for further studies. The proposed system, using character-level n-grams and conditional random fields (CRFs), obtains a 95.6% micro-averaged F1 score on the introduced test data set.
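A minimal sketch of this kind of tagger follows, assuming the sklearn-crfsuite library: tokens are represented by character n-gram features and a linear-chain CRF assigns per-token language labels. The feature template, n-gram orders, and toy data are illustrative assumptions, not the paper's exact configuration.

import sklearn_crfsuite             # pip install sklearn-crfsuite (assumed CRF implementation)

def char_ngrams(token, n_values=(2, 3)):
    # Character n-grams of a lowercased token, padded with boundary markers.
    padded = f"<{token.lower()}>"
    return [padded[i:i + n] for n in n_values for i in range(len(padded) - n + 1)]

def token_features(sentence, i):
    feats = {f"ng={g}": 1.0 for g in char_ngrams(sentence[i])}
    feats["bias"] = 1.0
    if i > 0:                       # a little context from the previous token
        feats.update({f"prev_ng={g}": 1.0 for g in char_ngrams(sentence[i - 1])})
    return feats

# Toy training data: each token labelled as Turkish (TR) or English (EN).
sentences = [["bugün", "meeting", "var", "mı"], ["I", "am", "çok", "tired"]]
labels = [["TR", "EN", "TR", "TR"], ["EN", "EN", "TR", "EN"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)

test = ["hafta", "sonu", "shopping", "yapalım"]
print(crf.predict_single([token_features(test, i) for i in range(len(test))]))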