2021
Domain Adaptation for Hindi-Telugu Machine Translation Using Domain Specific Back Translation
Hema Ala | Vandan Mujadia | Dipti Sharma
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
In this paper, we present a novel approach for domain adaptation in Neural Machine Translation which aims to improve translation quality over a new domain. Adapting to new domains is a highly challenging task for Neural Machine Translation on limited data; it becomes even more difficult for technical domains such as Chemistry and Artificial Intelligence because of their specific terminology. We propose a Domain Specific Back Translation method which uses available monolingual data and generates synthetic data in a different way. This approach uses Out Of Domain words. The approach is very generic and can be applied to any language pair for any domain. We conduct our experiments on the Chemistry and Artificial Intelligence domains for Hindi and Telugu in both directions. We observe that using synthetic data created by the proposed algorithm improves BLEU scores significantly.
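The abstract does not spell out how the domain-specific variant selects or generates its data, so the sketch below illustrates only standard back translation, the baseline idea it builds on: target-side monolingual sentences are passed through a reverse (target-to-source) model, and the resulting synthetic pairs are added to the genuine parallel data. The names `back_translate` and `toy_reverse_model`, and the `src_`/`tgt_` placeholder tokens, are hypothetical and stand in for a trained NMT model and real Hindi or Telugu text.

```python
def back_translate(target_sentences, reverse_translate):
    """Build synthetic (source, target) pairs from target-side monolingual text."""
    pairs = []
    for target in target_sentences:
        source = reverse_translate(target)  # machine-generated source side
        if source.strip():                  # skip empty model outputs
            pairs.append((source, target))
    return pairs


def toy_reverse_model(sentence):
    # Hypothetical stand-in for a trained target-to-source NMT model:
    # it only rewrites the "tgt_" token prefix as "src_".
    return " ".join("src_" + tok[len("tgt_"):] for tok in sentence.split())


# Genuine (source, target) parallel data, here a single general-domain pair.
parallel = [("src_hello src_world", "tgt_hello tgt_world")]

# In-domain monolingual sentences on the target side.
monolingual = ["tgt_acid tgt_reacts tgt_base", "tgt_neural tgt_network"]

synthetic = back_translate(monolingual, toy_reverse_model)
training_data = parallel + synthetic  # mix genuine and synthetic pairs
```

The target side of every synthetic pair is human-written in-domain text, which is why back translation is a common way to expose a model to domain terminology when in-domain parallel data is scarce.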
2020
Automatic Technical Domain Identification
Hema Ala | Dipti Sharma
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
In this paper we present two Machine Learning algorithms, namely Stochastic Gradient Descent and Multi Layer Perceptron, to identify the technical domain of a given text, as such text provides information about the specific domain. We performed our experiments on coarse-grained technical domains such as Computer Science, Physics, and Law for English, Bengali, Gujarati, Hindi, Malayalam, Marathi, Tamil, and Telugu, and on fine-grained subdomains of Computer Science such as Operating System, Computer Network, and Database for English only. Using TF-IDF as the feature extraction method, we show how both machine learning models perform on the mentioned languages.
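As a rough illustration of the TF-IDF plus SGD pipeline mentioned in the abstract, the following minimal sketch builds L2-normalised TF-IDF vectors and trains a linear classifier one example at a time. It makes several assumptions that are not from the paper: the logistic (softmax) loss, the smoothed IDF formula, the hyperparameters, and the tiny English toy corpus are all chosen for illustration, and the Multi Layer Perceptron model is not covered.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in doc.split() if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def softmax_scores(weights, vec):
    """Class probabilities from per-class linear scores."""
    scores = {c: sum(w.get(t, 0.0) * v for t, v in vec.items())
              for c, w in weights.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_sgd(docs, labels, idf, epochs=20, lr=0.5):
    """Multinomial logistic regression updated after every example (plain SGD)."""
    weights = {c: {} for c in sorted(set(labels))}
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            vec = tfidf(doc, idf)
            probs = softmax_scores(weights, vec)
            for c, w in weights.items():
                grad = (1.0 if c == label else 0.0) - probs[c]
                for t, v in vec.items():
                    w[t] = w.get(t, 0.0) + lr * grad * v
    return weights


def predict(doc, weights, idf):
    probs = softmax_scores(weights, tfidf(doc, idf))
    return max(probs, key=probs.get)


# Toy corpus (illustrative only, not shared task data).
docs = [
    "kernel scheduler process thread", "process memory paging kernel",
    "court judge statute appeal", "statute contract court liability",
    "quantum particle energy field", "energy momentum particle wave",
]
labels = ["cs", "cs", "law", "law", "physics", "physics"]

idf = fit_idf(docs)
weights = train_sgd(docs, labels, idf)
```

Calling `predict("kernel thread memory", weights, idf)` then returns the domain label whose weight vector scores the document highest.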
Graph Based Automatic Domain Term Extraction
Hema Ala | Dipti Sharma
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task
We present a graph-based approach to automatically extract domain-specific terms from technical domains like Biochemistry, Communication, Computer Science and Law. Our approach is similar to TextRank with an extra post-processing step to reduce the noise. We performed our experiments on the mentioned domains provided by the ICON TermTraction 2020 shared task. We report precision, recall and F1-score for all experiments. Further, we observe that our method gives promising results without much noise in the domain terms.
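For readers unfamiliar with TextRank, the sketch below shows its core idea for term ranking: link words that co-occur within a small window, then score the nodes with PageRank-style iterations. This is plain TextRank under assumed settings (window size, damping factor, iteration count, and the toy token list are illustrative); the paper's extra post-processing step is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict


def textrank_terms(tokens, window=2, damping=0.85, iterations=50):
    """Score tokens with PageRank over an undirected co-occurrence graph."""
    # Link two distinct tokens if they appear within `window` positions.
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != tok:
                neighbors[tok].add(tokens[j])
                neighbors[tokens[j]].add(tok)

    nodes = list(neighbors)
    if not nodes:
        return {}

    # Start from a uniform distribution and iterate the PageRank update.
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            rank = sum(scores[m] / len(neighbors[m]) for m in neighbors[n])
            updated[n] = (1 - damping) / len(nodes) + damping * rank
        scores = updated
    return scores


tokens = ("operating system kernel schedules process memory kernel "
          "manages memory kernel handles interrupt").split()
scores = textrank_terms(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, `kernel` co-occurs with every other word, so it ends up at the top of `ranked`; a real system would typically filter candidates (for example by part of speech) before or after ranking.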
AdapNMT : Neural Machine Translation with Technical Domain Adaptation for Indic Languages
Hema Ala | Dipti Sharma
Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task
Adapting to a new domain is a highly challenging task for Neural Machine Translation (NMT). In this paper we show the capability of general-domain machine translation when translating into Indic languages (English-Hindi, English-Telugu and Hindi-Telugu), and low-resource domain adaptation of MT systems using existing general parallel data and small in-domain parallel data for the AI and Chemistry domains. We carried out our experiments using Byte Pair Encoding (BPE), as it addresses the rare word problem. We observe that adding a small amount of in-domain data to the general data improves the BLEU score significantly.
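To make the role of Byte Pair Encoding concrete, here is a minimal sketch of how BPE merge operations are learned: words are split into characters, and the most frequent adjacent symbol pair is merged repeatedly, so rare words end up represented as sequences of more common subword units. The toy word frequencies, the number of merges, and the tie-breaking rule are assumptions for illustration, not the configuration used in the paper.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Highest count wins; ties fall back to the pair's sort order
        # so that the result is deterministic (an illustrative choice).
        best = max(pairs, key=lambda p: (pairs[p], p))
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(word_freqs, 3)
```

After three merges on this toy table, the shared suffix of `newest` and `widest` has become the single symbol `est</w>`, which shows how frequent character sequences turn into reusable subword units.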