Niraj Aswani


2013

TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
Kalina Bontcheva | Leon Derczynski | Adam Funk | Mark Greenwood | Diana Maynard | Niraj Aswani
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2010

Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages
Niraj Aswani | Robert Gaizauskas
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A considerable amount of work has been put into the development of stemmers and morphological analysers. The majority of these approaches use hand-crafted suffix-replacement rules, but a few try to discover such rules from corpora. While most of the approaches remove or replace suffixes, there are examples of derivational stemmers which are based on prefixes as well. In this paper we present a rule-based morphological analyser. We propose an approach that takes both prefixes and suffixes into account. Given a corpus and a dictionary, our method can be used to obtain a set of suffix-replacement rules for deriving an inflected word’s root form. We developed the approach for the Hindi language but show that it is portable, at least to related languages, by adapting it to the Gujarati language. Given that the entire process of developing such a ruleset is simple and fast, our approach can be used for rapid development of morphological analysers, and yet it can obtain results competitive with those of analysers built on human-authored rules.
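
As a rough illustration of what applying suffix-replacement rules can look like, the sketch below tries each rule against a word and accepts the first candidate root found in a dictionary. It is not the method from the paper: the rule list, the toy English dictionary, the longest-suffix-first ordering, the dictionary check, and the name `find_root` are all assumptions made for the example, and prefix handling is left out.

```python
# Hypothetical (suffix, replacement) rules and a toy dictionary of root forms.
# English is used only to keep the example readable; these are NOT rules
# from the paper.
RULES = [("ies", "y"), ("es", ""), ("s", "")]
DICTIONARY = {"city", "box", "cat"}


def find_root(word, rules, dictionary):
    """Return a dictionary root for `word`, or None if no rule yields one."""
    # A word that is already a root needs no rewriting.
    if word in dictionary:
        return word
    # Try longer suffixes first so that "ies" is preferred over "s".
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if suffix and word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept the candidate only if the dictionary confirms it.
            if candidate in dictionary:
                return candidate
    return None


if __name__ == "__main__":
    for w in ["cities", "boxes", "cats", "dogs"]:
        print(w, "->", find_root(w, RULES, DICTIONARY))
```

Checking candidates against a dictionary is one simple way to reject over-eager rewrites; a word with no confirmed root returns `None` here rather than a guess.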

English-Hindi Transliteration using Multiple Similarity Metrics
Niraj Aswani | Robert Gaizauskas
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present an approach to measure the transliteration similarity of English-Hindi word pairs. Our approach has two components. First, we propose a bi-directional mapping between one or more characters in the Devanagari script and one or more characters in the Roman script (pronounced as in English). This allows a given Hindi word written in Devanagari to be transliterated into the Roman script and vice versa. Second, we present an algorithm for computing a similarity measure that is a variant of Dice’s coefficient and the LCSR measure, and which also takes into account the constraints needed to match English-Hindi transliterated words. Finally, by evaluating various similarity metrics individually and together under a multiple measure agreement scenario, we show that it is possible to achieve an F-measure of 0.92 in identifying English-Hindi word pairs that are transliterations. To assess the portability of our approach to other similar languages, we adapt our system to the Gujarati language.
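
For readers unfamiliar with the two base measures named above, the following sketch computes the standard Dice coefficient over character bigrams and the standard longest common subsequence ratio (LCSR) for two Roman-script strings. It shows the textbook definitions only, not the variant proposed in the paper; the Devanagari-to-Roman mapping and the transliteration constraints are not reproduced, and the function names are chosen for this example.

```python
from collections import Counter


def bigrams(word):
    """Return the list of adjacent character pairs in `word`."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice_coefficient(a, b):
    """Standard Dice coefficient over character bigrams (multiset overlap)."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Neither string has a bigram (length < 2): fall back to equality.
        return 1.0 if a == b else 0.0
    overlap = sum((x & y).values())
    return 2.0 * overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Longest common subsequence ratio: LCS length over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return lcs_length(a, b) / longest


if __name__ == "__main__":
    print(dice_coefficient("night", "nacht"))
    print(lcsr("night", "nacht"))
```

Both functions return a value between 0 and 1, so their scores can be compared against thresholds or combined, which is the kind of setting a multiple measure agreement scheme would build on.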

2005

A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora
Niraj Aswani | Robert Gaizauskas
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Aligning Words in English-Hindi Parallel Corpora
Niraj Aswani | Robert Gaizauskas
Proceedings of the ACL Workshop on Building and Using Parallel Texts