Nai-Lung Tsao


2016

Word Midas Powered by StringNet: Discovering Lexicogrammatical Constructions in Situ
David Wible | Nai-Lung Tsao
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Adult second language learners face the daunting but underappreciated task of mastering patterns of language use that are neither products of fully productive grammar rules nor frozen items to be memorized. Word Midas, a web browser extension, targets this uncharted territory of lexicogrammar by detecting multiword tokens of lexicogrammatical patterning in real time and in situ, within the noisy digital texts the user encounters during unscripted web browsing or in other digital venues. The language model powering Word Midas is StringNet, a densely cross-indexed, navigable network of one billion lexicogrammatical patterns of English. These resources are described, and their functionality is illustrated with a detailed scenario.

2013

A Corpus-Based Tool for Exploring Domain-Specific Collocations in English
Ping-Yu Huang | Chien-Ming Chen | Nai-Lung Tsao | David Wible
PACLIC 27 Workshop on Computer-Assisted Language Learning

Word similarity using constructions as contextual features
Nai-Lung Tsao | David Wible
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

2011

The StringNet Lexico-Grammatical Knowledgebase and its Applications
David Wible | Nai-Lung Tsao
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

2010

StringNet as a Computational Resource for Discovering and Investigating Linguistic Constructions
David Wible | Nai-Lung Tsao
Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics

2009

Automated Suggestions for Miscollocations
Anne Li-E Liu | David Wible | Nai-Lung Tsao
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

A Method for Unsupervised Broad-Coverage Lexical Error Detection and Correction
Nai-Lung Tsao | David Wible
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

2004

Improving Collocation Extraction for High Frequency Words
David Wible | Chin-Hwa Kuo | Nai-Lung Tsao
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The purpose of this paper is to introduce an alternative word association measure aimed at addressing the under-extraction of collocations that contain high-frequency words. Measures such as mutual information (MI) make the important contribution of filtering out the effect of sheer word frequency when detecting collocations in large corpora, but one side effect of this filtering is that such measures find it correspondingly difficult to detect true collocations involving high-frequency words. As an alternative, we propose normalizing the MI measure by dividing the frequency of a candidate lexeme by the number of senses of that lexeme. We base this approach on the one-sense-per-collocation assumption of Yarowsky (1992; 1995). Ten verb-noun collocations involving three high-frequency verbs (make, take, run) are used to compare the extraction results of traditional MI and the proposed normalized MI. Results show that normalizing MI as proposed raises the ranking of these high-frequency verbs as candidate collocates of the target focal nouns. Side effects of these improved rankings are discussed, such as an increase in false positives resulting from higher recall. Overall rank precision is found to remain quite stable even with the increased recall of normalized MI.
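The normalization described in this abstract can be illustrated with a minimal sketch. It assumes that "MI" means pointwise mutual information in log base 2 and that the sense count comes from some external sense inventory. The abstract does not say which inventory or which exact MI formula the paper uses. The counts, sense numbers, and the helper names `pmi`, `sense_normalized_pmi`, and `rank` are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def pmi(pair_freq: float, cand_freq: float, node_freq: float, n: float) -> float:
    """Pointwise mutual information (log base 2) of a candidate/node word pair."""
    return math.log2(n * pair_freq / (cand_freq * node_freq))

def sense_normalized_pmi(pair_freq: float, cand_freq: float, node_freq: float,
                         n: float, cand_senses: int) -> float:
    """PMI computed after dividing the candidate lexeme's frequency by its
    number of senses (assumed reading of the normalization in the abstract)."""
    return pmi(pair_freq, cand_freq / cand_senses, node_freq, n)

# Hypothetical counts for a single target noun (frequency 1000 in a
# 1,000,000-token corpus). Each entry: (pair freq, verb freq, verb senses).
CANDIDATES = {
    "make": (60, 8000, 40),   # high-frequency, highly polysemous verb
    "reach": (20, 800, 4),    # lower-frequency, less polysemous verb
}

def rank(candidates, node_freq, n, normalized):
    """Return candidate verbs sorted by association score, best first."""
    def score(item):
        pair_freq, cand_freq, senses = item[1]
        if normalized:
            return sense_normalized_pmi(pair_freq, cand_freq, node_freq, n, senses)
        return pmi(pair_freq, cand_freq, node_freq, n)
    return [verb for verb, _ in sorted(candidates.items(), key=score, reverse=True)]

print("plain PMI:     ", rank(CANDIDATES, 1000, 1_000_000, normalized=False))
print("normalized PMI:", rank(CANDIDATES, 1000, 1_000_000, normalized=True))
```

With these made-up numbers, plain PMI ranks "reach" above "make", while the sense-normalized score ranks "make" first. Under this reading, dividing the candidate's frequency by its sense count s simply adds log2(s) to its PMI. The boost therefore depends only on the candidate's polysemy and not on the particular pair. That is one way to see why higher recall can come with more false positives, as the abstract notes.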