2019
Open Information Extraction from Question-Answer Pairs
Nikita Bhutani | Yoshihiko Suhara | Wang-Chiew Tan | Alon Halevy | H. V. Jagadish
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Open Information Extraction (OpenIE) extracts meaningful structured tuples from free-form text. Most previous work on OpenIE considers extracting data from one sentence at a time. We describe NeurON, a system for extracting tuples from question-answer pairs. One of the main motivations for NeurON is to be able to extend knowledge bases in a way that considers precisely the information that users care about. NeurON addresses several challenges. First, an answer text is often hard to understand without knowing the question, and second, relevant information can span multiple sentences. To address these, NeurON formulates extraction as a multi-source sequence-to-sequence learning task, wherein it combines distributed representations of a question and an answer to generate knowledge facts. We describe experiments on two real-world datasets that demonstrate that NeurON can find a significant number of new and interesting facts to extend a knowledge base compared to state-of-the-art OpenIE methods.
2018
Exploiting Structure in Representation of Named Entities using Active Learning
Nikita Bhutani | Kun Qian | Yunyao Li | H. V. Jagadish | Mauricio Hernandez | Mitesh Vasa
Proceedings of the 27th International Conference on Computational Linguistics
Fundamental to several knowledge-centric applications is the need to identify named entities from their textual mentions. However, entities lack a unique representation and their mentions can differ greatly. These variations arise in complex ways that cannot be captured using textual similarity metrics. However, entities have underlying structures, typically shared by entities of the same entity type, that can help reason over their name variations. Discovering, learning and manipulating these structures typically requires high manual effort in the form of large amounts of labeled training data and handwritten transformation programs. In this work, we propose an active-learning based framework that drastically reduces the labeled data required to learn the structures of entities. We show that programs for mapping entity mentions to their structures can be automatically generated using human-comprehensible labels. Our experiments show that our framework consistently outperforms both handwritten programs and supervised learning models. We also demonstrate the utility of our framework in relation extraction and entity resolution tasks.
2016
Nested Propositions in Open Information Extraction
Nikita Bhutani | H. V. Jagadish | Dragomir Radev
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2008
Regular Expression Learning for Information Extraction
Yunyao Li | Rajasekar Krishnamurthy | Sriram Raghavan | Shivakumar Vaithyanathan | H. V. Jagadish
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing