Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification

Alla Chepurova, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev


Abstract
This study explores a method for extending real-world knowledge graphs (specifically, Wikidata) by extracting triplets from texts with the aid of Large Language Models (LLMs). We propose a two-step pipeline that includes the initial extraction of entity candidates, followed by their refinement and linkage to the canonical entities and relations of the knowledge graph. Finally, we utilize Wikidata relation constraints to select only verified triplets. We compare our approach to a model that was fine-tuned on a machine-generated dataset and demonstrate that our approach performs better on natural data. Our results suggest that LLM-based triplet extraction from texts, with subsequent verification, is a viable method for real-world applications.
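The flow described in the abstract (extract candidates, link them to canonical names, keep only triplets that pass relation constraints) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the lookup tables are toy stand-ins for Wikidata, the linking step is a plain dictionary lookup rather than a second LLM prompt, the constraint is a single subject-type/object-type pair per relation, and `fake_llm` replaces a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    object: str


# Toy stand-ins for knowledge-graph lookups (hypothetical data, not real Wikidata dumps).
ENTITY_ALIASES = {"einstein": "Albert Einstein", "ulm": "Ulm", "physics": "physics"}
RELATION_ALIASES = {"was born in": "place of birth", "born in": "place of birth"}
ENTITY_TYPES = {"Albert Einstein": "human", "Ulm": "city", "physics": "academic discipline"}
# relation -> (required subject type, required object type); a simplified constraint.
RELATION_CONSTRAINTS = {"place of birth": ("human", "city")}


def extract_candidates(text: str, llm: Callable[[str], str]) -> List[Triplet]:
    """Step 1: ask the model for free-form triplets, one 'subject | relation | object' per line."""
    raw = llm(f"Extract triplets as 'subject | relation | object':\n{text}")
    candidates = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # skip malformed lines
            candidates.append(Triplet(*parts))
    return candidates


def link(candidate: Triplet) -> Optional[Triplet]:
    """Step 2: map surface forms to canonical names; drop the candidate if any part is unknown."""
    s = ENTITY_ALIASES.get(candidate.subject.lower())
    r = RELATION_ALIASES.get(candidate.relation.lower())
    o = ENTITY_ALIASES.get(candidate.object.lower())
    if s is None or r is None or o is None:
        return None
    return Triplet(s, r, o)


def satisfies_constraints(t: Triplet) -> bool:
    """Verification: subject and object types must match the relation's constraint."""
    subject_type, object_type = RELATION_CONSTRAINTS[t.relation]
    return ENTITY_TYPES.get(t.subject) == subject_type and ENTITY_TYPES.get(t.object) == object_type


def run_pipeline(text: str, llm: Callable[[str], str]) -> List[Triplet]:
    linked = (link(c) for c in extract_candidates(text, llm))
    return [t for t in linked if t is not None and satisfies_constraints(t)]


def fake_llm(prompt: str) -> str:
    # Canned output standing in for a real LLM call.
    return "Einstein | was born in | Ulm\nEinstein | born in | physics\nmalformed line"


verified = run_pipeline("Einstein was born in Ulm.", fake_llm)
```

In this toy run the second candidate links successfully but is rejected at verification, because its object type does not match the constraint for "place of birth"; the malformed line never becomes a candidate.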
Anthology ID:
2024.textgraphs-1.5
Volume:
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dmitry Ustalov, Yanjun Gao, Alexander Panchenko, Elena Tutubalina, Irina Nikishina, Arti Ramesh, Andrey Sakhovskiy, Ricardo Usbeck, Gerald Penn, Marco Valentino
Venues:
TextGraphs | WS
Publisher:
Association for Computational Linguistics
Pages:
61–77
URL:
https://aclanthology.org/2024.textgraphs-1.5
Cite (ACL):
Alla Chepurova, Yuri Kuratov, Aydar Bulatov, and Mikhail Burtsev. 2024. Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification. In Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing, pages 61–77, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification (Chepurova et al., TextGraphs-WS 2024)
PDF:
https://aclanthology.org/2024.textgraphs-1.5.pdf