Yurii Kuratov

Also published as: Yuri Kuratov


2024

pdf bib
Prompt Me One More Time: A Two-Step Knowledge Extraction Pipeline with Ontology-Based Verification
Alla Chepurova | Yuri Kuratov | Aydar Bulatov | Mikhail Burtsev
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing

This study explores a method for extending real-world knowledge graphs (specifically, Wikidata) by extracting triplets from texts with the aid of Large Language Models (LLMs). We propose a two-step pipeline that includes the initial extraction of entity candidates, followed by their refinement and linkage to the canonical entities and relations of the knowledge graph. Finally, we utilize Wikidata relation constraints to select only verified triplets. We compare our approach to a model that was fine-tuned on a machine-generated dataset and demonstrate that it performs better on natural data. Our results suggest that LLM-based triplet extraction from texts, with subsequent verification, is a viable method for real-world applications.

2023

pdf bib
Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information
Alla Chepurova | Aydar Bulatov | Yuri Kuratov | Mikhail Burtsev
Findings of the Association for Computational Linguistics: EMNLP 2023

Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs, necessitating the learning of dense node embeddings and computing pairwise distances. Generative transformer-based language models (e.g., T5 and recent KGT5) offer a promising solution as they can predict the tail nodes directly. In this study, we propose to include node neighborhoods as additional information to improve KGC methods based on language models. We examine the effects of this imputation and show that, on both inductive and transductive Wikidata subsets, our method outperforms KGT5 and conventional KGC approaches. We also provide an extensive analysis of the impact of neighborhood on model prediction and show its importance. Furthermore, we point the way to significantly improve KGC through more effective neighborhood selection.

2019

pdf bib
Tuning Multilingual Transformers for Language-Specific Named Entity Recognition
Mikhail Arkhipov | Maria Trofimova | Yuri Kuratov | Alexey Sorokin
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

Our paper addresses the problem of multilingual named entity recognition on the material of 4 languages: Russian, Bulgarian, Czech and Polish. We solve this task using the BERT model. We use a hundred languages multilingual model as base for transfer to the mentioned Slavic languages. Unsupervised pre-training of the BERT model on these 4 languages allows to significantly outperform baseline neural approaches and multilingual BERT. Additional improvement is achieved by extending BERT with a word-level CRF layer. Our system was submitted to BSNLP 2019 Shared Task on Multilingual Named Entity Recognition and demonstrated top performance in multilingual setting for two competition metrics. We open-sourced NER models and BERT model pre-trained on the four Slavic languages.

2018

pdf bib
NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager
Idris Yusupov | Yurii Kuratov
Proceedings of the 27th International Conference on Computational Linguistics

We present bot#1337: a dialog system developed for the 1st NIPS Conversational Intelligence Challenge 2017 (ConvAI). The aim of the competition was to implement a bot capable of conversing with humans based on a given passage of text. To enable conversation, we implemented a set of skills for our bot, including chit-chat, topic detection, text summarization, question answering and question generation. The system has been trained in a supervised setting using a dialogue manager to select an appropriate skill for generating a response. The latter allows a developer to focus on the skill implementation rather than the finite state machine based dialog manager. The proposed system bot#1337 won the competition with an average dialogue quality score of 2.78 out of 5 given by human evaluators. Source code and trained models for the bot#1337 are available on GitHub.

pdf bib
DeepPavlov: Open-Source Library for Dialogue Systems
Mikhail Burtsev | Alexander Seliverstov | Rafael Airapetyan | Mikhail Arkhipov | Dilyara Baymurzina | Nickolay Bushkov | Olga Gureenkova | Taras Khakhulin | Yuri Kuratov | Denis Kuznetsov | Alexey Litinsky | Varvara Logacheva | Alexey Lymar | Valentin Malykh | Maxim Petrov | Vadim Polulyakh | Leonid Pugachev | Alexey Sorokin | Maria Vikhreva | Marat Zaynutdinov
Proceedings of ACL 2018, System Demonstrations

Adoption of messaging communication and voice assistants has grown rapidly in the last years. This creates a demand for tools that speed up prototyping of feature-rich dialogue systems. An open-source library DeepPavlov is tailored for development of conversational agents. The library prioritises efficiency, modularity, and extensibility with the goal to make it easier to develop dialogue systems from scratch and with limited data available. It supports modular as well as end-to-end approaches to implementation of conversational agents. Conversational agent consists of skills and every skill can be decomposed into components. Components are usually models which solve typical NLP tasks such as intent classification, named entity recognition or pre-trained word vectors. Sequence-to-sequence chit-chat skill, question answering skill or task-oriented skill can be assembled from components provided in the library.