Koldo Gojenola

Also published as: K. Gojenola, Koldo Gojenola Galletebeitia, Koldobika Gojenola


2024

A Virtual Patient Dialogue System Based on Question-Answering on Clinical Records
Janire Arana | Mikel Idoyaga | Maitane Urruela | Elisa Espina | Aitziber Atutxa Salazar | Koldo Gojenola
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this work we present two datasets for the development of virtual patients, together with the first evaluation results. We first introduce a Spanish corpus of medical dialogue questions annotated with intents, built upon prior research in French. We also propose a second dataset of dialogues using a novel annotation approach that involves doctor questions, patient answers, and corresponding clinical records, organized as triples of the form (clinical report, question, patient answer). In this way, the doctor-patient conversation is modeled as a question-answering system that tries to find responses to questions, taking a clinical record as input. This approach can help to eliminate the need for manually structured patient records, as commonly used in previous studies, thereby expanding the pool of diverse virtual patients available. Leveraging these annotated corpora, we develop and assess an automatic system designed to answer medical dialogue questions posed by medical students to simulated patients in medical exams. Our approach demonstrates robust generalization, relying solely on medical records to generate new patient cases. The two datasets and the code will be freely available to the research community.

2019

IxaMed at PharmacoNER Challenge 2019
Xabier Lahuerta | Iakes Goenaga | Koldo Gojenola | Aitziber Atutxa Salazar | Maite Oronoz
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

The aim of this paper is to present our approach (IxaMed) to the PharmacoNER 2019 task. The task consists of identifying chemical, drug, and gene/protein mentions in clinical case studies written in Spanish. The evaluation of the task is divided into two scenarios: one corresponding to the detection of named entities and one corresponding to the indexation of named entities that have been previously identified. To identify named entities, we used a Bi-LSTM with a CRF on top in combination with different types of word embeddings. We achieved our best result (86.81 F-Score) by combining pretrained word embeddings of Wikipedia and Electronic Health Records (50M words) with contextual string embeddings of Wikipedia and Electronic Health Records. For the indexation of the named entities, we used the Levenshtein distance, obtaining an 85.34 F-Score as our best result.

Towards discourse annotation and sentiment analysis of the Basque Opinion Corpus
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Discourse information is crucial for a better understanding of text structure. It is also necessary to describe which part of an opinionated text is more relevant, or to decide how a text span can change the polarity (strengthen or weaken) of another span by means of coherence relations. This work presents the first results on the annotation of the Basque Opinion Corpus using Rhetorical Structure Theory (RST). Our evaluation results and analysis show the main avenues for improving a future annotation process. We have also extracted the subjectivity of several rhetorical relations, and the results show the effect of sentiment words in relations and the influence of each relation on the semantic orientation value.

2018

Saying no but meaning yes: negation and sentiment analysis in Basque
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In this work, we have analyzed the effects of negation on semantic orientation in Basque. The analysis shows that negation markers can strengthen, weaken or have no effect on the sentiment orientation of a word or a group of words. Using the Constraint Grammar formalism, we have designed and evaluated a set of linguistic rules to formalize these three phenomena. The results show that two phenomena, strengthening and no change, have been identified accurately, and the third one, weakening, with acceptable results.

2017

Using lexical level information in discourse structures for Basque sentiment analysis
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

2016

The impact of simple feature engineering in multilingual medical NER
Rebecka Weegar | Arantza Casillas | Arantza Diaz de Ilarraza | Maite Oronoz | Alicia Pérez | Koldo Gojenola
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Sometimes papers using scientifically sound techniques present raw baselines that could be improved by adding simple and cheap features. This work focuses on entity recognition in the clinical domain for three languages: English, Swedish and Spanish. The task is tackled using simple features, starting from the window size, capitalization and prefixes, and moving to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are: first, a short list of guidelines well supported by experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.

Fully unsupervised low-dimensional representation of adverse drug reaction events through distributional semantics
Alicia Pérez | Arantza Casillas | Koldo Gojenola
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

Electronic health records show great variability, since the same concept is often expressed with different terms: scientific Latin forms, common or lay variants, and even vernacular names. Deep learning enables the distributional representation of terms in a vector space, so that related terms tend to be close to each other in that space. Accordingly, embedding words through these vectors opens the way towards accounting for semantic relatedness through classical algebraic operations. In this work we propose a simple yet efficient unsupervised characterization of Adverse Drug Reactions (ADRs). This approach exploits the embedding representation of the terms involved in candidate ADR events, that is, drug-disease entity pairs. In brief, the ADRs are represented as vectors that link the drug with the disease in their context through a recursive additive model. We found that a low-dimensional representation that makes use of the modulus and argument of the embedded representation of the ADR event correlates with the manually annotated class. Thus, this characterization can be expected to be beneficial as a set of predictive features for further classification tasks.

2014

On WordNet Semantic Classes and Dependency Parsing
Kepa Bengoetxea | Eneko Agirre | Joakim Nivre | Yue Zhang | Koldo Gojenola
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

IxaMed: Applying Freeling and a Perceptron Sequential Tagger at the Shared Task on Analyzing Clinical Texts
Koldo Gojenola | Maite Oronoz | Alicia Pérez | Arantza Casillas
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Adverse Drug Event prediction combining shallow analysis and machine learning
Sara Santiso | Arantza Casillas | Alicia Pérez | Maite Oronoz | Koldo Gojenola
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

2013

Exploiting the Contribution of Morphological Information to Parsing: the BASQUE TEAM system in the SPRML‘2013 Shared Task
Iakes Goenaga | Koldo Gojenola | Nerea Ezeiza
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah | Reut Tsarfaty | Sandra Kübler | Marie Candito | Jinho D. Choi | Richárd Farkas | Jennifer Foster | Iakes Goenaga | Koldo Gojenola Galletebeitia | Yoav Goldberg | Spence Green | Nizar Habash | Marco Kuhlmann | Wolfgang Maier | Joakim Nivre | Adam Przepiórkowski | Ryan Roth | Wolfgang Seeker | Yannick Versley | Veronika Vincze | Marcin Woliński | Alina Wróblewska | Eric Villemonte de la Clergerie
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

Combining Rule-Based and Statistical Syntactic Analyzers
Iakes Goenaga | Koldobika Gojenola | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Kepa Bengoetxea
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages

First Approaches on Spanish Medical Record Classification Using Diagnostic Term to Class Transduction
A. Casillas | A. Díaz de Ilarraza | K. Gojenola | M. Oronoz | Alicia Pérez
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing

2011

Improving Dependency Parsing with Semantic Classes
Eneko Agirre | Kepa Bengoetxea | Koldo Gojenola | Joakim Nivre
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Using Kybots for Extracting Events in Biomedical Texts
Arantza Casillas | Arantza Díaz de Ilarraza | Koldo Gojenola | Maite Oronoz | German Rigau
Proceedings of BioNLP Shared Task 2011 Workshop

Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque
Kepa Bengoetxea | Arantza Casillas | Koldo Gojenola
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages

2010

Application of Different Techniques to Dependency Parsing of Basque
Kepa Bengoetxea | Koldo Gojenola
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

2009

Exploring Treebank Transformations in Dependency Parsing
Kepa Bengoetxea | Koldo Gojenola
Proceedings of the International Conference RANLP-2009

Evaluating the Impact of Morphosyntactic Ambiguity in Grammatical Error Detection
Arantza Díaz de Ilarraza | Koldo Gojenola | Maite Oronoz
Proceedings of the International Conference RANLP-2009

Application of feature propagation to dependency parsing
Kepa Bengoetxea | Koldo Gojenola
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

Detecting Erroneous Uses of Complex Postpositions in an Agglutinative Language
Arantza Díaz de Ilarraza | Koldo Gojenola | Maite Oronoz
Coling 2008: Companion volume: Posters

2004

Exploring Portability of Syntactic Information from English to Basque
Eneko Agirre | Aitziber Atutxa | Koldo Gojenola | Kepa Sarasola
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Representation and Treatment of Multiword Expressions in Basque
Iñaki Alegria | Olatz Ansa | Xabier Artola | Nerea Ezeiza | Koldo Gojenola | Ruben Urizar
Proceedings of the Workshop on Multiword Expressions: Integrating Processing

2002

A Class Library for the Integration of NLP Tools: Definition and implementation of an Abstract Data Type Collection for the manipulation of SGML documents in a context of stand-off linguistic annotation
X. Artola | A. Díaz de Ilarraza | N. Ezeiza | K. Gojenola | G. Hernández | A. Soroa
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Learning Argument/Adjunct Distinction for Basque
Izaskun Aldezabal | Maxux Aranzabe | Koldo Gojenola | Kepa Sarasola | Aitziber Atutxa
Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition

2000

Corpus-Based Syntactic Error Detection Using Syntactic Patterns
Koldo Gojenola | Maite Oronoz
Proceedings of the ANLP-NAACL 2000 Student Research Workshop

A word-grammar based morphological analyzer for agglutinative languages
I. Aduriz | E. Agirre | I. Aldezabal | I. Alegria | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

A Word-level Morphosyntactic Analyzer for Basque
I. Aduriz | E. Agirre | I. Aldezabal | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

A Proposal for the Integration of NLP Tools using SGML-Tagged Documents
X. Artola | A. Díaz de Ilarraza | N. Ezeiza | K. Gojenola | A. Maritxalar | A. Soroa
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

A Bootstrapping Approach to Parser Development
Izaskun Aldezabal | Koldo Gojenola | Kepa Sarasola
Proceedings of the Sixth International Workshop on Parsing Technologies

This paper presents a robust parsing system for unrestricted Basque texts. It analyzes a sentence in two stages: a unification-based parser builds basic syntactic units such as NPs, PPs, and sentential complements, while a finite-state parser performs syntactic disambiguation and filtering of the results. The system has been applied to the acquisition of verbal subcategorization information, obtaining 66% recall and 87% precision in the determination of verb subcategorization instances. This information will later be incorporated into the parser in order to improve its performance.

1998

Towards a single proposal in spelling correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Towards a Single Proposal in Spelling Correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1