2022
TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively
Izaskun Aldezabal | Jose Mari Arriola | Arantxa Otegi
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Terminology databases are highly useful for the dissemination of specialized knowledge. In this paper we present TZOS, an online terminology database for working collaboratively on Basque academic terminology. We show how this resource integrates the Communicative Theory of Terminology, together with the methodological issues involved, how it is connected to the real corpus GARATERM, which terminology issues arise when terms are collected, and what the future perspectives are. The main objectives of this work are to develop basic tools for researching academic registers and to make the terminology collected by expert users available to the community. Even though TZOS has been designed for an educational context, its flexible structure makes it possible to extend it to the professional area as well. Accordingly, we have built IZIBI-TZOS, a Civil Engineering oriented version of TZOS. These resources are already publicly available, and ongoing work is directed towards interlinking them with other lexical resources by applying linked data principles.
2010
Building the Basque PropBank
Izaskun Aldezabal | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Ainara Estarrona
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper presents the work that has been carried out to annotate semantic roles in the Basque Dependency Treebank (BDT). We describe the resources we have used and the way the annotation of 100 verbs has been done. We decided to follow the model proposed in the PropBank project, which has been deployed in other languages, such as Chinese, Spanish, Catalan and Russian. The resources used are: an in-house database with syntactic/semantic subcategorization frames for Basque verbs, an English-Basque verb mapping based on Levin's classification, and the BDT itself. Detailed guidelines for human taggers have been established as a result of this annotation process. In addition, we have characterized the information associated with the semantic tag. Based on this study, we will also define semi-automatic procedures that will facilitate the task of manual annotation for the rest of the verbs of the Treebank. We have also adapted AbarHitz, a tool used in the construction of the BDT, for the task of annotating semantic roles according to the proposed characterization.
2008
WNTERM: Enriching the MCR with a Terminological Dictionary
Eli Pociello | Antton Gurrutxaga | Eneko Agirre | Izaskun Aldezabal | German Rigau
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper we describe the methodology and the first steps for the creation of WNTERM (from WordNet and Terminology), a specialized lexicon produced from the merger of the EuroWordNet-based Multilingual Central Repository (MCR) and the Basic Encyclopaedic Dictionary of Science and Technology (BDST). As an example, the ecology domain has been used. The final result is a multilingual (Basque and English) light-weight domain ontology, including taxonomic and other semantic relations among its concepts, which is tightly connected to other wordnets.
2006
A Preliminary Study for Building the Basque PropBank
Eneko Agirre | Izaskun Aldezabal | Jone Etxeberria | Eli Pociello
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper presents a methodology for adding a layer of semantic annotation, in terms of semantic roles, to a syntactically annotated corpus of Basque (EPEC). The proposal we make here is the combination of three resources: the model used in the PropBank project (Palmer et al., 2005), an in-house database with syntactic/semantic subcategorization frames for Basque verbs (Aldezabal, 2004) and the Basque dependency treebank (Aduriz et al., 2003). In order to validate the methodology and to confirm whether the PropBank model is suitable for Basque and our treebank design, we have built lexical entries and labelled all arguments and adjuncts occurring in our treebank for 3 Basque verbs. The result of this study has been very positive, and has produced a methodology adapted to the characteristics of the language and the Basque dependency treebank. Another goal of this study was to determine whether semi-automatic tagging was possible. The idea is to present the human taggers with a pre-tagged version of the corpus. We have seen that many arguments could be automatically tagged with high precision, given only the verbal entries for the verbs and a handful of examples.
A methodology for the joint development of the Basque WordNet and Semcor
Eneko Agirre | Izaskun Aldezabal | Jone Etxeberria | Eli Izagirre | Karmele Mendizabal | Eli Pociello | Mikel Quintian
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better motivated sense distinctions and a tighter coupling between the two resources. The methodology involves editing, tagging and refereeing tasks. We are currently halfway through the nominal part of the 300,000-word corpus (roughly equivalent to a 500,000-word corpus for English). We present a detailed description of the task, including the main criteria for difficult cases in the editing of the senses and the tagging of the corpus, with special attention to multiword entries. Finally, we give a detailed picture of the current figures, as well as an analysis of the agreement rates.
2002
Learning Argument/Adjunct Distinction for Basque
Izaskun Aldezabal | Maxux Aranzabe | Koldo Gojenola | Kepa Sarasola | Aitziber Atutxa
Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition
2000
A word-grammar based morphological analyzer for agglutinative languages
I. Aduriz | E. Agirre | I. Aldezabal | I. Alegria | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics
A Word-level Morphosyntactic Analyzer for Basque
I. Aduriz | E. Agirre | I. Aldezabal | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
A Bootstrapping Approach to Parser Development
Izaskun Aldezabal | Koldo Gojenola | Kepa Sarasola
Proceedings of the Sixth International Workshop on Parsing Technologies
This paper presents a robust parsing system for unrestricted Basque texts. It analyzes a sentence in two stages: a unification-based parser builds basic syntactic units such as NPs, PPs, and sentential complements, while a finite-state parser performs syntactic disambiguation and filtering of the results. The system has been applied to the acquisition of verbal subcategorization information, obtaining 66% recall and 87% precision in the determination of verb subcategorization instances. This information will later be incorporated into the parser in order to improve its performance.
1999
Designing spelling correctors for inflected languages using lexical transducers
I. Aldezabal | I. Alegria | O. Ansa | J. M. Arriola | N. Ezeiza
Ninth Conference of the European Chapter of the Association for Computational Linguistics