Michael Rosner

Also published as: M. Rosner, M.A. Rosner, Mike Rosner


From Linguistic Linked Data to Big Data
Dimitar Trajanov | Elena Apostol | Radovan Garabik | Katerina Gkirtzou | Dagmar Gromann | Chaya Liebeskind | Cosimo Palma | Michael Rosner | Alexia Sampri | Gilles Sérasset | Blerina Spahiu | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With advances in the field of Linked (Open) Data (LOD), language data on the LOD cloud has grown in number, size, and variety. With an increased volume and variety of language data, optimizations of methods for distributing, storing, and querying these data become more central. To this end, this position paper investigates use cases at the intersection of LLOD and Big Data, surveys existing approaches to utilizing Big Data techniques within the context of linked data, and discusses the challenges and benefits of this union.

Linguistic LOD for Interoperable Morphological Description
Michael Rosner | Maxim Ionov
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024

Interoperability is a characteristic of a product or system that seamlessly works with another product or system and implies a certain level of independence from the context of use. Turning to language resources, interoperability is frequently cited as one important rationale underlying the use of LLOD representations and is generally regarded as highly desirable. In this paper we further elaborate this theme, distinguishing three different kinds of interoperability and providing practical implementations with examples from morphology.


Beyond Concatenative Morphology: Applying OntoLex-Morph to Maltese
Maxim Ionov | Mike Rosner
Proceedings of the 4th Conference on Language, Data and Knowledge

A Linked Data Approach for linking and aligning Sign Language and Spoken Language Data
Thierry Declerck | Sam Bigeard | Fahad Khan | Irene Murtagh | Sussi Olsen | Mike Rosner | Ineke Schuurman | Andon Tchechmedjiev | Andy Way
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages

We present work dealing with a Linked Open Data (LOD)-compliant representation of Sign Language (SL) data, with the goal of supporting the cross-lingual alignment of SL data and their linking to Spoken Language (SpL) data. The proposed representation is based on activities of groups of researchers in the field of SL who have investigated the use of Open Multilingual Wordnet (OMW) datasets for (manually) cross-linking SL data or for linking SL and SpL data. Another group of researchers is proposing an XML encoding of articulatory elements of SLs and (manually) linking those to an SpL lexical resource. We propose an RDF-based representation of those various data. This unified formal representation offers a semantic repository of information on SL and SpL data that could be accessed for supporting the creation of datasets for training or evaluating NLP applications dealing with SLs, thinking for example of Machine Translation (MT) between SLs and between SLs and SpLs.


Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner | Sina Ahmadi | Elena-Simona Apostol | Julia Bosque-Gil | Christian Chiarcos | Milan Dojchinovski | Katerina Gkirtzou | Jorge Gracia | Dagmar Gromann | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Gilles Sérasset | Ciprian-Octavian Truică
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.


The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Georg Rehm | Katrin Marheinecke | Stefanie Hegele | Stelios Piperidis | Kalina Bontcheva | Jan Hajič | Khalid Choukri | Andrejs Vasiļjevs | Gerhard Backfried | Christoph Prinz | José Manuel Gómez-Pérez | Luc Meertens | Paul Lukowicz | Josef van Genabith | Andrea Lösch | Philipp Slusallek | Morten Irgens | Patrick Gatellier | Joachim Köhler | Laure Le Bars | Dimitra Anastasiou | Albina Auksoriūtė | Núria Bel | António Branco | Gerhard Budin | Walter Daelemans | Koenraad De Smedt | Radovan Garabík | Maria Gavriilidou | Dagmar Gromann | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Jan Odijk | Maciej Ogrodniczuk | Eiríkur Rögnvaldsson | Mike Rosner | Bolette Pedersen | Inguna Skadiņa | Marko Tadić | Dan Tufiș | Tamás Váradi | Kadri Vider | Andy Way | François Yvon
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

Word Probability Findings in the Voynich Manuscript
Colin Layfield | Lonneke van der Plas | Michael Rosner | John Abela
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

The Voynich Manuscript has baffled scholars for centuries. Some believe the elaborate 15th century codex to be a hoax whilst others believe it is a real medieval manuscript whose contents are as yet unknown. In this paper, we provide additional evidence that the text of the manuscript displays the hallmarks of a proper natural language with respect to the relationship between word probabilities and (i) average information per subword segment and (ii) the relative positioning of consecutive subword segments necessary to uniquely identify words of different probabilities.

The Multilingual Anonymisation Toolkit for Public Administrations (MAPA) Project
Ēriks Ajausks | Victoria Arranz | Laurent Bié | Aleix Cerdà-i-Cucó | Khalid Choukri | Montse Cuadros | Hans Degroote | Amando Estela | Thierry Etchegoyhen | Mercedes García-Martínez | Aitor García-Pablos | Manuel Herranz | Alejandro Kohan | Maite Melero | Mike Rosner | Roberts Rozis | Patrick Paroubek | Artūrs Vasiļevskis | Pierre Zweigenbaum
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We describe the MAPA project, funded under the Connecting Europe Facility programme, whose goal is the development of an open-source de-identification toolkit for all official European Union languages. Development runs from January 2020 until December 2021.


Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions
Albert Gatt | Marc Tanti | Adrian Muscat | Patrizia Paggio | Reuben A Farrugia | Claudia Borg | Kenneth P Camilleri | Michael Rosner | Lonneke van der Plas
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Survey: Multiword Expression Processing: A Survey
Mathieu Constant | Gülşen Eryiǧit | Johanna Monti | Lonneke van der Plas | Carlos Ramisch | Michael Rosner | Amalia Todirascu
Computational Linguistics, Volume 43, Issue 4 - December 2017

Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by “MWE processing,” distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: Parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude on open issues and research perspectives.


Obituary: In Memoriam: Susan Armstrong
Pierrette Bouillon | Paola Merlo | Gertjan van Noord | Mike Rosner
Computational Linguistics, Volume 42, Issue 2 - June 2016


A Framework for the Generation of Computer System Diagnostics in Natural Language using Finite State Methods
Rachel Farrell | Gordon Pace | Michael Rosner
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)


Automatic Methods for the Extension of a Bilingual Dictionary using Comparable Corpora
Michael Rosner | Kurt Sultana
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Bilingual dictionaries define word equivalents from one language to another, thus acting as an important bridge between languages. No bilingual dictionary is complete since languages are in a constant state of change. Additionally, dictionaries are unlikely to achieve complete coverage of all language terms. This paper investigates methods for extending dictionaries using non-aligned corpora, by finding translations through context similarity. Most methods compute word contexts from general corpora. This can lead to errors due to data sparsity. We investigate the hypothesis that this problem can be addressed by carefully choosing smaller corpora in which domain-specific terms are more predominant. We also introduce the notion of efficiency which we consider as the effort required to obtain a set of dictionary entries from a given corpus.

The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.


UoM: Using Explicit Semantic Analysis for Classifying Sentiments
Sapna Negi | Michael Rosner
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)


Incorporating an Error Corpus into a Spellchecker for Maltese
Michael Rosner | Albert Gatt | Andrew Attard | Jan Joachimsen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper discusses the ongoing development of a new Maltese spell checker, highlighting the methodologies which would best suit such a language. We thus discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of attention to context. Two developments are of particular interest, both of which concern the availability of language resources relevant to spellchecking: (i) the Maltese Language Resource Server (MLRS) which now includes a representative corpus of c. 100M words extracted from diverse documents including the Maltese Legislation, press releases and extracts from Maltese web-pages and (ii) an extensive and detailed corpus of spelling errors that was collected whilst part of the MLRS texts were being prepared. We describe the structure of these resources as well as the experimental approaches focused on context that we are now in a position to adopt. We describe the framework within which a variety of different approaches to spellchecking and evaluation will be carried out, and briefly discuss the first baseline system we have implemented. We conclude the paper with a roadmap for future improvements.


Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Nicoletta Calzolari | Khalid Choukri | Bente Maegaard | Joseph Mariani | Jan Odijk | Stelios Piperidis | Mike Rosner | Daniel Tapias
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatic Grammar Rule Extraction and Ranking for Definitions
Claudia Borg | Mike Rosner | Gordon J. Pace
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Plain text corpora contain much information which can only be accessed through human annotation and semantic analysis, which is typically very time consuming to perform. Analysis of such texts at a syntactic or grammatical structure level can however extract some of this information in an automated manner, even if identifying effective rules can be extremely difficult. One such type of implicit information present in texts is that of definitional phrases and sentences. In this paper, we investigate the use of evolutionary algorithms to learn classifiers to discriminate between definitional and non-definitional sentences in non-technical texts, and show how effective grammar-based definition discriminators can be automatically learnt with minor human intervention.


Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
Mike Rosner | Shuly Wintner
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

LEXIE – an Experiment in Lexical Information Extraction
John J. Camilleri | Michael Rosner
Proceedings of the Workshop on Adaptation of Language Resources and Technology to New Domains

Evolutionary Algorithms for Definition Extraction
Claudia Borg | Mike Rosner | Gordon Pace
Proceedings of the 1st Workshop on Definition Extraction


MultiSum: Query-Based Multi-Document Summarization
Mike Rosner | Carl Camilleri
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization

ODL: an Object Description Language for Lexical Information
Michael Rosner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes ODL, a description language for lexical information that is being developed within the context of a national project called MLRS (Maltese Language Resource Server) whose goal is to create a national corpus and computational lexicon for the Maltese language. The main aim of ODL is to make the task of the lexicographer easier by allowing lexical specifications to be set out formally so that actual entries will conform to them. The paper describes some of the background motivation, the ODL language itself, and concludes with a short example of how lexical values expressed in ODL can be mapped to an existing tagset together with some speculations about future work.


Language Technology from a European Perspective
Hans Uszkoreit | Valia Kordoni | Vladislav Kubon | Michael Rosner | Sabine Kirchmeier-Andersen
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL


The Future of Maltilex
Michael Rosner
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


Mike Rosner
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources


Maltilex: A Computational Lexicon for Maltese
M. Rosner | J. Caruana | R. Fabri
Computational Approaches to Semitic Languages


A rich environment for experimentation with unification grammars
R. Johnson | M. Rosner
Fourth Conference of the European Chapter of the Association for Computational Linguistics


The <C,A>,T Framework in Eurotra: A Theoretically Committed Notation for MT
D.J. Arnold | S. Krauwer | M. Rosner | L. des Tombe | G.B. Varile
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics


A MU1 View of the <C,A>,T Framework in EUROTRA
Doug Arnold | Lieven Jaspaert | Rod Johnson | Steven Krauwer | Mike Rosner | Louis des Tombe | Nino Varile | Susan Warwick
Proceedings of the first Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

A Preliminary Linguistic Framework for EUROTRA, June 1985
Louis des Tombe | Doug Arnold | Lieven Jaspaert | Rod Johnson | Steven Krauwer | Mike Rosner | Nino Varile | Susan Warwick
Proceedings of the first Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages


The Design of the Kernel Architecture for the Eurotra Software
R.L. Johnson | M.A. Rosner
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics