Un outil multilingue d’extraction de collocations en ligne (An Online Multilingual Collocation Extraction Tool)
Luka Nerima
Violeta Seretan
Eric Wehrli
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations
This demonstration presents the web version of a multilingual collocation extraction tool. It is intended for lexicographers, translators, L2 teachers and learners and, more generally, for linguists who wish to analyze and exploit their own corpora.
The ACCEPT Academic Portal: Bringing Together Pre-editing, MT and Post-editing into a Learning Environment
Pierrette Bouillon
Johanna Gerlach
Asheesh Gulati
Victoria Porro
Violeta Seretan
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation
Violeta Seretan
Pierrette Bouillon
Johanna Gerlach
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
User-generated content represents an increasing share of the information available today. To make this type of content instantly accessible in another language, the ACCEPT project focuses on developing pre-editing technologies that correct the source text in order to increase its translatability. Linguistically informed pre-editing rules have been developed for English and French for the two domains considered by the project, namely the technical domain and the healthcare domain. In this paper, we present the evaluation experiments carried out to assess the impact of the proposed pre-editing rules on translation quality. Results from a large-scale evaluation campaign show that pre-editing does lead to better translation quality for a high proportion of the data, and that these cases outnumber the cases in which an adverse effect is observed by a statistically significant margin. The ACCEPT pre-editing technology is freely available online and can be used in any Web-based environment to enhance the translatability of user-generated content so that it reaches a broader audience.
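The abstract reports that improvements outnumber degradations by a statistically significant margin, but does not name the test used. As an illustration only, an exact two-sided sign test is one standard way to compare such paired "better vs. worse" counts. A minimal sketch, assuming ties have already been discarded; the function name and the counts in the usage line are hypothetical, not results from the paper:

```python
from math import comb


def sign_test_two_sided(improved: int, degraded: int) -> float:
    """Exact two-sided sign test p-value for paired comparisons.

    `improved`: cases judged better after pre-editing.
    `degraded`: cases judged worse after pre-editing.
    Ties (no quality difference) are assumed to have been discarded.
    Under the null hypothesis each non-tied case goes either way with p = 0.5.
    """
    n = improved + degraded
    k = min(improved, degraded)
    # One-tail probability of a split at least as unbalanced as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double it for a two-sided test, capping at 1 (needed when the split is even).
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only; not results from the paper.
p = sign_test_two_sided(improved=70, degraded=30)
print(f"p = {p:.2e}")
```

The sign test only uses the direction of each difference, which fits judgments of the form "pre-edited translation is better/worse" without requiring a numeric quality score.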
Rule-based automatic post-processing of SMT output to reduce human post-editing effort
Victoria Porro
Johanna Gerlach
Pierrette Bouillon
Violeta Seretan
Proceedings of Translating and the Computer 36
The ACCEPT Portal: An Online Framework for the Pre-editing and Post-editing of User-Generated Content
Violeta Seretan
Johann Roturier
David Silva
Pierrette Bouillon
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
On translating syntactically-flexible expressions
Violeta Seretan
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies
Acquisition of Syntactic Simplification Rules for French
Violeta Seretan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Text simplification is the process of reducing the lexical and syntactic complexity of a text while attempting to preserve (most of) its information content. It has recently emerged as an important research area that holds promise both for enhancing text readability for the benefit of a broader audience and for increasing the performance of other applications. Our work focuses on syntactic complexity reduction and deals with the task of corpus-based acquisition of syntactic simplification rules for the French language. We show that the data-driven manual acquisition of simplification rules can be complemented by the semi-automatic detection of syntactic constructions requiring simplification. We provide the first comprehensive set of syntactic simplification rules for French, whose size is comparable to similar resources that exist for English and Brazilian Portuguese. Unlike these manually built resources, our resource integrates larger lists of lexical cues signaling simplifiable constructions, which are useful for informing practical systems.
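The abstract mentions lists of lexical cues that signal simplifiable constructions and support their semi-automatic detection. A minimal sketch of cue-based detection, assuming a small hypothetical cue list (not the resource from the paper) and plain pattern matching on raw text rather than on parser output:

```python
import re

# Hypothetical cue list: a few French markers that often introduce
# constructions targeted by syntactic simplification. Not the paper's resource.
CUES = {
    "relative_clause": ["qui", "dont", "lequel", "laquelle", "lesquels", "lesquelles"],
    "subordination": ["bien que", "alors que", "parce que", "afin que", "puisque", "lorsque"],
}


def detect_cues(sentence: str) -> dict[str, list[str]]:
    """Return, per construction type, the cues found in the sentence."""
    text = sentence.lower()
    found: dict[str, list[str]] = {}
    for construction, cues in CUES.items():
        # Word boundaries prevent matches inside longer words (e.g. "quitter").
        hits = [c for c in cues if re.search(r"\b" + re.escape(c) + r"\b", text)]
        if hits:
            found[construction] = hits
    return found


print(detect_cues("Le chat qui dort est noir, bien que fatigué."))
```

Flagged sentences would then be candidates for manual inspection; surface cues are ambiguous (e.g. "qui" also introduces questions), which is one reason such detection is only semi-automatic.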
FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora
Violeta Seretan
Eric Wehrli
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Une approche de résumé automatique basée sur les collocations (A Collocation-Driven Approach to Text Summarization)
Violeta Seretan
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
In this article, we describe a new approach to extractive summarization (the task of automatically creating a summary of a document by selecting a subset of its sentences) that exploits domain-specific collocational information acquired beforehand from a development corpus. A collocation extractor based on syntactic parsing is used to infer a content model, which is then applied to the document to be summarized. This approach was used to create simple versions of English Wikipedia articles, as part of a project aimed at automatically creating simplified articles similar to those found in Simple English Wikipedia. An evaluation of the developed system remains to be carried out. However, preliminary results obtained for articles about cities show the potential of this collocation-driven approach for selecting relevant sentences.
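The abstract describes inferring a content model from collocations in a development corpus and applying it to select sentences, but does not specify the model. A minimal sketch, assuming the model is simply collocation frequency and that each sentence's collocations have already been extracted (by the parser, in the original work); all names and data below are hypothetical:

```python
from collections import Counter

# Hypothetical representation: a collocation as a pair of lemmas.
Collocation = tuple[str, str]


def build_content_model(dev_collocations: list[Collocation]) -> Counter:
    """Weight each collocation by its frequency in the development corpus."""
    return Counter(dev_collocations)


def summarize(
    sentences: list[str],
    sentence_collocations: list[list[Collocation]],
    model: Counter,
    k: int,
) -> list[str]:
    """Return the k highest-scoring sentences, in their original order.

    A sentence's score is the sum of the model weights of its collocations;
    collocations unseen in the development corpus contribute 0.
    """
    scores = [sum(model[c] for c in colls) for colls in sentence_collocations]
    # Rank by descending score; break ties by position in the document.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]


model = build_content_model([("large", "city"), ("large", "city"), ("population", "census")])
sentences = ["Lyon is a large city.", "It rains often.", "A census counted its population."]
colls = [[("large", "city")], [], [("population", "census")]]
print(summarize(sentences, colls, model, k=2))
# → ['Lyon is a large city.', 'A census counted its population.']
```

Restoring document order after ranking keeps the extract readable, since the selected sentences appear as they did in the source article.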
Une Suite d’interaction de fouille basée sur la compréhension du langage naturel (An Interaction Mining Suite Based On Natural Language Understanding)
Rodolfo Delmonte
Vincenzo Pallotta
Violeta Seretan
Lammert Vrieling
David Walker
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations
Sentence Analysis and Collocation Identification
Eric Wehrli
Violeta Seretan
Luka Nerima
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications
FipsRomanian: Towards a Romanian Version of the Fips Syntactic Parser
Violeta Seretan
Eric Wehrli
Luka Nerima
Gabriela Soare
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We describe work in progress on the development of a full syntactic parser for Romanian. This work is part of a larger project of multilingual extension of the Fips parser (Wehrli, 2007), already available for French, English, German, Spanish, Italian, and Greek, to four new languages (Romanian, Romansh, Russian and Japanese). The Romanian version was built by starting with the Fips generic parsing architecture for the Romance languages and customising the grammatical component, in close relation to the development of the lexical component. We describe this process and report on preliminary results obtained for journalistic texts.
A Recursive Treatment of Collocations
Luka Nerima
Eric Wehrli
Violeta Seretan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This article discusses the treatment of collocations in the context of a long-term project on the development of multilingual NLP tools. Besides classical two-word collocations, we focus on complex collocations (three words or more), for which a recursive design is presented in the form of collocations of collocations. Although comparatively less numerous than two-word collocations, complex collocations pose important challenges for NLP. The article discusses how these collocations are retrieved from corpora, inserted and stored in a lexical database, how the parser uses such knowledge, and what advantages a recursive approach to complex collocations offers.
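The recursive design can be pictured as a data structure in which either part of a two-part collocation may itself be a collocation. A minimal sketch, assuming a simple binary representation with parts kept in surface order; the class, field names, relation labels and example are hypothetical, not the lexical database schema from the paper:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Collocation:
    """A two-part collocation whose parts may themselves be collocations.

    `left` and `right` are kept in surface order (a simplifying assumption);
    `relation` is a free-text label such as "verb-object".
    """
    left: Component
    right: Component
    relation: str


# A component is either a single word or, recursively, a collocation.
Component = Union[str, Collocation]


def words(item: Component) -> list[str]:
    """Flatten a (possibly nested) collocation into its list of words."""
    if isinstance(item, str):
        return [item]
    return words(item.left) + words(item.right)


# "world record" is itself a collocation; "break a world record" builds on it.
world_record = Collocation("world", "record", "noun-noun")
break_record = Collocation("break", world_record, "verb-object")
print(words(break_record))  # ['break', 'world', 'record']
```

Because a three-word collocation is stored as a two-part entry that reuses an existing one, the same binary machinery can, in principle, handle collocations of any length.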
A Tool for Multi-Word Expression Extraction in Modern Greek Using Syntactic Parsing
Athina Michou
Violeta Seretan
Proceedings of the Demonstrations Session at EACL 2009
Collocations in a Rule-Based MT System: A Case Study Evaluation of their Translation Adequacy
Eric Wehrli
Violeta Seretan
Luka Nerima
Lorenza Russo
Proceedings of the 13th Annual Conference of the European Association for Machine Translation
Extraction de collocations et leurs équivalents de traduction à partir de corpus parallèles [Extracting collocations and their translations from parallel corpora]
Violeta Seretan
Traitement Automatique des Langues, Volume 50, Number 1: Varia
Collocation translation based on sentence alignment and parsing
Violeta Seretan
Éric Wehrli
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Although considerable effort has gone into extracting collocations from text corpora, only a minority of studies are also concerned with making the extraction output ready for use in the NLP applications that could benefit from it, such as machine translation. This article describes an accurate method for identifying the translations of collocations in a parallel corpus, which has the following advantages: it can handle flexible (and not only fixed) collocations; it requires limited resources and reasonable computing power (no full alignment, no training); and it can be applied to several language pairs and works even in the absence of bilingual dictionaries. The method is based on syntactic information provided by the multilingual parser Fips. An evaluation carried out on 4,000 verb-object collocations across several language pairs showed an average precision of 89.8% and a satisfactory coverage of 70.9%. These results are better than those reported in evaluations of other collocation translation methods.
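The abstract states only that the method uses Fips syntactic analysis over aligned sentences, without full word alignment or training; its exact matching criteria are not given here. As a simplified illustration, and not the authors' algorithm, a naive co-occurrence heuristic over already-parsed sentence pairs could look like this; all names and data are hypothetical:

```python
from collections import Counter
from typing import Optional

# A verb-object pair as (verb lemma, object lemma). In the original work such
# pairs come from Fips parses; here they are assumed to be given.
Pair = tuple[str, str]


def translate_collocation(
    source: Pair,
    aligned: list[tuple[set[Pair], set[Pair]]],
) -> Optional[Pair]:
    """Return the target pair that most often co-occurs with `source`.

    `aligned` lists, for each aligned sentence pair, the verb-object pairs
    found in the source sentence and in its target-language counterpart.
    Returns None if no candidate target pair is found.
    """
    counts: Counter = Counter()
    for source_pairs, target_pairs in aligned:
        if source in source_pairs:
            counts.update(target_pairs)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


aligned = [
    ({("take", "decision")}, {("prendre", "décision"), ("avoir", "temps")}),
    ({("take", "decision")}, {("prendre", "décision")}),
    ({("make", "mistake")}, {("faire", "erreur")}),
]
print(translate_collocation(("take", "decision"), aligned))  # ('prendre', 'décision')
```

Restricting candidates to syntactically related pairs in the target sentence is what lets this kind of approach work without a bilingual dictionary: the search space is already small once both sides are parsed.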
User Requirements Analysis for Meeting Information Retrieval Based on Query Elicitation
Vincenzo Pallotta
Violeta Seretan
Marita Ailomaa
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Proceedings of the ACL 2007 Student Research Workshop
Chris Biemann
Violeta Seretan
Ellen Riloff
Proceedings of the ACL 2007 Student Research Workshop
Multilingual Collocation Extraction: Issues and Solutions
Violeta Seretan
Eric Wehrli
Proceedings of the Workshop on Multilingual Language Resources and Interoperability
Accurate Collocation Extraction Using a Multilingual Parser
Violeta Seretan
Eric Wehrli
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Using the Web as a Corpus for the Syntactic-Based Collocation Identification
Violeta Seretan
Luka Nerima
Eric Wehrli
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)
Creating a multilingual collocations dictionary from large text corpora
Luka Nerima
Violeta Seretan
Eric Wehrli
10th Conference of the European Chapter of the Association for Computational Linguistics
The Use of Referential Constraints in Structuring Discourse
Violeta Seretan
Dan Cristea
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC'02)