2020
Designing Multilingual Interactive Agents using Small Dialogue Corpora
Donghui Lin, Masayuki Otani, Ryosuke Okuno, Toru Ishida
Proceedings of the Twelfth Language Resources and Evaluation Conference
Interactive dialogue agents such as smart speakers have become increasingly popular in recent years. These agents are developed using machine learning technologies that consume huge amounts of language resources. However, many entities in specialized fields struggle to develop their own interactive agents due to a lack of language resources such as dialogue corpora, especially when end users need agents that offer multilingual support. Therefore, we aim to provide a general design framework for multilingual interactive agents in specialized domains that are assumed to have small or non-existent dialogue corpora. To achieve this goal, we first integrate and customize external language services to support the multilingual functions of interactive agents. Second, we realize context-aware dialogue generation with small corpora. Third, we develop a gradual design process for acquiring dialogue corpora and improving the interactive agents. We implement a multilingual interactive agent in the healthcare field and conduct experiments to illustrate its effectiveness.
2018
A Framework for Multi-Language Service Design with the Language Grid
Donghui Lin, Yohei Murakami, Toru Ishida
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
Arbi Haza Nasution, Yohei Murakami, Toru Ishida
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Combining Human Inputters and Language Services to provide Multi-language support system for International Symposiums
Takao Nakaguchi, Masayuki Otani, Toshiyuki Takasaki, Toru Ishida
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)
In this research, we introduce and implement a method that combines human inputters and machine translators. When participants' languages vary widely, the cost of simultaneous translation becomes very high. However, simply applying machine translation to speech text does not yield the quality needed for real use. Thus, we propose a method in which people who understand the speaker's language cooperate with a machine translation service to support multilingualization through the co-creation of value. We implement a system based on this method and apply it to actual presentations. While direct machine translation scores 1.84 (fluency) and 2.89 (adequacy), the system scores 3.76 and 3.85, respectively.
An Ontology for Language Service Composability
Yohei Murakami, Takao Nakaguchi, Donghui Lin, Toru Ishida
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)
Fragmentation and recombination are key to creating customized language environments that support various intercultural activities. Fragmentation provides language resource components for customized language environments, and recombination builds each language environment according to the user's request by combining these components. To realize this fragmentation and recombination process, existing language resources (both data and programs) should be shared as language services and combined despite mismatches between their service interfaces. Addressing this issue requires standardization: language services need standardized interfaces, just as language resources need standardized data formats. Therefore, we have constructed a hierarchy of language services based on the inheritance of service interfaces, which we call a language service ontology. This ontology allows users to create new customized language services that are compatible with existing ones. Moreover, we have developed a dynamic service binding technology that instantiates various executable customized services from an abstract workflow according to the user's request. By using the ontology and service binding together, users can bind an instantiated language service to another abstract workflow to create a new customized service.
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
Arbi Haza Nasution, Yohei Murakami, Toru Ishida
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on this assumption, we propose constraint-based bilingual lexicon induction for closely related languages, extending the constraints and translation pair candidates of a recent pivot language approach. We further define three constraint sets based on language characteristics. In this paper, two controlled experiments are conducted: the former involves four closely related language pairs with different degrees of similarity, and the latter focuses on sense connectivity between non-pivot words and pivot words. We evaluate our results with F-measure. The results indicate that our method works better with large input dictionaries and highly similar languages. Finally, we introduce a strategy for using appropriate constraint sets for different goals and language characteristics.
Towards a Language Service Infrastructure for Mobile Environments
Ngoc Nguyen, Donghui Lin, Takao Nakaguchi, Toru Ishida
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Since mobile devices have feature-rich configurations and provide diverse functions, using mobile devices in combination with the language resources of cloud environments is highly promising for achieving wide-ranging communication beyond current language barriers. However, there are mismatches between resources on mobile devices and services in the cloud, such as different communication protocols and different input and output methods. In this paper, we propose a language service infrastructure for mobile environments that combines these services. The proposed infrastructure allows users to use and mash up existing language resources both in cloud environments and on their mobile devices. Furthermore, it allows users to flexibly use either cloud services or on-device services in their composite services without implementing multiple composite services that have the same functionality. We introduce a case study of a Mobile Shopping Translation System that uses both a cloud service (a translation service) and services on mobile devices (a Bluetooth Low Energy (BLE) service and a text-to-speech service).
2014
Bilingual Dictionary Induction as an Optimization Problem
Wushouer Mairidan, Toru Ishida, Donghui Lin, Katsutoshi Hirayama
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Bilingual dictionaries are vital in many areas of natural language processing, but such resources are rarely available for lower-density language pairs, especially closely related ones. Pivot-based induction uses a third language to bridge a language pair. As an approach to creating new dictionaries, it can generate wrong translations due to polysemous and ambiguous words. In this paper we propose a constraint approach to pivot-based dictionary induction for the case of two closely related languages. To take word senses into account, we use an approach based on semantic distances in which possibly missing translations are considered, and each instance of induction is encoded as an optimization problem to generate a new dictionary. Evaluations show that the proposal achieves 83.7% accuracy and approximately 70.5% recall, outperforming the baseline pivot-based method.
Crowdsourcing for Evaluating Machine Translation Quality
Shinsuke Goto, Donghui Lin, Toru Ishida
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The recent popularity of machine translation has increased the demand for translation evaluation. However, the traditional evaluation approach, manual checking by a bilingual professional, is too expensive and too slow. In this study, we confirm the feasibility of crowdsourcing by analyzing the accuracy of crowdsourced translation evaluations. We compare crowdsourcing scores with professional scores on three metrics: translation-score, sentence-score, and system-score. A Chinese-to-English translation evaluation task was designed using the NTCIR-9 PATENT parallel corpus, with the goal of 5-point-scale evaluations of adequacy and fluency. The experiment shows that the average score of crowd workers closely matches professional evaluation results. The system-score comparison strongly indicates that crowdsourcing can be used to find the best translation system given 10 source sentences as input.
Integration of Workflow and Pipeline for Language Service Composition
Trang Mai Xuan, Yohei Murakami, Donghui Lin, Toru Ishida
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Integrating language resources and language services is a critical part of building natural language processing applications. Service workflows and processing pipelines are two approaches to sharing and combining language resources. Workflow languages focus on expressive power, describing a variety of workflow patterns to meet users' needs. Users can combine language services in service workflows to meet their requirements, and the workflows can be accessed in a distributed manner and invoked independently of platforms. However, workflow languages lack support for pipelined execution to improve workflow performance. In contrast, a processing pipeline provides a straightforward way to create a sequence of linguistic processing steps that analyze large amounts of text data, focusing on pipelined and parallel execution to improve throughput. However, the resulting pipelines are standalone applications, i.e., software tools that are accessible only on the local machine and can only be run on the processing pipeline platforms. In this paper we propose a framework that integrates the two approaches so that each offsets the disadvantages of the other. We then present a case study in which two representative frameworks, the Language Grid and UIMA, are integrated.
2013
Interoperability between Service Composition and Processing Pipeline: Case Study on the Language Grid and UIMA
Trang Mai Xuan, Yohei Murakami, Donghui Lin, Toru Ishida
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
Two Phase Evaluation for Selecting Machine Translation Services
Chunqi Shi, Donghui Lin, Masahiko Shimada, Toru Ishida
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
An increasing number of machine translation services are now available. Unfortunately, none of them provides adequate translation quality for all input sources. This forces users to select among the services according to their needs, but manual selection is tedious and time-consuming. Our solution, proposed here, is an automatic mechanism that selects the most appropriate machine translation service. Although evaluation methods such as BLEU, NIST, and WER are available, their evaluation results do not agree with one another across translation sources. We propose a two-phase architecture for selecting translation services. The first phase uses data-driven classification to select the most appropriate evaluation method for each translation source. The second phase uses the selected evaluation method to choose the most appropriate machine translation result. We describe the architecture, detail the algorithm, and construct a prototype. Tests show that the proposal yields better translation quality than employing a single machine translation service.
Service Composition Scenarios for Task-Oriented Translation
Chunqi Shi, Donghui Lin, Toru Ishida
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Due to its instant availability and low cost, machine translation is becoming popular, and machine-translation-mediated communication plays an increasingly important role in international collaboration. However, machine translators cannot guarantee high-quality translation. In a multilingual communication task, many in-domain resources, such as domain dictionaries, are needed to improve translation quality. This raises the problem of how to help communication task designers build higher-quality translation systems that can take advantage of various in-domain resources. The Language Grid, a service-oriented collective intelligence platform, allows in-domain resources to be wrapped as language services. For task-oriented translation, we propose service composition scenarios for composing different language services so that various in-domain resources are utilized effectively. We design the architecture, provide a script language as the task designer's interface, which makes it easy to describe the composition scenario, and present a case study of a Japanese-English campus orientation task. Based on the case study, we analyze the possible increase in translation quality and the usage of in-domain resources. The results demonstrate a clear improvement in translation accuracy when in-domain resources are used.
2011
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
Nicoletta Calzolari, Toru Ishida, Stelios Piperidis, Virach Sornlertlamvanich
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
Federated Operation Model for the Language Grid
Toru Ishida, Yohei Murakami, Yoko Kubota, Rieko Inaba
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
Open-Source Platform for Language Service Sharing
Yohei Murakami, Masahiro Tanaka, Donghui Lin, Toru Ishida
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
2010
Composing Human and Machine Translation Services: Language Grid for Improving Localization Processes
Donghui Lin, Yoshiaki Murakami, Toru Ishida, Yohei Murakami, Masahiro Tanaka
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
With the development of Internet environments, more and more language services are becoming accessible to ordinary people. However, the gap between human and machine translators remains huge, especially in localization processes, which require high translation quality. Although efforts to combine human and machine translators to support multilingual communication have been reported in previous research, how to apply such approaches to improve localization processes is rarely discussed. In this paper, we aim to improve localization processes by composing human and machine translation services based on the Language Grid, a language service platform that we have developed. Further, we conduct experiments comparing translation quality and translation cost across several translation processes, including purely machine translation processes, purely human translation processes, and processes that combine human and machine translation services. The experimental results show that composing monolingual roles and dictionary services improves the translation quality of machine translators, and that collaboration between human and machine translators can reduce cost compared with purely bilingual human translation. We also discuss the generality of the experimental results and further challenges for the proposed localization processes.
Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components
Arif Bramantoro, Ulrich Schäfer, Toru Ishida
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Web services are increasingly used in the natural language processing community to increase interoperability among language resources. This paper extends our previous work on integrating two different platforms, Heart of Gold and the Language Grid. The Language Grid is an infrastructure built on top of the Internet to provide distributed language services. Heart of Gold is a middleware architecture for integrating deep and shallow natural language processing components. The new feature of the integrated architecture is the combination of composite language services in the Language Grid with multiple linguistic processing components in Heart of Gold, providing higher-quality language resources on the Web. Thus, language resources with different characteristics can be combined based on the concept of service-oriented computing, with different treatment for each combination. Fully integrating Heart of Gold into the Language Grid environment would contribute to the heterogeneity of language services.
Language Service Management with the Language Grid
Yohei Murakami, Donghui Lin, Masahiro Tanaka, Takao Nakaguchi, Toru Ishida
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
As the number of language resources accessible on the Internet increases, many efforts have been made to combine language resources and language processing tools to create new services. However, existing language resource coordination frameworks cannot manage the intellectual property issues associated with language resources. This makes it difficult for most end users to get support for their intercultural collaborations, because they always have to deal with these issues themselves. In this paper, we aim to construct a new language service management architecture on the Language Grid that enables language resource providers to control access to their resources in accordance with their own policies. Furthermore, we apply the proposed architecture to the operating Language Grid to validate its effectiveness. As a result, several service management models that utilize monitoring and access constraints have emerged to satisfy various requirements from language resource providers. These models can handle paid-for language resources as well as free ones. Finally, we discuss further challenges in combining language resources under different policies.
2006
Automatic Detection and Semi-Automatic Revision of Non-Machine-Translatable Parts of a Sentence
Kiyotaka Uchimoto, Naoko Hayashida, Toru Ishida, Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
We developed a method for automatically distinguishing the machine-translatable and non-machine-translatable parts of a given sentence for a particular machine translation (MT) system. These parts can be distinguished by calculating, for each part of the sentence, the similarity between the source-language sentence and its back translation. Parts with low similarity are highly likely to be non-machine-translatable. We showed that the parts of a sentence automatically identified as non-machine-translatable provide useful information for paraphrasing or revising the sentence in the source language to improve the quality of its translation by the MT system. We also developed a method for providing knowledge useful for effectively paraphrasing or revising the detected non-machine-translatable parts. Two types of knowledge were extracted from the EDR dictionary: one for transforming a lexical entry into an expression used in its definition, and the other for reverse paraphrasing, which transforms an expression found in a definition into the lexical entry. We found that the information provided by these methods helped improve the machine translatability of the originally input sentences.
A Dictionary Model for Unifying Machine Readable Dictionaries and Computational Concept Lexicons
Yoshihiko Hayashi, Toru Ishida
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The Language Grid, recently proposed by one of the authors, is a language infrastructure available on the Internet. It aims to resolve the problems of accessibility and usability inherent in currently available language services. The infrastructure will accommodate an operational environment in which a user and/or a software agent can develop a language service tailored to specific requirements derived from various situations of intercultural communication. To operate the infrastructure effectively, each atomic language service has to be discovered by the planner of a composite service and incorporated into the composite service scenario. Meta-description of atomic services is crucial to accomplishing this planning process. This paper focuses on dictionary access services and proposes an abstract dictionary model that is vital for the accurate meta-description of such services. In principle, the proposed model is based on an organization compatible with Princeton WordNet. Computational lexicons, including the EDR dictionary, as well as a range of human-oriented monolingual/bilingual dictionaries, are uniformly organized into a WordNet-like lexical concept system. A modeling example with a few dictionary instances demonstrates the fundamental validity of the model.
2005
Intercultural Collaboration using Machine Translation
Toru Ishida
Proceedings of Machine Translation Summit X: Invited papers
Automatic Rating of Machine Translatability
Kiyotaka Uchimoto, Naoko Hayashida, Toru Ishida, Hitoshi Isahara
Proceedings of Machine Translation Summit X: Papers
We describe a method for automatically rating the machine translatability of a sentence for various machine translation (MT) systems. The method requires that the MT system can translate bidirectionally between the source and target languages. However, unlike conventional automatic MT evaluation, it does not require reference translations. By applying this method to every component of a sentence in a given source language, we can automatically identify the machine-translatable and non-machine-translatable parts of the sentence for a particular MT system. We show that the parts of a sentence automatically identified as non-machine-translatable provide useful information for paraphrasing or revising the sentence in the source language, thus improving the quality of the final translation.