2024
OSU CompLing at the GEM’24 Data-to-Text Task
Alyssa Allen | Ashley Lewis | Yi-Chien Lin | Tomiris Kaumenova | Michael White
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
This paper details experiments conducted for completing the GEM 2024 Data-to-Text task for a WebNLG dataset (Gardent et al., 2017). We show that model performance varies greatly across English, Spanish, Chinese, and Russian. Data filtering was done with automatic model judgments via error detection, which performs differently per language. We report English and Spanish dev set results for a data filtering and knowledge distillation approach to generating natural language outputs for sets of triples across a variety of domains. Specifically, we compare three generation conditions: 1) few-shot prompting with ChatGPT (GPT4), 2) fine-tuning Llama2 on the unfiltered dataset, and 3) fine-tuning Llama2 on a filtered version of the dataset. Russian and Chinese efforts did not result in submissions due to inconsistent or incoherent translations being produced in either the data synthesis or final generation stages. We provide details on these shortcomings but largely focus on Spanish and English efforts that align with our task submissions. We ultimately submitted outputs in English and Spanish that were generated using a version of Llama2 fine-tuned on a filtered dataset.
2021
Automatic Extraction of English Grammar Pattern Correction Rules
Kuan-Yu Shen | Yi-Chien Lin | Jason S. Chang
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
We introduce a method for generating error-correction rules for grammar pattern errors in a given annotated learner corpus. In our approach, annotated edits in the learner corpus are converted into edit rules for correcting common writing errors. The method involves automatic extraction of grammar patterns, and automatic alignment of the erroneous patterns and correct patterns. At run-time, grammar patterns are extracted from the grammatically correct sentences, and correction rules are retrieved by aligning the extracted grammar patterns with the erroneous patterns. Using the proposed method, we generate 1,499 high-quality correction rules related to 232 headwords. The method can be used to assist ESL students in avoiding grammatical errors, and aid teachers in correcting students’ essays. Additionally, the method can be used in the compilation of collocation error dictionaries and the construction of grammar error correction systems.
Learning to Find Translation of Grammar Patterns in Parallel Corpus
Kai-Wen Tuan | Yi-Jyun Chen | Yi-Chien Lin | Chun-Ho Kwok | Hai-Lun Tu | Jason S. Chang
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
We introduce a method for assisting English as a Second Language (ESL) learners by providing translations of Collins COBUILD grammar patterns (GPs) for a given word. In our approach, a bilingual parallel corpus is transformed into bilingual GP pairs aimed at providing native language support for learning word usage through GPs. The method involves automatically parsing sentences to extract GPs, automatically generating translation GP pairs from bilingual sentences, and automatically extracting common bilingual GPs. At run-time, the target word is used to look up GPs and translations, and the retrieved common GPs and their example sentences are shown to the user. We present a prototype phrase search engine, Linggle GPTrans, that implements the methods to assist ESL learners. Preliminary evaluation on a set of more than 300 GP-translation pairs shows that the methods achieve 91% accuracy.
2019
標註英中同步樣式文法之研究(Annotating Synchronous Grammar Patterns across English and Chinese)
Ching-Yu Helen Yang | Ying-Zhu Chen | Jason S. Chang | Yi-Chien Lin | Wei-Tien Dylan Tsai
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)