Investigating Paraphrase Generation as a Data Augmentation Strategy for Low-Resource AMR-to-Text Generation

Marco Antonio Sobrevilla Cabezudo, Marcio Lima Inacio, Thiago Alexandre Salgueiro Pardo


Abstract
Abstract Meaning Representation (AMR) is a meaning representation (MR) designed to abstract away from syntax, allowing syntactically different sentences to share the same AMR graph. Unlike corpora for other MRs, existing AMR corpora typically link each AMR graph to a single reference sentence. This paper investigates the value of paraphrase generation in low-resource AMR-to-Text generation by testing various paraphrase generation strategies and evaluating their impact. The findings show that paraphrase generation significantly outperforms the baseline and traditional data augmentation methods, even with fewer training instances. Human evaluations indicate that this strategy often produces syntax-based paraphrases and can exceed the performance of previous approaches. Additionally, the paper releases a paraphrase-extended version of the AMR corpus.
Anthology ID:
2024.inlg-main.51
Volume:
Proceedings of the 17th International Natural Language Generation Conference
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
663–675
URL:
https://aclanthology.org/2024.inlg-main.51
Cite (ACL):
Marco Antonio Sobrevilla Cabezudo, Marcio Lima Inacio, and Thiago Alexandre Salgueiro Pardo. 2024. Investigating Paraphrase Generation as a Data Augmentation Strategy for Low-Resource AMR-to-Text Generation. In Proceedings of the 17th International Natural Language Generation Conference, pages 663–675, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Investigating Paraphrase Generation as a Data Augmentation Strategy for Low-Resource AMR-to-Text Generation (Sobrevilla Cabezudo et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-main.51.pdf