On the interaction of automatic evaluation and task framing in headline style transfer

Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell’Orletta, Malvina Nissim, Albert Gatt


Abstract
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often considered more reliable than corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to evaluate. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU.
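To make the classifier-based evaluation idea concrete, here is a minimal sketch, not the paper's actual setup: it stands in a TF-IDF + logistic-regression classifier (scikit-learn) for the purposely-trained classifiers, and the headlines and style labels are invented for illustration. The classifier is trained to distinguish two headline styles; a transfer system is then scored by the fraction of its outputs classified as the target style.

```python
# Minimal sketch of classifier-based style-transfer evaluation.
# Assumptions: TF-IDF + logistic regression as a stand-in classifier;
# the headlines below are illustrative, not from the CHANGE-IT data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: headlines labelled by source style (0 or 1).
train_headlines = [
    "Markets rally as inflation cools",      # style 0: sober
    "Central bank holds rates steady",       # style 0
    "You won't BELIEVE what markets did!",   # style 1: sensational
    "Rates frozen?! Bankers stun everyone",  # style 1
]
train_labels = [0, 0, 1, 1]

# Train the style classifier on headlines from the two sources.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_headlines, train_labels)

def transfer_accuracy(system_outputs, target_style):
    """Fraction of system outputs the classifier assigns to the target style."""
    preds = clf.predict(system_outputs)
    return sum(int(p == target_style) for p in preds) / len(system_outputs)

# Evaluate a transfer system that should produce style-1 headlines.
outputs = ["Inflation SHOCK: markets go wild!", "Central bank holds rates steady"]
print(transfer_accuracy(outputs, target_style=1))
```

Unlike BLEU, which rewards n-gram overlap with references, a score of this kind directly measures whether outputs land in the target style, which is the property the paper argues such classifiers capture better.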
Anthology ID:
2020.evalnlgeval-1.5
Volume:
Proceedings of the 1st Workshop on Evaluating NLG Evaluation
Month:
December
Year:
2020
Address:
Online (Dublin, Ireland)
Editors:
Shubham Agarwal, Ondřej Dušek, Sebastian Gehrmann, Dimitra Gkatzia, Ioannis Konstas, Emiel van Miltenburg, Sashank Santhanam
Venue:
EvalNLGEval
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
38–43
URL:
https://aclanthology.org/2020.evalnlgeval-1.5
Cite (ACL):
Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell’Orletta, Malvina Nissim, and Albert Gatt. 2020. On the interaction of automatic evaluation and task framing in headline style transfer. In Proceedings of the 1st Workshop on Evaluating NLG Evaluation, pages 38–43, Online (Dublin, Ireland). Association for Computational Linguistics.
Cite (Informal):
On the interaction of automatic evaluation and task framing in headline style transfer (De Mattei et al., EvalNLGEval 2020)
PDF:
https://aclanthology.org/2020.evalnlgeval-1.5.pdf
Code:
https://github.com/michelecafagna26/CHANGE-IT