ReproHum #1018-09: Reproducing Human Evaluations of Redundancy Errors in Data-To-Text Systems

Filip Klubička, John D. Kelleher


Abstract
This paper describes a reproduction of a human evaluation of redundancy errors in text automatically generated by a data-to-text system. While the scope of the original study is broader, it includes a human evaluation, in the form of a manual error analysis, as part of the system evaluation. We attempt a reproduction of this human evaluation; however, while the original authors annotate multiple properties of the generated text, we focus exclusively on a single quality criterion, that of redundancy. By focusing our study on a single minimal reproducible experimental unit, with the experiment being fairly straightforward and all data made available by the authors, we encountered no challenges with our reproduction and were able to reproduce the trend found in the original experiment. However, while still confirming the general trend, we found that both of our annotators identified twice as many errors in the dataset as the original authors did.
Anthology ID:
2024.humeval-1.16
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Pages:
163–198
URL:
https://aclanthology.org/2024.humeval-1.16
Cite (ACL):
Filip Klubička and John D. Kelleher. 2024. ReproHum #1018-09: Reproducing Human Evaluations of Redundancy Errors in Data-To-Text Systems. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 163–198, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ReproHum #1018-09: Reproducing Human Evaluations of Redundancy Errors in Data-To-Text Systems (Klubička & Kelleher, HumEval-WS 2024)
PDF:
https://aclanthology.org/2024.humeval-1.16.pdf