ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility

Mateusz Lango, Patricia Schmidtova, Simone Balloccu, Ondrej Dusek


Abstract
In this paper, we describe several reproductions of a human evaluation experiment measuring the quality of automatic dialogue summarization (Feng et al., 2021). We investigate the impact of the annotators’ highest level of education, field of study, and native language on the evaluation of the informativeness of the summary. We find that the evaluation is relatively consistent regardless of these factors, but the factor with the biggest impact seems to be a prior specific background in natural language processing (as opposed to, e.g., a background in computer science). We also find that the experiment setup (asking for a single criterion vs. multiple criteria) may have an impact on the results.
Anthology ID:
2024.humeval-1.20
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Note:
Pages:
229–237
URL:
https://aclanthology.org/2024.humeval-1.20
Cite (ACL):
Mateusz Lango, Patricia Schmidtova, Simone Balloccu, and Ondrej Dusek. 2024. ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 229–237, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility (Lango et al., HumEval-WS 2024)
PDF:
https://aclanthology.org/2024.humeval-1.20.pdf
Optional supplementary material:
2024.humeval-1.20.OptionalSupplementaryMaterial.zip