Towards Objectively Evaluating the Quality of Generated Medical Summaries

Francesco Moramarco, Damir Juric, Aleksandar Savkov, Ehud Reiter


Abstract
We propose a method for evaluating the quality of generated text by asking evaluators to count facts and computing precision, recall, F-score, and accuracy from the raw counts. We believe this approach leads to a more objective and more easily reproducible evaluation. We apply it to the task of medical report summarisation, where objectively measuring quality and accuracy is of paramount importance.
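
The abstract describes deriving precision, recall, F-score, and accuracy from raw fact counts. The Python sketch below illustrates one way such a computation could look; the count categories (correct, incorrect, and omitted facts) and the formulas are illustrative assumptions, not the paper's exact definitions.

    from dataclasses import dataclass

    @dataclass
    class FactCounts:
        """Raw fact counts for one generated summary (hypothetical categories).

        correct:   summary facts the evaluator judges correct
        incorrect: summary facts the evaluator judges incorrect
        omitted:   source facts missing from the summary
        """
        correct: int
        incorrect: int
        omitted: int

    def fact_metrics(c: FactCounts) -> dict:
        """Compute precision, recall, F-score, and accuracy from raw counts.

        These formulas are an illustrative assumption, not the paper's
        exact definitions.
        """
        retrieved = c.correct + c.incorrect          # facts stated in the summary
        relevant = c.correct + c.omitted             # facts that should be stated
        total = c.correct + c.incorrect + c.omitted  # all counted facts

        precision = c.correct / retrieved if retrieved else 0.0
        recall = c.correct / relevant if relevant else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if (precision + recall) else 0.0)
        accuracy = c.correct / total if total else 0.0
        return {"precision": precision, "recall": recall,
                "f_score": f_score, "accuracy": accuracy}

    # Example: an evaluator counted 18 correct, 3 incorrect, and 4 omitted facts.
    print(fact_metrics(FactCounts(correct=18, incorrect=3, omitted=4)))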
Anthology ID:
2021.humeval-1.6
Volume:
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
Month:
April
Year:
2021
Address:
Online
Editors:
Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
Venue:
HumEval
Publisher:
Association for Computational Linguistics
Pages:
56–61
URL:
https://aclanthology.org/2021.humeval-1.6
Cite (ACL):
Francesco Moramarco, Damir Juric, Aleksandar Savkov, and Ehud Reiter. 2021. Towards Objectively Evaluating the Quality of Generated Medical Summaries. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pages 56–61, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Objectively Evaluating the Quality of Generated Medical Summaries (Moramarco et al., HumEval 2021)
PDF:
https://aclanthology.org/2021.humeval-1.6.pdf