Distribution Aware Metrics for Conditional Natural Language Generation

David M. Chan, Yiming Ni, David Ross, Sudheendra Vijayanarasimhan, Austin Myers, John Canny


Abstract
Traditional automated metrics for evaluating conditional natural language generation rely on pairwise comparisons between a single generated text and the best-matching gold-standard reference. This method is effective when ground truth data diversity can be attributed to noise; however, it falls short when diversity in references holds valuable contextual information, as in visual description or summarization, because it does not evaluate the ability of a model to generate text matching the diversity of the ground truth samples. In this paper, we challenge the adequacy of existing metrics in such semantically diverse contexts and introduce a novel approach for evaluating conditional language generation models, leveraging a family of meta-metrics that build on existing pairwise distance functions. These meta-metrics assess not just single samples, but distributions of reference and model-generated captions using small sample sets. We demonstrate our approach through a case study of visual description in the English language, which not only reveals how current models prioritize single-description quality over diversity, but also sheds light on the impact of sampling methods and temperature settings on description quality and diversity.
Anthology ID:
2024.lrec-main.453
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
5064–5095
URL:
https://aclanthology.org/2024.lrec-main.453
Cite (ACL):
David M. Chan, Yiming Ni, David Ross, Sudheendra Vijayanarasimhan, Austin Myers, and John Canny. 2024. Distribution Aware Metrics for Conditional Natural Language Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5064–5095, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Distribution Aware Metrics for Conditional Natural Language Generation (Chan et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.453.pdf