Title: Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Authors: Charlotte Siska; Katerina Marazopoulou; Melissa Ailem; James Bono
Date: 2024-08
Type: text (conference publication)
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku; Andre Martins; Vivek Srikumar
Publisher: Association for Computational Linguistics
Place: Bangkok, Thailand
Pages: 10406-10421
Citation key: siska-etal-2024-examining
DOI: 10.18653/v1/2024.acl-long.560
URL: https://aclanthology.org/2024.acl-long.560/