Unleashing Large Language Models’ Proficiency in Zero-shot Essay Scoring

Sanwoo Lee, Yida Cai, Desong Meng, Ziyang Wang, Yunfang Wu


Abstract
Advances in automated essay scoring (AES) have traditionally relied on labeled essays, whose acquisition requires tremendous cost and expertise. Recently, large language models (LLMs) have achieved great success in various tasks, but their potential remains underexplored in AES. In this paper, we show that our zero-shot prompting framework, Multi Trait Specialization (MTS), elicits LLMs' ample potential for essay scoring. In particular, we automatically decompose writing proficiency into distinct traits and generate scoring criteria for each trait. Then, an LLM is prompted to extract trait scores over several conversational rounds, each round scoring one trait based on its scoring criteria. Finally, we derive the overall score via trait averaging and min-max scaling. Experimental results on two benchmark datasets demonstrate that MTS consistently outperforms straightforward prompting (Vanilla) in average QWK across all LLMs and datasets, with maximum gains of 0.437 on TOEFL11 and 0.355 on ASAP. Additionally, with the help of MTS, the small-sized Llama2-13b-chat substantially outperforms ChatGPT, facilitating effective deployment in real-world applications.
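The abstract outlines the MTS pipeline at a high level: per-trait scoring rounds, trait averaging, and min-max scaling to the target score range. The sketch below illustrates that flow only; the trait names, criteria wording, prompt text, score ranges, and the generic `llm` callable are all assumptions for illustration, not the authors' exact setup (see the paper for details).

```python
# Minimal sketch of an MTS-style zero-shot scoring loop (assumed details, not
# the authors' exact prompts or traits).
import re
from typing import Callable, Dict, List

# Hypothetical trait decomposition and scoring criteria.
TRAITS: Dict[str, str] = {
    "organization": "How logically the essay is structured.",
    "coherence": "How smoothly ideas connect across sentences and paragraphs.",
    "language use": "Grammar, vocabulary range, and sentence variety.",
}

def score_trait(llm: Callable[[str], str], essay: str, trait: str,
                criteria: str, low: int = 1, high: int = 10) -> float:
    """One conversational round: ask for a single trait score and parse it."""
    prompt = (
        f"Score the following essay on '{trait}' from {low} to {high}.\n"
        f"Criteria: {criteria}\n\nEssay:\n{essay}\n\n"
        "Reply with the numeric score only."
    )
    reply = llm(prompt)
    match = re.search(r"\d+(\.\d+)?", reply)
    return float(match.group()) if match else (low + high) / 2  # fallback

def overall_score(trait_scores: List[float], low: float, high: float,
                  target_low: float, target_high: float) -> float:
    """Average trait scores, then min-max scale to the dataset's score range."""
    avg = sum(trait_scores) / len(trait_scores)
    return (avg - low) / (high - low) * (target_high - target_low) + target_low

def mts_score(llm: Callable[[str], str], essay: str,
              target_low: float = 0.0, target_high: float = 12.0) -> float:
    """Run one scoring round per trait, then aggregate into an overall score."""
    scores = [score_trait(llm, essay, t, c) for t, c in TRAITS.items()]
    return overall_score(scores, low=1, high=10,
                         target_low=target_low, target_high=target_high)
```

Any chat-capable LLM can be plugged in as the `llm` callable; the aggregation step is what maps the per-trait scale onto the dataset-specific range (e.g., an ASAP prompt's score band).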
Anthology ID:
2024.findings-emnlp.10
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
181–198
URL:
https://aclanthology.org/2024.findings-emnlp.10/
DOI:
10.18653/v1/2024.findings-emnlp.10
Cite (ACL):
Sanwoo Lee, Yida Cai, Desong Meng, Ziyang Wang, and Yunfang Wu. 2024. Unleashing Large Language Models’ Proficiency in Zero-shot Essay Scoring. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 181–198, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Unleashing Large Language Models’ Proficiency in Zero-shot Essay Scoring (Lee et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.10.pdf