OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, Dacheng Tao


Abstract
Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favour of functional programming (FP), e.g., HumanEval and MBPP. To address this, our study introduces a pioneering OOP-focused benchmark, featuring 431 Python programs that encompass essential OOP concepts and features such as classes and encapsulation methods. We propose a novel evaluation metric, pass@o, tailored for OOP, enhancing the traditional pass@k metric. Our evaluation of 23 leading large language models (LLMs), including both general and code-specialized models, reveals three key insights: 1) pass@o offers a more relevant and comprehensive assessment of OOP code generation; 2) despite excelling in FP, code-specialized LLMs such as WizardCoder lag behind models such as ChatGPT in OOP; 3) the poor performance of all advanced LLMs on our OOP benchmark highlights a critical need for improvement in this area. Our benchmark and scripts will be publicly released on GitHub.
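Background for readers unfamiliar with pass@k: the pass@o metric described above refines the standard pass@k measure (Chen et al., 2021), which estimates the probability that at least one of k sampled completions for a problem passes all unit tests. Below is a minimal Python sketch of the standard unbiased pass@k estimator for reference; the OOP-specific pass@o formulation is defined in the paper itself and is not reproduced here, and the function name is illustrative.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator (Chen et al., 2021):
    #   n = total completions sampled per problem,
    #   c = completions that pass all tests,
    #   k = evaluation budget.
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing completion
    return 1.0 - comb(n - c, k) / comb(n, k)

For example, pass_at_k(n=20, c=5, k=1) returns 0.25, the fraction of single samples that pass; pass@o replaces the per-problem pass criterion with one tailored to OOP programs.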
Anthology ID:
2024.findings-acl.808
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13619–13639
URL:
https://aclanthology.org/2024.findings-acl.808/
DOI:
10.18653/v1/2024.findings-acl.808
Cite (ACL):
Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, and Dacheng Tao. 2024. OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 13619–13639, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models (Wang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.808.pdf