ELQA: A Corpus of Metalinguistic Questions and Answers about English

Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, Amir Zeldes


Abstract
We present ELQA, a corpus of questions and answers in and about the English language. Collected from two online forums, the more than 70k questions (from English learners and others) cover wide-ranging topics including grammar, meaning, fluency, and etymology. The answers include descriptions of general properties of English vocabulary and grammar as well as explanations of specific (correct and incorrect) usage examples. Unlike most NLP datasets, this corpus is metalinguistic: it consists of language about language. As such, it can facilitate investigations of the metalinguistic capabilities of NLU models, as well as educational applications in the language learning domain. To study these capabilities, we define a free-form question answering task on our dataset and evaluate multiple large language models (LLMs) to analyze their capacity to generate metalinguistic answers.
Anthology ID:
2023.acl-long.113
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2031–2047
URL:
https://aclanthology.org/2023.acl-long.113
DOI:
10.18653/v1/2023.acl-long.113
Cite (ACL):
Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, and Amir Zeldes. 2023. ELQA: A Corpus of Metalinguistic Questions and Answers about English. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2031–2047, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
ELQA: A Corpus of Metalinguistic Questions and Answers about English (Behzad et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.113.pdf
Video:
https://aclanthology.org/2023.acl-long.113.mp4