Lessons Learned from a Citizen Science Project for Natural Language Processing

Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart De Castilho, Iryna Gurevych


Abstract
Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and at- tract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.
Anthology ID:
2023.eacl-main.261
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3594–3608
Language:
URL:
https://aclanthology.org/2023.eacl-main.261
DOI:
10.18653/v1/2023.eacl-main.261
Bibkey:
Cite (ACL):
Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart De Castilho, and Iryna Gurevych. 2023. Lessons Learned from a Citizen Science Project for Natural Language Processing. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3594–3608, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Lessons Learned from a Citizen Science Project for Natural Language Processing (Klie et al., EACL 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.eacl-main.261.pdf
Video:
 https://aclanthology.org/2023.eacl-main.261.mp4