Exploring Language Representation through a Resource Inventory Project

Carolyn Anderson


Abstract
The increasing scale of large language models has led some students to wonder what contributions can still be made in academia. However, students are often unaware that LLM-based approaches are not feasible for the majority of the world’s languages due to a lack of available data. This paper presents a research project in which students explore the issue of language representation by creating an inventory of the data, preprocessing, and model resources available for a less-resourced language. Students are placed in small groups, and each group is assigned a language to research. Within the group, each student takes on one of three roles: dataset investigator, preprocessing investigator, or downstream task investigator. Students then work together to write a 7-page research report about their language.
Anthology ID:
2024.teachingnlp-1.14
Volume:
Proceedings of the Sixth Workshop on Teaching NLP
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Sana Al-azzawi, Laura Biester, György Kovács, Ana Marasović, Leena Mathur, Margot Mieskes, Leonie Weissweiler
Venues:
TeachingNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
91–93
URL:
https://aclanthology.org/2024.teachingnlp-1.14
Cite (ACL):
Carolyn Anderson. 2024. Exploring Language Representation through a Resource Inventory Project. In Proceedings of the Sixth Workshop on Teaching NLP, pages 91–93, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Exploring Language Representation through a Resource Inventory Project (Anderson, TeachingNLP-WS 2024)
PDF:
https://aclanthology.org/2024.teachingnlp-1.14.pdf