Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

Francesco De Toni, Christopher Akiki, Javier De La Rosa, Clémentine Fourrier, Enrique Manjavacas, Stefan Schweter, Daniel Van Strien


Abstract
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
Anthology ID:
2022.bigscience-1.7
Volume:
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models
Month:
May
Year:
2022
Address:
virtual+Dublin
Editors:
Angela Fan, Suzana Ilic, Thomas Wolf, Matthias Gallé
Venue:
BigScience
Publisher:
Association for Computational Linguistics
Pages:
75–83
URL:
https://aclanthology.org/2022.bigscience-1.7
DOI:
10.18653/v1/2022.bigscience-1.7
Cite (ACL):
Francesco De Toni, Christopher Akiki, Javier De La Rosa, Clémentine Fourrier, Enrique Manjavacas, Stefan Schweter, and Daniel Van Strien. 2022. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0. In Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models, pages 75–83, virtual+Dublin. Association for Computational Linguistics.
Cite (Informal):
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0 (De Toni et al., BigScience 2022)
PDF:
https://aclanthology.org/2022.bigscience-1.7.pdf
Video:
https://aclanthology.org/2022.bigscience-1.7.mp4