Period Classification in Chinese Historical Texts

Zuoyu Tian, Sandra Kübler


Abstract
In this study, we investigate language change in Chinese Biji through a classification task: classifying Ancient Chinese texts by time period. Specifically, we focus on a unique genre in classical Chinese literature: Biji (literally “notebook” or “brush notes”), i.e., collections of anecdotes, quotations, and anything else their authors considered noteworthy. Biji span hundreds of years across many dynasties and preserve informal language in written form. For these reasons, they are regarded as a good resource for investigating language change in Chinese (Fang, 2010). In this paper, we create a new dataset of 108 Biji across four dynasties. Based on this dataset, we first introduce a time period classification task for Chinese. We then investigate different feature representation methods for classification. The results show that models using contextualized embeddings perform best. An analysis of the top features chosen by the word n-gram model (after bleaching proper nouns) confirms that these features are informative and correspond to observations and assumptions made by historical linguists.
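The abstract names word n-gram features and proper-noun bleaching but does not spell out the pipeline. The code below is a minimal sketch under stated assumptions: documents arrive pre-tokenized and proper nouns are known in advance. It replaces proper nouns with one shared placeholder, extracts unigram and bigram features, and scores them with multinomial Naive Bayes. Naive Bayes is swapped in here for illustration and is not claimed to be the classifier used in the paper. The names `bleach`, `ngrams`, `NaiveBayes`, and `PLACEHOLDER`, and the toy tokens and labels, are hypothetical.

```python
import math
from collections import Counter

# Sketch under assumptions: documents are pre-tokenized and proper nouns are known.
# Hypothetical placeholder token; any string not used as a real word works.
PLACEHOLDER = "<PN>"


def bleach(tokens, proper_nouns):
    """Replace every token listed in proper_nouns with one shared placeholder."""
    return [PLACEHOLDER if t in proper_nouns else t for t in tokens]


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over n-gram counts.

    A stand-in classifier for illustration only, not the paper's model.
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)  # number of training documents per period
        self.counts = {c: Counter() for c in self.classes}
        for feats, c in zip(docs, labels):
            self.counts[c].update(feats)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        n_docs = sum(self.priors.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) plus the smoothed log P(f | c) of each n-gram in the document
            score = math.log(self.priors[c] / n_docs)
            for f in feats:
                score += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy usage: abstract tokens and labels, not real Biji data.
proper = {"Li", "Wang", "Zhang"}
train = [
    (["Li", "p", "q"], "period_A"),
    (["Wang", "p", "r"], "period_A"),
    (["Zhang", "s", "t"], "period_B"),
    (["Li", "s", "u"], "period_B"),
]
model = NaiveBayes().fit(
    [ngrams(bleach(toks, proper)) for toks, _ in train],
    [label for _, label in train],
)
prediction = model.predict(ngrams(bleach(["Wang", "s", "t"], proper)))
```

Bleaching keeps the toy name `Wang`, seen only in `period_A` training data, from counting as a period cue; the prediction of `period_B` rests on the remaining n-grams.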
Anthology ID: 2021.latechclfl-1.19
Volume: Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic (online)
Editors: Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue: LaTeCHCLfL
SIG: SIGHUM
Publisher: Association for Computational Linguistics
Pages: 168–177
URL: https://aclanthology.org/2021.latechclfl-1.19
DOI: 10.18653/v1/2021.latechclfl-1.19
Cite (ACL): Zuoyu Tian and Sandra Kübler. 2021. Period Classification in Chinese Historical Texts. In Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 168–177, Punta Cana, Dominican Republic (online). Association for Computational Linguistics.
Cite (Informal): Period Classification in Chinese Historical Texts (Tian & Kübler, LaTeCHCLfL 2021)
PDF: https://aclanthology.org/2021.latechclfl-1.19.pdf