Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing

Richard Diehl Martinez, Zebulon Goriely, Andrew Caines, Paula Buttery, Lisa Beinborn


Abstract
Language models strongly rely on frequency information because they maximize the likelihood of tokens during pre-training. As a consequence, language models tend not to generalize well to tokens that are seldom seen during training. Moreover, maximum likelihood training has been found to give rise to anisotropy: representations of tokens in a model tend to cluster tightly in a high-dimensional cone rather than spreading out over their representational capacity. Our work introduces a method for quantifying the frequency bias of a language model by assessing sentence-level perplexity with respect to token-level frequency. We then present a method for reducing the frequency bias of a language model by inducing a syntactic prior over token representations during pre-training. Our Syntactic Smoothing method adjusts the maximum likelihood objective function to distribute the learning signal to syntactically similar tokens. This approach results in better performance on infrequent English tokens and a decrease in anisotropy. We empirically show that the degree of anisotropy in a model correlates with its frequency bias.
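To make the core idea of the abstract concrete, the following is a minimal, illustrative sketch (not the authors' released implementation) of a syntactic-smoothing-style training objective: instead of a one-hot maximum likelihood target, part of the probability mass is redistributed over syntactically similar tokens. The similarity matrix `syntactic_sim` and the mixing weight `alpha` are hypothetical placeholders introduced only for this example.

```python
# Illustrative sketch only, not the paper's exact implementation.
# Assumes a hypothetical (vocab x vocab) matrix `syntactic_sim` encoding
# syntactic similarity between tokens, e.g. derived from POS distributions.

import torch
import torch.nn.functional as F


def syntactic_smoothing_loss(logits, targets, syntactic_sim, alpha=0.1):
    """Cross-entropy against soft targets that spread a fraction `alpha`
    of the probability mass over syntactically similar tokens.

    logits:        (batch, vocab) unnormalized model scores
    targets:       (batch,) gold token ids
    syntactic_sim: (vocab, vocab) non-negative similarity scores
    alpha:         smoothing weight (alpha=0 recovers standard MLE)
    """
    vocab_size = logits.size(-1)

    # Standard MLE target: all mass on the observed token.
    hard = F.one_hot(targets, vocab_size).float()

    # Soft target: similarity row for each gold token, row-normalized.
    sim_rows = syntactic_sim[targets]                                  # (batch, vocab)
    soft = sim_rows / sim_rows.sum(dim=-1, keepdim=True).clamp(min=1e-9)

    # Mix the two distributions: most mass stays on the gold token,
    # the rest is shared among syntactically similar tokens.
    smoothed_targets = (1.0 - alpha) * hard + alpha * soft

    log_probs = F.log_softmax(logits, dim=-1)
    return -(smoothed_targets * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    # Toy usage with a 10-token vocabulary and random similarities.
    torch.manual_seed(0)
    logits = torch.randn(4, 10)
    targets = torch.tensor([1, 3, 3, 7])
    syntactic_sim = torch.rand(10, 10)  # hypothetical similarity matrix
    print(syntactic_smoothing_loss(logits, targets, syntactic_sim).item())
```

The design intuition is that rarely seen tokens still receive gradient signal whenever a syntactically similar (and possibly more frequent) token appears as the target, which is one plausible route to the reduced frequency bias and anisotropy reported in the paper.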
Anthology ID:
2024.emnlp-main.344
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5999–6011
URL:
https://aclanthology.org/2024.emnlp-main.344
DOI:
10.18653/v1/2024.emnlp-main.344
Bibkey:
Cite (ACL):
Richard Diehl Martinez, Zebulon Goriely, Andrew Caines, Paula Buttery, and Lisa Beinborn. 2024. Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5999–6011, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing (Diehl Martinez et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.344.pdf