Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, Antonio Ortega


Abstract
As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regression. Our approach greatly reduces computational and space complexities (up to 11× improvement in inference time and 87% reduction in storage requirements). It outperforms existing approaches by up to 4 AUROC points on four benchmarks. We also introduce an entropy-constrained version of our algorithm, leading to further reductions in storage requirements (up to 97% lower than comparable approaches) while retaining competitive performance. These results highlight the potential of our soft clustering approach for detecting tail-end phenomena in extreme-scale data settings. Our source code is available on GitHub.
Anthology ID:
2024.findings-emnlp.758
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12943–12959
URL:
https://aclanthology.org/2024.findings-emnlp.758/
DOI:
10.18653/v1/2024.findings-emnlp.758
Cite (ACL):
Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, and Antonio Ortega. 2024. Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 12943–12959, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression (Gulati et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.758.pdf