Text Fluoroscopy: Detecting LLM-Generated Text through Intrinsic Features

Xiao Yu, Kejiang Chen, Qi Yang, Weiming Zhang, Nenghai Yu


Abstract
Large language models (LLMs) have revolutionized natural language processing because of their excellent performance on a wide range of tasks. Despite their impressive capabilities, LLMs can also generate texts that pose risks of misuse. Consequently, detecting LLM-generated text has become increasingly important. Previous LLM-generated text detection methods rely on semantic features, which are stored in the last layer. As a result, these methods overfit to the domain of the training set and generalize poorly. We therefore argue that using intrinsic features rather than semantic features for detection yields better performance. In this work, we design Text Fluoroscopy, a black-box method with better generalizability for detecting LLM-generated text by mining the intrinsic features of the text to be detected. Our method captures the text's intrinsic features by identifying the layer whose distribution, when projected to the vocabulary space, differs most from those of the last and first layers. Our method achieves average improvements of 7.36% and 2.84% in detection performance over the baselines when detecting texts from different domains generated by GPT-4 and Claude3, respectively.
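The layer-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each layer's hidden state has already been projected to vocabulary-space logits (for example, through the model's unembedding matrix) and that each layer is represented by a single logit vector. It also assumes KL divergence as the "distribution difference" and scores each intermediate layer by summing its divergences to the first and last layers. The abstract specifies neither the distance measure nor how the two differences are combined, and the function names are hypothetical. The downstream detection step is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); assumes strictly positive entries, as softmax outputs are."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def select_intrinsic_layer(layer_logits: Sequence[Sequence[float]]) -> int:
    """Hypothetical helper: return the index of the intermediate layer whose
    vocabulary distribution differs most from both the first and last layers.

    layer_logits[k] is layer k's hidden state projected to vocabulary space.
    Assumed score: KL(layer || first) + KL(layer || last).
    """
    if len(layer_logits) < 3:
        raise ValueError("need at least one intermediate layer")
    dists = [softmax(logits) for logits in layer_logits]
    first, last = dists[0], dists[-1]
    # Only intermediate layers (excluding first and last) are candidates.
    scores = {
        k: kl_divergence(dists[k], first) + kl_divergence(dists[k], last)
        for k in range(1, len(dists) - 1)
    }
    return max(scores, key=scores.get)


# Toy example: 4 layers over a 3-token vocabulary.
layers = [
    [1.0, 1.0, 1.0],  # first layer: uniform distribution
    [0.0, 0.0, 5.0],  # layer 1: peaked on token 2
    [0.5, 0.5, 1.0],  # layer 2: close to uniform
    [3.0, 0.0, 0.0],  # last layer: peaked on token 0
]
print(select_intrinsic_layer(layers))  # prints 1
```

In this toy input, layer 1 concentrates its probability on a token that neither the near-uniform first layer nor the last layer favors, so it has the largest combined divergence and is selected.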
Anthology ID:
2024.emnlp-main.885
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15838–15846
URL:
https://aclanthology.org/2024.emnlp-main.885/
DOI:
10.18653/v1/2024.emnlp-main.885
Cite (ACL):
Xiao Yu, Kejiang Chen, Qi Yang, Weiming Zhang, and Nenghai Yu. 2024. Text Fluoroscopy: Detecting LLM-Generated Text through Intrinsic Features. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15838–15846, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Text Fluoroscopy: Detecting LLM-Generated Text through Intrinsic Features (Yu et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.885.pdf
Software:
2024.emnlp-main.885.software.zip
Data:
2024.emnlp-main.885.data.zip