Data Augmentation for Fake News Detection by Combining Seq2seq and NLI

Anna Glazkova


Abstract
State-of-the-art data augmentation methods help improve the generalization of deep learning models. However, these methods often generate examples that fail to preserve the class label of the original text. Label preservation is crucial for some natural language processing tasks, such as fake news detection. In this work, we combine sequence-to-sequence and natural language inference models for data augmentation in the fake news detection domain using short news texts, such as tweets and news titles. This approach allows us to generate new training examples that do not contradict facts from the original texts. We use the non-entailment probability for the pair of original and generated texts as a loss function for a transformer-based sequence-to-sequence model. The proposed approach proved effective on three fake news detection classification benchmarks in terms of macro F1-score and ROC AUC. Moreover, we show that our approach retains the class label of the original text more accurately than other transformer-based methods.
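The non-entailment signal described in the abstract can be illustrated with a minimal sketch. It assumes an NLI model that returns three logits per (original, generated) pair in the order [entailment, neutral, contradiction]; that label order, the function names, and the batch averaging are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def non_entailment_probability(nli_logits, entailment_index=0):
    """Probability that the generated text is NOT entailed by the original.

    Assumes the hypothetical label order [entailment, neutral, contradiction].
    """
    probs = softmax(nli_logits)
    return 1.0 - probs[entailment_index]


def non_entailment_loss(batch_logits, entailment_index=0):
    """Mean non-entailment probability over a batch of text pairs.

    Lower values mean the generated texts stay closer to the facts
    of the original texts.
    """
    scores = [
        non_entailment_probability(logits, entailment_index)
        for logits in batch_logits
    ]
    return sum(scores) / len(scores)
```

In a real training loop the logits would come from a differentiable NLI model and the loss would be computed with a tensor library so that gradients can reach the sequence-to-sequence model; the pure-Python version above only shows the arithmetic.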
Anthology ID:
2023.ranlp-1.48
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
429–439
URL:
https://aclanthology.org/2023.ranlp-1.48
Cite (ACL):
Anna Glazkova. 2023. Data Augmentation for Fake News Detection by Combining Seq2seq and NLI. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 429–439, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Data Augmentation for Fake News Detection by Combining Seq2seq and NLI (Glazkova, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.48.pdf