CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification

Akbar Karimi, Lucie Flek


Abstract
The class imbalance problem can cause machine learning models to perform poorly on the minority class as well as on the dataset as a whole. Using data augmentation techniques to increase the number of samples is one way to tackle this problem. We introduce a novel counterfactual data augmentation method based on verb replacement for the identification of medical claims. In addition, we investigate the impact of this method and compare it with three other data augmentation techniques, showing that the proposed method can yield a significant (relative) improvement on the minority class.
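To make "counterfactual data augmentation by verb replacement" concrete, here is a minimal sketch of what such an augmenter could look like. It is not the authors' implementation. The verb substitution table, the label names, and the choice to augment only the rarest class are illustrative assumptions. Each minority-class text that contains a listed verb gets one extra copy with that verb swapped for its counterpart, and the copy keeps the original label.

```python
import re
from collections import Counter

# Hypothetical verb substitutions (assumption): NOT the lexicon used in the paper.
VERB_SWAPS = {
    "causes": "prevents",
    "prevents": "causes",
    "increases": "decreases",
    "decreases": "increases",
    "cures": "worsens",
    "worsens": "cures",
}

# One alternation regex so every match is replaced in a single pass
# (a swapped verb is never swapped back).
_PATTERN = re.compile(r"\b(" + "|".join(VERB_SWAPS) + r")\b", re.IGNORECASE)


def swap_verbs(text):
    """Return text with each listed verb replaced, or None if no listed verb occurs."""
    def repl(match):
        word = match.group(0)
        new = VERB_SWAPS[word.lower()]
        # Preserve sentence-initial capitalisation.
        return new.capitalize() if word[0].isupper() else new

    new_text, n_subs = _PATTERN.subn(repl, text)
    return new_text if n_subs else None


def augment_minority(samples):
    """samples: list of (text, label) pairs.

    Appends one counterfactual copy for each sample of the rarest label
    that contains a swappable verb; the label is kept unchanged.
    """
    counts = Counter(label for _, label in samples)
    minority = min(counts, key=counts.get)
    augmented = list(samples)
    for text, label in samples:
        if label != minority:
            continue
        new_text = swap_verbs(text)
        if new_text is not None:
            augmented.append((new_text, label))
    return augmented


if __name__ == "__main__":
    data = [
        ("Vitamin C cures colds.", "claim"),           # hypothetical labels
        ("Does stress raise blood pressure?", "question"),
        ("Is this rash contagious?", "question"),
    ]
    for text, label in augment_minority(data):
        print(label, "|", text)
```

Keeping the label fixed reflects the idea that swapping the causal verb changes what is claimed but not the fact that the text is a claim; whether that holds for a given verb pair would need checking against the task's annotation guidelines.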
Anthology ID:
2023.semeval-1.292
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
2118–2123
URL:
https://aclanthology.org/2023.semeval-1.292
DOI:
10.18653/v1/2023.semeval-1.292
Cite (ACL):
Akbar Karimi and Lucie Flek. 2023. CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 2118–2123, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification (Karimi & Flek, SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.292.pdf
Video:
https://aclanthology.org/2023.semeval-1.292.mp4