Dancheng Xin
2025
CDA^2: Counterfactual Diffusion Augmentation for Cross-Domain Adaptation in Low-Resource Sentiment Analysis
Dancheng Xin | Kaiqi Zhao | Jingyun Sun | Yang Li
Proceedings of the 31st International Conference on Computational Linguistics
Domain adaptation is widely employed in cross-domain sentiment analysis, enabling the transfer of models from label-rich source domains to target domains with few or no labels. However, concerns have been raised regarding the robustness of these methods and their sensitivity to data distribution shift, particularly when the data distributions of the source and target domains differ significantly. To tackle this problem, we introduce CDA^2, a framework for cross-domain adaptation in low-resource sentiment analysis that utilizes counterfactual diffusion augmentation. Specifically, it applies domain-relevant word substitutions to source domain samples and uses the resulting samples to guide a diffusion model in generating high-quality counterfactual target domain samples. We adopt a soft absorbing state and a maximum mean discrepancy (MMD) loss during the training stage, and use advanced ODE solvers to expedite the sampling process. Our experiments demonstrate that CDA^2 generates high-quality target samples and achieves state-of-the-art performance in cross-domain sentiment analysis.
2024
Diffusion Based Counterfactual Augmentation for Dual Sentiment Classification
Dancheng Xin | Jiawei Yuan | Yang Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
State-of-the-art NLP models have demonstrated exceptional performance across various tasks, including sentiment analysis. However, concerns have been raised about their robustness and susceptibility to systematic biases in both training and test data, which may degrade performance when these models encounter out-of-distribution data in real-world applications. Although various data augmentation and adversarial perturbation techniques have shown promise in tackling these issues, prior methods such as word embedding perturbation or synonymous sentence expansion have failed to mitigate the spurious association problem inherent in the original data. Recent counterfactual augmentation methods have attempted to tackle this issue, but they have been limited by rigid rules, resulting in inconsistent context and disrupted semantics. In response to these challenges, we introduce a diffusion-based counterfactual data augmentation (DCA) framework. It utilizes an antonymous paradigm to guide a continuous diffusion model and combines reinforcement learning with contrastive learning to optimize the generation of counterfactual samples with high diversity and quality. Furthermore, we use a dual sentiment classifier to validate the generated antonymous samples and subsequently perform sentiment classification. Our experiments on four benchmark datasets demonstrate that DCA achieves state-of-the-art performance in sentiment classification tasks.