Rishikesh Devanathan
2024
The Paradox of Preference: A Study on LLM Alignment Algorithms and Data Acquisition Methods
Rishikesh Devanathan | Varun Nathan | Ayush Kumar
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
This research investigates the impact of preference-annotation acquisition methods on the performance of LLM alignment algorithms, including Direct Preference Optimization (DPO), Identity Preference Optimization (IPO), and Conservative DPO (cDPO), compared to Supervised Fine-Tuning (SFT) on NLP tasks. We analyze the influence of LLM-generated and human-annotated preferences on algorithm performance, considering data volume and quality, and we assess DPO’s vulnerability to overfitting and IPO’s resilience against it, addressing four main research questions. Using the GAIR dataset and Zephyr-7b as the SFT model, we reveal unexpected negative outcomes. Contrary to expectations, DPO trained on LLM-generated preferences outperforms DPO trained on human preferences. Moreover, we find no correlation between preference data volume or quality and algorithm performance. DPO shows no overfitting on either the human or the LLM preference dataset, and, surprisingly, cDPO does not fare better than DPO under flip noise. Our findings highlight the complexities of preference annotation methods and underscore the importance of scrutinizing negative results in NLP algorithm research.
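For context, here is a minimal sketch of the objectives the abstract compares, using the standard formulations from the DPO and IPO literature rather than anything specific to this paper: DPO fits the policy \(\pi_\theta\) so that the chosen response \(y_w\) is preferred to the rejected response \(y_l\) relative to a frozen reference policy \(\pi_{\mathrm{ref}}\); cDPO smooths the preference labels with an assumed flip rate \(\varepsilon\); and IPO replaces DPO’s logistic loss with a squared regression target governed by \(\tau\).

```latex
% Standard DPO objective (Rafailov et al., 2023)
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\left[
    \log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
      - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
    \right)\right]

% Conservative DPO: label smoothing with an assumed flip rate \varepsilon
\mathcal{L}_{\mathrm{cDPO}} =
  (1-\varepsilon)\,\mathcal{L}_{\mathrm{DPO}}(y_w \succ y_l)
  + \varepsilon\,\mathcal{L}_{\mathrm{DPO}}(y_l \succ y_w)

% IPO objective (Azar et al., 2023): squared loss toward 1/(2\tau)
\mathcal{L}_{\mathrm{IPO}} =
  \mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\left[
    \left(
      \log\frac{\pi_\theta(y_w\mid x)\,\pi_{\mathrm{ref}}(y_l\mid x)}
               {\pi_\theta(y_l\mid x)\,\pi_{\mathrm{ref}}(y_w\mid x)}
      - \frac{1}{2\tau}
    \right)^{2}\right]
```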
2022
Team Innovators at SemEval-2022 for Task 8: Multi-Task Training with Hyperpartisan and Semantic Relation for Multi-Lingual News Article Similarity
Nidhir Bhavsar | Rishikesh Devanathan | Aakash Bhatnagar | Muskaan Singh | Petr Motlicek | Tirthankar Ghosal
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This work presents the system proposed by team Innovators for SemEval-2022 Task 8: Multilingual News Article Similarity. Similar multilingual news articles should match irrespective of writing style, language of conveyance, and the subjective decisions and biases induced by the medium/outlet. The proposed architecture includes a machine-translation stage that renders multilingual news articles into English, followed by a multitask learning model trained simultaneously on three distinct datasets. The system leverages the PageRank algorithm for long-form text alignment. The multitask learning approach trains multiple tasks simultaneously while sharing the same encoder, facilitating knowledge transfer between tasks. Our best model is ranked 16th with a Pearson score of 0.733.
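The shared-encoder arrangement the abstract describes can be illustrated with a short sketch. This is a hypothetical PyTorch implementation with made-up task names and a stand-in encoder, not the authors’ released code:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """One shared encoder, one lightweight head per task.

    Gradients from every task's batches flow through the same
    encoder weights, which is what enables knowledge transfer.
    """

    def __init__(self, encoder: nn.Module, hidden_dim: int, task_dims: dict):
        super().__init__()
        self.encoder = encoder  # shared across all tasks
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, out_dim)
            for task, out_dim in task_dims.items()
        })

    def forward(self, features: torch.Tensor, task: str) -> torch.Tensor:
        pooled = self.encoder(features)   # (batch, hidden_dim)
        return self.heads[task](pooled)   # task-specific prediction

# Stand-in encoder; in practice this would be a pretrained transformer
# with pooling. The three task names below are illustrative only.
model = MultiTaskModel(
    encoder=nn.Sequential(nn.Linear(768, 768), nn.Tanh()),
    hidden_dim=768,
    task_dims={"similarity": 1, "hyperpartisan": 2, "semantic_relation": 3},
)

scores = model(torch.randn(4, 768), task="similarity")  # shape (4, 1)
```

Training would alternate mini-batches drawn from the three datasets, so every task contributes gradient updates to the shared encoder.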