FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization

Mingye Zhu, Yi Liu, Quan Wang, Junbo Guo, Zhendong Mao


Anthology ID:
2024.emnlp-main.960
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17333–17350
URL:
https://aclanthology.org/2024.emnlp-main.960/
DOI:
10.18653/v1/2024.emnlp-main.960
Cite (ACL):
Mingye Zhu, Yi Liu, Quan Wang, Junbo Guo, and Zhendong Mao. 2024. FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17333–17350, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization (Zhu et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.960.pdf