%0 Conference Proceedings
%T Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
%A Zhou, Runlong
%A Du, Simon
%A Li, Beibin
%Y Ku, Lun-Wei
%Y Martins, Andre
%Y Srikumar, Vivek
%S Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2024
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand
%F zhou-etal-2024-reflect
%R 10.18653/v1/2024.acl-long.56
%U https://aclanthology.org/2024.acl-long.56/
%U https://doi.org/10.18653/v1/2024.acl-long.56
%P 995-1015