Abhinav Sukumar Rao, Atharva Roshan Naik, Sachin Vashistha, Somak Aditya, and Monojit Choudhury. 2024. Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, pages 16802-16830, Torino, Italia, May 2024. ELRA and ICCL. Conference publication. Citation key: rao-etal-2024-tricking. https://aclanthology.org/2024.lrec-main.1462/