%0 Conference Proceedings
%T Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
%A Xia, Heming
%A Yang, Zhe
%A Dong, Qingxiu
%A Wang, Peiyi
%A Li, Yongqi
%A Ge, Tao
%A Liu, Tianyu
%A Li, Wenjie
%A Sui, Zhifang
%Y Ku, Lun-Wei
%Y Martins, Andre
%Y Srikumar, Vivek
%S Findings of the Association for Computational Linguistics: ACL 2024
%D 2024
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand
%F xia-etal-2024-unlocking
%R 10.18653/v1/2024.findings-acl.456
%U https://aclanthology.org/2024.findings-acl.456/
%U https://doi.org/10.18653/v1/2024.findings-acl.456
%P 7655-7671