Title: Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Authors: Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui
Date: 2024-08
Published in: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Type: conference publication
Pages: 7655-7671
Citation key: xia-etal-2024-unlocking
DOI: 10.18653/v1/2024.findings-acl.456
URL: https://aclanthology.org/2024.findings-acl.456/