Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. 2023 (July). Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers. In Findings of the Association for Computational Linguistics: ACL 2023, edited by Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, pages 4005-4019. Toronto, Canada: Association for Computational Linguistics. Conference publication.
Anthology ID: dai-etal-2023-gpt
DOI: 10.18653/v1/2023.findings-acl.247
URL: https://aclanthology.org/2023.findings-acl.247/