Title: Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel
Authors: Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Date: 2019-11
Type: conference publication
Venue: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Editors: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Publisher: Association for Computational Linguistics
Location: Hong Kong, China
Pages: 4344-4353
Citation key: tsai-etal-2019-transformer
DOI: 10.18653/v1/D19-1443
URL: https://aclanthology.org/D19-1443/