Title: Long Unit Word Tokenization and Bunsetsu Segmentation of Historical Japanese
Authors: Hiroaki Ozaki, Kanako Komiya, Masayuki Asahara, Toshinobu Ogiso
Date: 2024-08
Type: text (conference publication)
Published in: Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
Editors: John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
Publisher: Association for Computational Linguistics
Location: Hybrid in Bangkok, Thailand and online
Pages: 48-55
Citation key: ozaki-etal-2024-long
DOI: 10.18653/v1/2024.ml4al-1.6
URL: https://aclanthology.org/2024.ml4al-1.6/