Contribution of Move Structure to Automatic Genre Identification: An Annotated Corpus of French Tourism Websites

Rémi Cardon, Trang Tran Hanh Pham, Julien Zakhia Doueihi, Thomas François


Abstract
This work studies the contribution of move structure to automatic genre identification. The concept, well known in other branches of genre analysis, seems to have found little application in natural language processing. We describe how we collected a corpus of French-language tourism websites and annotated it with move structure, and we report automatic genre identification experiments on this corpus. Our results show that informing a model with move structure can improve its performance on automatic genre identification while reducing the need for annotated data and computational power.
Anthology ID:
2024.lrec-main.347
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
3916–3926
URL:
https://aclanthology.org/2024.lrec-main.347
Cite (ACL):
Rémi Cardon, Trang Tran Hanh Pham, Julien Zakhia Doueihi, and Thomas François. 2024. Contribution of Move Structure to Automatic Genre Identification: An Annotated Corpus of French Tourism Websites. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3916–3926, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Contribution of Move Structure to Automatic Genre Identification: An Annotated Corpus of French Tourism Websites (Cardon et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.347.pdf