Towards the Data-driven System for Rhetorical Parsing of Russian Texts

Elena Chistova, Maria Kobozeva, Dina Pisarevskaya, Artem Shelmanov, Ivan Smirnov, Svetlana Toldova


Abstract
We present the results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank, the first Russian corpus annotated within the RST framework. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model provides the best score (macro F1 = 54.67 ± 0.38). We find that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.
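For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of per-class F1 scores, so rare relations count as much as frequent ones. The following is a minimal sketch only: the relation labels, the probabilities, and the helper names `macro_f1` and `soft_vote` are hypothetical, and no data from the paper is used. The abstract does not say how the CatBoost and SVM outputs were combined, so averaging class probabilities (soft voting) is an assumption made purely for illustration.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def soft_vote(probs_a, probs_b):
    """Average two models' class-probability dicts; pick the argmax per item."""
    preds = []
    for pa, pb in zip(probs_a, probs_b):
        avg = {label: (pa[label] + pb[label]) / 2 for label in pa}
        preds.append(max(avg, key=avg.get))
    return preds


# Toy inputs (hypothetical): gold relations and two models' probabilities.
gold = ["elaboration", "contrast", "cause", "elaboration"]
model_a = [
    {"elaboration": 0.6, "contrast": 0.3, "cause": 0.1},
    {"elaboration": 0.5, "contrast": 0.4, "cause": 0.1},
    {"elaboration": 0.2, "contrast": 0.2, "cause": 0.6},
    {"elaboration": 0.3, "contrast": 0.5, "cause": 0.2},
]
model_b = [
    {"elaboration": 0.7, "contrast": 0.2, "cause": 0.1},
    {"elaboration": 0.2, "contrast": 0.7, "cause": 0.1},
    {"elaboration": 0.3, "contrast": 0.3, "cause": 0.4},
    {"elaboration": 0.4, "contrast": 0.5, "cause": 0.1},
]

preds = soft_vote(model_a, model_b)
print(preds, round(macro_f1(gold, preds), 4))
```

A figure such as 54.67 ± 0.38 is typically a mean and a spread over repeated runs or folds of such a score, expressed as a percentage; the abstract does not state which protocol was used.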
Anthology ID:
W19-2711
Volume:
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month:
June
Year:
2019
Address:
Minneapolis, MN
Editors:
Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
82–87
URL:
https://aclanthology.org/W19-2711
DOI:
10.18653/v1/W19-2711
Cite (ACL):
Elena Chistova, Maria Kobozeva, Dina Pisarevskaya, Artem Shelmanov, Ivan Smirnov, and Svetlana Toldova. 2019. Towards the Data-driven System for Rhetorical Parsing of Russian Texts. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 82–87, Minneapolis, MN. Association for Computational Linguistics.
Cite (Informal):
Towards the Data-driven System for Rhetorical Parsing of Russian Texts (Chistova et al., NAACL 2019)
PDF:
https://aclanthology.org/W19-2711.pdf
Poster:
 W19-2711.Poster.pdf