EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing

Ana-Maria Bucur, Madalina Chitez, Valentina Muresan, Andreea Dinca, Roxana Rogobete


Abstract
Field-specific expert scientific writing in English as a Lingua Franca is essential for effective research networking and dissemination worldwide. Extracting the linguistic profile of research articles written in L2 English can help young researchers and expert scholars in various disciplines adapt to the scientific writing norms of their communities of practice. In this exploratory study, we present and test an automated linguistic assessment model that includes features relevant to a cross-disciplinary second-language framework: Text Complexity Analysis features, such as syntactic and lexical complexity, and field-specific Academic Word Lists. We analyse how these features vary across four disciplinary fields (Economics, IT, Linguistics and Political Science) in a corpus of L2 English expert scientific writing that forms part of the EXPRES corpus (Corpus of Expert Writing in Romanian and English). We also analyse variation in field-specific writing by comparing groups of linguistic features extracted from higher-visibility (Hv) and lower-visibility (Lv) journals. After applying lexical sophistication, lexical variation and syntactic complexity formulae, significant differences between disciplines were identified; the main finding is that research articles from Lv journals have higher lexical complexity but lower syntactic complexity than articles from Hv journals, while academic vocabulary shows discipline-specific variation.
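The abstract names lexical variation, syntactic complexity and academic word list features without giving their formulae. As a rough illustration only, the sketch below computes three widely used stand-ins: type-token ratio for lexical variation, mean sentence length in tokens as a crude syntactic proxy, and the share of tokens found in a word list. These specific measures, the function names and the tiny word list are assumptions made for this example and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Lexical variation proxy: distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_sentence_length(text):
    """Crude syntactic complexity proxy: average number of tokens per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def word_list_coverage(tokens, word_list):
    """Share of tokens that appear in a given (hypothetical) academic word list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in word_list) / len(tokens)


# Hypothetical miniature word list, standing in for a field-specific list.
SAMPLE_WORD_LIST = {"analyse", "data", "vary"}

sample = "We analyse the data. The data vary."
sample_tokens = tokenize(sample)
ttr = type_token_ratio(sample_tokens)
msl = mean_sentence_length(sample)
coverage = word_list_coverage(sample_tokens, SAMPLE_WORD_LIST)
```

Comparing such scores across groups of articles (for example by discipline, or by Hv versus Lv journals) would mirror the kind of contrast the abstract describes, although the study itself may rely on different and more refined indices.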
Anthology ID:
2022.lrec-1.507
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4739–4746
URL:
https://aclanthology.org/2022.lrec-1.507
Cite (ACL):
Ana-Maria Bucur, Madalina Chitez, Valentina Muresan, Andreea Dinca, and Roxana Rogobete. 2022. EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4739–4746, Marseille, France. European Language Resources Association.
Cite (Informal):
EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing (Bucur et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.507.pdf