PARSEME corpus release 1.3
Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartín, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze, Abigail Walsh
Abstract
We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions (VMWEs). Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.
- Anthology ID:
- 2023.mwe-1.6
- Volume:
- Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 24–35
- URL:
- https://aclanthology.org/2023.mwe-1.6
- DOI:
- 10.18653/v1/2023.mwe-1.6
- Bibkey:
- savary-etal-2023-parseme
- Cite (ACL):
- Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, et al. 2023. PARSEME corpus release 1.3. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 24–35, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- PARSEME corpus release 1.3 (Savary et al., MWE 2023)
- PDF:
- https://aclanthology.org/2023.mwe-1.6.pdf
- Video:
- https://aclanthology.org/2023.mwe-1.6.mp4
Export citation
@inproceedings{savary-etal-2023-parseme,
    title = "{PARSEME} corpus release 1.3",
    author = {Savary, Agata and Ben Khelil, Cherifa and Ramisch, Carlos and Giouli, Voula and Barbu Mititelu, Verginica and Hadj Mohamed, Najet and Krstev, Cvetana and Liebeskind, Chaya and Xu, Hongzhi and Stymne, Sara and G{\"u}ng{\"o}r, Tunga and Pickard, Thomas and Guillaume, Bruno and Bej{\v{c}}ek, Eduard and Bhatia, Archna and Candito, Marie and Gantar, Polona and I{\~n}urrieta, Uxoa and Gatt, Albert and Kovalevskaite, Jolanta and Lichte, Timm and Ljube{\v{s}}i{\'c}, Nikola and Monti, Johanna and Parra Escart{\'\i}n, Carla and Shamsfard, Mehrnoush and Stoyanova, Ivelina and Vincze, Veronika and Walsh, Abigail},
    editor = "Bhatia, Archna and Evang, Kilian and Garcia, Marcos and Giouli, Voula and Han, Lifeng and Taslimipoor, Shiva",
    booktitle = "Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.mwe-1.6",
    doi = "10.18653/v1/2023.mwe-1.6",
    pages = "24--35",
    abstract = "We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus represents 26 languages now. All monolingual corpora therein use Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts impact on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.",
}
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="savary-etal-2023-parseme"> <titleInfo> <title>PARSEME corpus release 1.3</title> </titleInfo> <name type="personal"> <namePart type="given">Agata</namePart> <namePart type="family">Savary</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Cherifa</namePart> <namePart type="family">Ben Khelil</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Carlos</namePart> <namePart type="family">Ramisch</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Voula</namePart> <namePart type="family">Giouli</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Verginica</namePart> <namePart type="family">Barbu Mititelu</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Najet</namePart> <namePart type="family">Hadj Mohamed</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Cvetana</namePart> <namePart type="family">Krstev</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Chaya</namePart> <namePart type="family">Liebeskind</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hongzhi</namePart> <namePart type="family">Xu</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart 
type="given">Sara</namePart> <namePart type="family">Stymne</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tunga</namePart> <namePart type="family">Güngör</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Thomas</namePart> <namePart type="family">Pickard</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Bruno</namePart> <namePart type="family">Guillaume</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Eduard</namePart> <namePart type="family">Bejček</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Archna</namePart> <namePart type="family">Bhatia</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marie</namePart> <namePart type="family">Candito</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Polona</namePart> <namePart type="family">Gantar</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Uxoa</namePart> <namePart type="family">Iñurrieta</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Albert</namePart> <namePart type="family">Gatt</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Jolanta</namePart> <namePart 
type="family">Kovalevskaite</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Timm</namePart> <namePart type="family">Lichte</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Nikola</namePart> <namePart type="family">Ljubešić</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Johanna</namePart> <namePart type="family">Monti</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Carla</namePart> <namePart type="family">Parra Escartín</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mehrnoush</namePart> <namePart type="family">Shamsfard</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ivelina</namePart> <namePart type="family">Stoyanova</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Veronika</namePart> <namePart type="family">Vincze</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Abigail</namePart> <namePart type="family">Walsh</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2023-05</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)</title> </titleInfo> <name type="personal"> <namePart type="given">Archna</namePart> <namePart 
type="family">Bhatia</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Kilian</namePart> <namePart type="family">Evang</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marcos</namePart> <namePart type="family">Garcia</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Voula</namePart> <namePart type="family">Giouli</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lifeng</namePart> <namePart type="family">Han</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Shiva</namePart> <namePart type="family">Taslimipoor</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Dubrovnik, Croatia</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus represents 26 languages now. All monolingual corpora therein use Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts impact on unseen VMWEs. 
With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.</abstract> <identifier type="citekey">savary-etal-2023-parseme</identifier> <identifier type="doi">10.18653/v1/2023.mwe-1.6</identifier> <location> <url>https://aclanthology.org/2023.mwe-1.6</url> </location> <part> <date>2023-05</date> <extent unit="page"> <start>24</start> <end>35</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings
%T PARSEME corpus release 1.3
%A Savary, Agata
%A Ben Khelil, Cherifa
%A Ramisch, Carlos
%A Giouli, Voula
%A Barbu Mititelu, Verginica
%A Hadj Mohamed, Najet
%A Krstev, Cvetana
%A Liebeskind, Chaya
%A Xu, Hongzhi
%A Stymne, Sara
%A Güngör, Tunga
%A Pickard, Thomas
%A Guillaume, Bruno
%A Bejček, Eduard
%A Bhatia, Archna
%A Candito, Marie
%A Gantar, Polona
%A Iñurrieta, Uxoa
%A Gatt, Albert
%A Kovalevskaite, Jolanta
%A Lichte, Timm
%A Ljubešić, Nikola
%A Monti, Johanna
%A Parra Escartín, Carla
%A Shamsfard, Mehrnoush
%A Stoyanova, Ivelina
%A Vincze, Veronika
%A Walsh, Abigail
%Y Bhatia, Archna
%Y Evang, Kilian
%Y Garcia, Marcos
%Y Giouli, Voula
%Y Han, Lifeng
%Y Taslimipoor, Shiva
%S Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
%D 2023
%8 May
%I Association for Computational Linguistics
%C Dubrovnik, Croatia
%F savary-etal-2023-parseme
%X We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus represents 26 languages now. All monolingual corpora therein use Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts impact on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.
%R 10.18653/v1/2023.mwe-1.6
%U https://aclanthology.org/2023.mwe-1.6
%U https://doi.org/10.18653/v1/2023.mwe-1.6
%P 24-35
Markdown (Informal)
[PARSEME corpus release 1.3](https://aclanthology.org/2023.mwe-1.6) (Savary et al., MWE 2023)