GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahim Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
Abstract
Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
- Anthology ID:
- 2022.emnlp-demos.27
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, UAE
- Editors:
- Wanxiang Che, Ekaterina Shutova
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 266–281
- URL:
- https://aclanthology.org/2022.emnlp-demos.27
- DOI:
- 10.18653/v1/2022.emnlp-demos.27
- Bibkey:
- gehrmann-etal-2022-gemv2
- Cite (ACL):
- Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, et al. 2022. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 266–281, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- GEMv2: Multilingual NLG Benchmarking in a Single Line of Code (Gehrmann et al., EMNLP 2022)
- PDF:
- https://aclanthology.org/2022.emnlp-demos.27.pdf