Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset
Marcelo Viridiano, Arthur Lorenzi, Tiago Timponi Torrent, Ely E. Matos, Adriana S. Pagano, Natália Sathler Sigiliano, Maucha Gamonal, Helen de Andrade Abreu, Lívia Vicente Dutra, Mairon Samagaio, Mariane Carvalho, Franciany Campos, Gabrielly Azalim, Bruna Mazzei, Mateus Fonseca de Oliveira, Ana Carolina Luz, Livia Padua Ruiz, Júlia Bellei, Amanda Pestana, Josiane Costa, Iasmin Rabelo, Anna Beatriz Silva, Raquel Roza, Mariana Souza Mota, Igor Oliveira, Márcio Henrique Pelegrino de Freitas
Abstract
This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which (i) extends the Multi30K dataset (Elliot et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 30,104 Brazilian Portuguese translations from original English descriptions; (ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 frames and Frame Elements correlations with the existing phrase-to-region correlations.
- Anthology ID:
- 2024.lrec-main.656
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 7438–7449
- URL:
- https://aclanthology.org/2024.lrec-main.656
- Cite (ACL):
- Marcelo Viridiano, Arthur Lorenzi, Tiago Timponi Torrent, Ely E. Matos, Adriana S. Pagano, Natália Sathler Sigiliano, Maucha Gamonal, Helen de Andrade Abreu, Lívia Vicente Dutra, Mairon Samagaio, Mariane Carvalho, Franciany Campos, Gabrielly Azalim, Bruna Mazzei, Mateus Fonseca de Oliveira, Ana Carolina Luz, Livia Padua Ruiz, Júlia Bellei, Amanda Pestana, et al. 2024. Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7438–7449, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset (Viridiano et al., LREC-COLING 2024)
- PDF:
- https://aclanthology.org/2024.lrec-main.656.pdf
Export citation
@inproceedings{viridiano-etal-2024-framed,
    title = "Framed {M}ulti30{K}: A Frame-Based Multimodal-Multilingual Dataset",
    author = "Viridiano, Marcelo and
      Lorenzi, Arthur and
      Timponi Torrent, Tiago and
      Matos, Ely E. and
      Pagano, Adriana S. and
      Sathler Sigiliano, Nat{\'a}lia and
      Gamonal, Maucha and
      de Andrade Abreu, Helen and
      Vicente Dutra, L{\'\i}via and
      Samagaio, Mairon and
      Carvalho, Mariane and
      Campos, Franciany and
      Azalim, Gabrielly and
      Mazzei, Bruna and
      Fonseca de Oliveira, Mateus and
      Luz, Ana Carolina and
      Padua Ruiz, Livia and
      Bellei, J{\'u}lia and
      Pestana, Amanda and
      Costa, Josiane and
      Rabelo, Iasmin and
      Silva, Anna Beatriz and
      Roza, Raquel and
      Souza Mota, Mariana and
      Oliveira, Igor and
      Pelegrino de Freitas, M{\'a}rcio Henrique",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.656",
    pages = "7438--7449",
    abstract = "This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which i) extends the Multi30K dataset (Elliot et al., 2016) with 158,915 original Brazilian Portuguese descriptions, and 30,104 Brazilian Portuguese translations from original English descriptions; and ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 frames and Frame Elements correlations with the existing phrase-to-region correlations.",
}