Exploiting Image–Text Synergy for Contextual Image Captioning

Sreyasi Nag Chowdhury, Rajarshi Bhowmik, Hareesh Ravi, Gerard de Melo, Simon Razniewski, Gerhard Weikum


Abstract
Modern web content, such as news articles, blog posts, educational resources, and marketing brochures, is predominantly multimodal. A notable trait is the inclusion of media such as images placed at meaningful locations within a textual narrative. Most often, such images are accompanied by captions, either factual or stylistic (humorous, metaphorical, etc.), which make the narrative more engaging to the reader. While standalone image captioning has been extensively studied, captioning an image based on external knowledge such as its surrounding text remains under-explored. In this paper, we study this new task: given an image and an associated unstructured knowledge snippet, the goal is to generate a contextual caption for the image.
Anthology ID: 2021.lantern-1.3
Volume: Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Marius Mosbach, Michael A. Hedderich, Sandro Pezzelle, Aditya Mogadala, Dietrich Klakow, Marie-Francine Moens, Zeynep Akata
Venue: LANTERN
Publisher: Association for Computational Linguistics
Pages: 30–37
URL: https://aclanthology.org/2021.lantern-1.3
Cite (ACL): Sreyasi Nag Chowdhury, Rajarshi Bhowmik, Hareesh Ravi, Gerard de Melo, Simon Razniewski, and Gerhard Weikum. 2021. Exploiting Image–Text Synergy for Contextual Image Captioning. In Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), pages 30–37, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Exploiting Image–Text Synergy for Contextual Image Captioning (Nag Chowdhury et al., LANTERN 2021)
PDF: https://aclanthology.org/2021.lantern-1.3.pdf