Cost-Effective Language Driven Image Editing with LX-DRIM

Rodrigo Santos, António Branco, João Ricardo Silva


Abstract
Cross-modal language and image processing is envisaged as a way to improve language understanding by resorting to visual grounding, but only recently, with the emergence of neural architectures specifically tailored to cope with both modalities, has it attracted increased attention and obtained promising results. In this paper we address a cross-modal task of language-driven image design, in particular the task of altering a given image on the basis of language instructions. We also avoid the need for a specifically tailored architecture and resort instead to a general purpose model in the Transformer family. Experiments with the resulting tool, LX-DRIM, show very encouraging results, confirming the viability of the approach for language-driven image design while keeping it affordable in terms of compute and data.
Anthology ID:
2022.mmmpie-1.5
Volume:
Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models
Month:
October
Year:
2022
Address:
Virtual
Venue:
MMMPIE
Publisher:
International Conference on Computational Linguistics
Pages:
31–43
URL:
https://aclanthology.org/2022.mmmpie-1.5
Cite (ACL):
Rodrigo Santos, António Branco, and João Ricardo Silva. 2022. Cost-Effective Language Driven Image Editing with LX-DRIM. In Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models, pages 31–43, Virtual. International Conference on Computational Linguistics.
Cite (Informal):
Cost-Effective Language Driven Image Editing with LX-DRIM (Santos et al., MMMPIE 2022)
PDF:
https://aclanthology.org/2022.mmmpie-1.5.pdf
Code:
nlx-group/lx-drim