An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC)

Satwik Kottur, Paul Crook, Seungwhan Moon, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard


Abstract
There is a growing interest in virtual assistants with multimodal capabilities, e.g., inferring the context of a conversation through scene understanding. The recently released situated and interactive multimodal conversations (SIMMC) dataset addresses this trend by enabling research to create virtual assistants that are capable of taking into account the scene that the user sees when conversing with the user and also interacting with items in the scene. The SIMMC dataset is novel in that it contains fully annotated user-assistant, task-orientated dialogs where the user and an assistant co-observe the same visual elements and the latter can take actions to update the scene. The SIMMC challenge, held as part of the Ninth Dialog System Technology Challenge (DSTC9), propelled the development of various models which together set a new state-of-the-art on the SIMMC dataset. In this work, we compare and analyze these models to identify ‘what worked?’, and the remaining gaps; ‘what next?’. Our analysis shows that even though pretrained language models adapted to this setting show great promise, there are indications that multimodal context isn’t fully utilised, and there is a need for better and scalable knowledge base integration. We hope this first-of-its-kind analysis for SIMMC models provides useful insights and opportunities for further research in multimodal conversational agents.
Anthology ID:
2021.sigdial-1.15
Volume:
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
July
Year:
2021
Address:
Singapore and Online
Editors:
Haizhou Li, Gina-Anne Levow, Zhou Yu, Chitralekha Gupta, Berrak Sisman, Siqi Cai, David Vandyke, Nina Dethlefs, Yan Wu, Junyi Jessy Li
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
144–153
URL:
https://aclanthology.org/2021.sigdial-1.15
DOI:
10.18653/v1/2021.sigdial-1.15
Cite (ACL):
Satwik Kottur, Paul Crook, Seungwhan Moon, Ahmad Beirami, Eunjoon Cho, Rajen Subba, and Alborz Geramifard. 2021. An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC). In Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 144–153, Singapore and Online. Association for Computational Linguistics.
Cite (Informal):
An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC) (Kottur et al., SIGDIAL 2021)
PDF:
https://aclanthology.org/2021.sigdial-1.15.pdf
Video:
https://www.youtube.com/watch?v=VmdHZSno2MQ
Data
SIMMC