2024
Multimodal Chart Retrieval: A Comparison of Text, Table and Image Based Approaches
Averi Nowak | Francesco Piccinno | Yasemin Altun
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We investigate multimodal chart retrieval, addressing the challenge of retrieving image-based charts using textual queries. We compare four approaches: (a) OCR with text retrieval, (b) chart derendering (DePlot) followed by table retrieval, (c) a direct image understanding model (PaLI-3), and (d) a combined PaLI-3 + DePlot approach. As the table retrieval component, we introduce Tab-GTR, a text retrieval model augmented with table structure embeddings, which achieves state-of-the-art results on the NQ-Tables benchmark with 48.88% R@1. On in-distribution data, the DePlot-based method (b) outperforms PaLI-3 (c) while being significantly more efficient (300M vs 3B trainable parameters). However, DePlot struggles with complex charts, indicating a need for improvements in chart derendering, specifically in chart data diversity and the richness of text/table representations. We find no clear winner between methods (b) and (c) in general; the best performance is achieved by the combined approach (d), which we further show benefits the most from multi-task training.
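A minimal sketch of the dual-encoder dense retrieval setup that a GTR-style table retriever such as Tab-GTR builds on: linearized tables and the query are embedded and ranked by dot product. Everything below is an illustrative stand-in (a hashed bag-of-words replaces a trained transformer encoder, and Tab-GTR's table structure embeddings are not modeled); it only shows how queries are matched against flattened tables.

# Hypothetical sketch, not the paper's code: GTR-style dual-encoder retrieval
# over linearized tables. toy_encode() stands in for a trained encoder so the
# example runs without model weights.
import numpy as np

DIM = 64

def toy_encode(text: str) -> np.ndarray:
    """Stand-in encoder: hashed bag-of-tokens embedding, L2-normalized."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def linearize_table(header: list[str], rows: list[list[str]]) -> str:
    """Flatten a table into text; Tab-GTR additionally uses structure embeddings."""
    lines = [" | ".join(header)] + [" | ".join(r) for r in rows]
    return " ; ".join(lines)

def retrieve(query: str, tables: list[str], k: int = 1) -> list[int]:
    """Rank candidate tables by dot-product similarity with the query embedding."""
    q = toy_encode(query)
    scores = np.stack([toy_encode(t) for t in tables]) @ q
    return [int(i) for i in np.argsort(-scores)[:k]]

if __name__ == "__main__":
    corpus = [
        linearize_table(["country", "population"], [["France", "67M"], ["Italy", "59M"]]),
        linearize_table(["team", "wins"], [["Red Sox", "92"], ["Yankees", "99"]]),
    ]
    print(retrieve("population of France", corpus))  # expected: [0], the population table

In a real system the encoders are trained transformers and retrieval runs over precomputed table embeddings with nearest-neighbor search; the scoring principle is the same.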
2023
DePlot: One-shot visual language reasoning by plot-to-table translation
Fangyu Liu | Julian Eisenschlos | Francesco Piccinno | Syrine Krichene | Chenxi Pang | Kenton Lee | Mandar Joshi | Wenhu Chen | Nigel Collier | Yasemin Altun
Findings of the Association for Computational Linguistics: ACL 2023
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on tens of thousands of data points, DePlot+LLM with just one-shot prompting achieves a 29.4% improvement over the finetuned SOTA on human-written queries from the task of chart QA.
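A minimal sketch of the plug-and-play DePlot+LLM setup the abstract describes: the chart is first derendered into a linearized table, which is then placed into a one-shot prompt for a pretrained LLM. The table text, the exemplar, and the call_llm placeholder below are made up for illustration and are not from the paper.

# Hypothetical sketch of DePlot+LLM one-shot chart QA.
# The linearized table would come from DePlot; call_llm() is a stub for any LLM API.

ONE_SHOT_EXEMPLAR = """Table:
year | revenue
2020 | 10
2021 | 14
Question: By how much did revenue grow from 2020 to 2021?
Answer: 14 - 10 = 4"""

def build_prompt(linearized_table: str, question: str) -> str:
    """Compose a one-shot chart-QA prompt from a derendered table and a question."""
    return (
        f"{ONE_SHOT_EXEMPLAR}\n\n"
        f"Table:\n{linearized_table}\n"
        f"Question: {question}\n"
        f"Answer:"
    )

def call_llm(prompt: str) -> str:
    """Placeholder: DePlot is model-agnostic, so any pretrained LLM can be plugged in."""
    raise NotImplementedError("plug in your preferred LLM API")

if __name__ == "__main__":
    table = "country | medals\nNorway | 16\nGermany | 12"  # DePlot output would go here
    print(build_prompt(table, "Which country won more medals?"))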
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Fangyu Liu | Francesco Piccinno | Syrine Krichene | Chenxi Pang | Kenton Lee | Mandar Joshi | Yasemin Altun | Nigel Collier | Julian Eisenschlos
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models’ capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
2022
LAD: Language Models as Data for Zero-Shot Dialog
Shikib Mehri | Yasemin Altun | Maxine Eskenazi
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
To facilitate zero-shot generalization in task-oriented dialog, this paper proposes Language Models as Data (LAD). LAD is a paradigm for creating diverse and accurate synthetic data which conveys the necessary structural constraints and can be used to train a downstream neural dialog model. LAD leverages GPT-3 to induce linguistic diversity. LAD achieves significant performance gains in zero-shot settings on intent prediction (+15%), slot filling (+31.4 F-1) and next action prediction (+10 F-1). Furthermore, an interactive human evaluation shows that training with LAD is competitive with training on human dialogs.
Table-To-Text generation and pre-training with TabT5
Ewa Andrejczuk | Julian Eisenschlos | Francesco Piccinno | Syrine Krichene | Yasemin Altun
Findings of the Association for Computational Linguistics: EMNLP 2022
Encoder-only transformer models have been successfully applied to different table understanding tasks, as in TAPAS. A major limitation of these architectures is that they are constrained to classification-like tasks such as cell selection or entailment detection. We present TabT5, an encoder-decoder model that generates natural language text based on tables and textual inputs. TabT5 overcomes the encoder-only limitation by incorporating a decoder component and leverages the input structure with table specific embeddings and pre-training. TabT5 achieves new state-of-the-art results on several domains, including spreadsheet formula prediction with a 15% increase in sequence accuracy, QA with a 2.5% increase in sequence accuracy and data-to-text generation with a 2.5% increase in BLEU.
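A rough sketch, under assumptions, of how a question and a table can be flattened into a single input for a T5-style encoder-decoder while recording per-token row and column indices, in the spirit of TabT5's table-specific embeddings. The paper's actual linearization and embedding scheme may differ; this only illustrates the idea of keeping table structure alongside the flattened text.

# Assumed sketch, not the paper's code: flatten question + table for a
# T5-style encoder-decoder, keeping row/column indices per token that a
# model could consume as additional (table-specific) embeddings.
from dataclasses import dataclass

@dataclass
class TableFeatures:
    tokens: list[str]
    row_ids: list[int]   # 0 for question and header tokens, 1.. for body rows
    col_ids: list[int]   # 0 for question tokens, 1.. for table columns

def encode_table_question(question: str,
                          header: list[str],
                          rows: list[list[str]]) -> TableFeatures:
    tokens, row_ids, col_ids = [], [], []
    for tok in question.split():                 # question first, no table position
        tokens.append(tok); row_ids.append(0); col_ids.append(0)
    for c, cell in enumerate(header, start=1):   # header cells live in row 0
        tokens.append(cell); row_ids.append(0); col_ids.append(c)
    for r, row in enumerate(rows, start=1):      # body cells get row/column indices
        for c, cell in enumerate(row, start=1):
            tokens.append(cell); row_ids.append(r); col_ids.append(c)
    return TableFeatures(tokens, row_ids, col_ids)

if __name__ == "__main__":
    feats = encode_table_question(
        "total revenue ?",
        ["year", "revenue"],
        [["2020", "10"], ["2021", "14"]],
    )
    for t, r, c in zip(feats.tokens, feats.row_ids, feats.col_ids):
        print(f"{t:8s} row={r} col={c}")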
2021
Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data
Massimo Nicosia | Zhongdi Qu | Yasemin Altun
Findings of the Association for Computational Linguistics: EMNLP 2021
While multilingual pretrained language models (LMs) fine-tuned on a single language have shown substantial cross-lingual task transfer capabilities, there is still a wide performance gap in semantic parsing tasks when target language supervision is available. In this paper, we propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parser. This method simplifies the popular Translate-Align-Project (TAP) pipeline and consists of a sequence-to-sequence filler model that constructs a full parse conditioned on an utterance and a view of the same parse. Our filler is trained on English data only but can accurately complete instances in other languages (i.e., translations of the English training utterances), in a zero-shot fashion. Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems which rely on traditional alignment techniques.
2019
Generating Logical Forms from Graph Representations of Text and Entities
Peter Shaw | Philip Massey | Angelica Chen | Francesco Piccinno | Yasemin Altun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Structured information about entities is critical for many semantic parsing tasks. We present an approach that uses a Graph Neural Network (GNN) architecture to incorporate information about relevant entities and their relations during parsing. Combined with a decoder copy mechanism, this approach provides a conceptually simple mechanism to generate logical forms with entities. We demonstrate that this approach is competitive with the state-of-the-art across several tasks without pre-training, and outperforms existing approaches when combined with BERT pre-training.
Answering Conversational Questions on Structured Data without Logical Forms
Thomas Mueller | Francesco Piccinno | Peter Shaw | Massimo Nicosia | Yasemin Altun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
We present a novel approach to answering sequential questions based on structured objects such as knowledge bases or tables without using a logical form as an intermediate representation. We encode tables as graphs using a graph neural network model based on the Transformer architecture. The answers are then selected from the encoded graph using a pointer network. This model is appropriate for processing conversations around structured data, where the attention mechanism that selects the answers to a question can also be used to resolve conversational references. We demonstrate the validity of this approach with competitive results on the Sequential Question Answering (SQA) task.
2013
Overcoming the Lack of Parallel Data in Sentence Compression
Katja Filippova | Yasemin Altun
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
2007
Semi-Markov Models for Sequence Segmentation
Qinfeng Shi | Yasemin Altun | Alex Smola | S.V.N. Vishwanathan
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
2006
Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger
Massimiliano Ciaramita | Yasemin Altun
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
2004
Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech
Michelle Gregory | Yasemin Altun
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)
2003
Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences
Yasemin Altun | Mark Johnson | Thomas Hofmann
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
2000
Reading Comprehension Programs in a Statistical-Language-Processing Class
Eugene Charniak | Yasemin Altun | Rodrigo de Salvo Braz | Benjamin Garrett | Margaret Kosmala | Tomer Moscovich | Lixin Pang | Changhee Pyo | Ye Sun | Wei Wy | Zhongfa Yang | Shawn Zeiler | Lisa Zorn
ANLP-NAACL 2000 Workshop: Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems