Ming Dong


2025

On the Human-level Performance of Visual Question Answering
Chenlian Zhou | Guanyi Chen | Xin Bai | Ming Dong
Proceedings of the 31st International Conference on Computational Linguistics

Visual7W has been widely used to assess multiple-choice visual question answering (VQA) systems. This paper reports on a replicated human experiment on Visual7W aimed at understanding the human-level performance of VQA. The replication was not entirely successful because human participants performed significantly worse when answering “where”, “when”, and “how” questions compared to other question types. An error analysis revealed that this failure was a consequence of the non-deterministic distractors in Visual7W. GPT-4V was then evaluated on Visual7W and compared to the human-level performance. The results show that, when evaluating models’ capacity on Visual7W, higher performance is not necessarily better.
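To make the human-versus-GPT-4V comparison concrete, the following minimal Python sketch computes multiple-choice accuracy broken down by Visual7W question type; the record layout and field names are illustrative assumptions, not the paper's code or data.

from collections import defaultdict

def accuracy_by_question_type(records):
    # `records`: hypothetical list of dicts with keys "qtype" (one of
    # Visual7W's what/where/when/who/why/how), "predicted", and "gold"
    # (answer-choice indices).
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["qtype"]] += 1
        correct[r["qtype"]] += r["predicted"] == r["gold"]
    return {qt: correct[qt] / total[qt] for qt in total}

# Toy run: the replication found humans weaker on "where"/"when"/"how".
answers = [
    {"qtype": "what", "predicted": 2, "gold": 2},
    {"qtype": "where", "predicted": 1, "gold": 3},
    {"qtype": "when", "predicted": 0, "gold": 0},
]
print(accuracy_by_question_type(answers))  # {'what': 1.0, 'where': 0.0, 'when': 1.0}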

Retrieval-Augmented Generation for Large Language Model based Few-shot Chinese Spell Checking
Ming Dong | Zhiwei Cheng | Changyin Luo | Tingting He
Proceedings of the 31st International Conference on Computational Linguistics

Large language models (LLMs) are naturally suited to the Chinese spelling check (CSC) task in few-shot scenarios owing to their powerful semantic understanding and few-shot learning capabilities. Recent CSC research has begun to use LLMs as foundational models. However, most current datasets focus primarily on errors generated during the text generation process, with little attention given to errors occurring in the modal conversion process. Furthermore, existing LLM-based CSC methods often rely on fixed prompt samples, which limits the performance of LLMs. We therefore propose a framework named RagID (Retrieval-Augmented Generation and Iterative Discriminator strategy). By utilizing semantic similarity search and an iterative discriminator mechanism, RagID provides well-chosen prompt samples and reduces over-correction in LLM-based CSC, making it highly effective in few-shot scenarios. Comprehensive experiments show that RagID achieves the best performance on datasets that span multiple domains and datasets containing modal-conversion spelling errors. The dataset and method are available online.
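The retrieval step of such a pipeline can be sketched as follows: embed the input sentence, pick the k most similar labeled corrections from a pool, and assemble a few-shot prompt. This is a minimal sketch of the general retrieval-augmented prompting idea, not RagID's released code; the function names, pool layout, and prompt wording are all assumptions.

import numpy as np

def build_csc_prompt(query, pool, embed, k=3):
    # `pool`: hypothetical list of (wrong_sentence, corrected_sentence) pairs.
    # `embed`: any function mapping a sentence to a 1-D numpy vector.
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = []
    for wrong, right in pool:
        v = embed(wrong)
        sim = float(q @ (v / np.linalg.norm(v)))  # cosine similarity
        scored.append((sim, wrong, right))
    scored.sort(reverse=True)  # most similar examples first
    shots = "\n".join(
        f"Input: {w}\nCorrection: {r}" for _, w, r in scored[:k]
    )
    return f"{shots}\nInput: {query}\nCorrection:"

In RagID the LLM's output would then be passed through an iterative discriminator that accepts or rejects each proposed edit, which is what curbs over-correction; that loop is omitted here.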

2024

Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking
Ming Dong | Yujing Chen | Miao Zhang | Hao Sun | Tingting He
Findings of the Association for Computational Linguistics: ACL 2024

2022

DRLK: Dynamic Hierarchical Reasoning with Language Model and Knowledge Graph for Question Answering
Miao Zhang | Rufeng Dai | Ming Dong | Tingting He
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In recent years, Graph Neural Network (GNN) approaches enhanced with knowledge graphs (KG) have performed well on question answering (QA) tasks. One critical challenge is how to effectively utilize interactions between the QA context and the KG. However, existing work adopts an identical QA context representation to interact with multiple layers of the KG, which restricts the interaction. In this paper, we propose DRLK (Dynamic Hierarchical Reasoning with Language Model and Knowledge Graph), a novel model that utilizes dynamic hierarchical interactions between the QA context and the KG for reasoning. DRLK extracts dynamic hierarchical features from the QA context and performs inter-layer and intra-layer interactions at each iteration, allowing the KG representation to be grounded in the hierarchical features of the QA context. We conduct extensive experiments on four benchmark datasets in medical QA and commonsense reasoning. The experimental results demonstrate that DRLK achieves state-of-the-art performance on two benchmark datasets and performs competitively on the others.
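The core contrast with prior work, a layer-specific rather than identical QA-context view at each KG layer, can be illustrated with a short PyTorch sketch. This is an assumption-laden illustration of the idea, not the authors' architecture: module names, the choice of multi-head attention, and all hyperparameters are invented for exposition.

import torch
import torch.nn as nn

class HierarchicalInteraction(nn.Module):
    # Sketch: one projection + attention block per KG layer, so each layer
    # of the KG interacts with its own view of the QA context instead of a
    # single shared representation.
    def __init__(self, dim, num_layers):
        super().__init__()
        self.context_proj = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_layers)
        )
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, context_tokens, kg_nodes):
        # context_tokens: (batch, seq, dim); kg_nodes: (batch, nodes, dim)
        for proj, attn in zip(self.context_proj, self.attn):
            ctx = proj(context_tokens)              # layer-specific context view
            kg_nodes, _ = attn(kg_nodes, ctx, ctx)  # KG nodes attend to context
        return kg_nodes

Intra-layer message passing among KG nodes (the GNN itself) would sit between these attention steps in a full model; it is omitted to keep the sketch focused on the cross-modal interaction.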