Zekun Li


2024

pdf bib
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
Zekun Li | Zhiyu Chen | Mike Ross | Patrick Huber | Seungwhan Moon | Zhaojiang Lin | Xin Dong | Adithya Sagar | Xifeng Yan | Paul Crook
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FnCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT’s performance beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We have made the code publicly available at https://github.com/facebookresearch/FnCTOD.

2023

pdf bib
Limitations of Language Models in Arithmetic and Symbolic Induction
Jing Qian | Hong Wang | Zekun Li | Shiyang Li | Xifeng Yan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent work has shown that large pretrained Language Models (LMs) can not only perform remarkably well on a range of Natural Language Processing (NLP) tasks but also start improving on reasoning tasks such as arithmetic induction, symbolic manipulation, and commonsense reasoning with increasing size of models. However, it is still unclear what the underlying capabilities of these LMs are. Surprisingly, we find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition. When the total number of symbols or repeating symbols increases, the model performance drops quickly. We investigate the potential causes behind this phenomenon and examine a set of possible methods, including explicit positional markers, fine-grained computation steps, and LMs with callable programs. Experimental results show that none of these techniques can solve the simplest addition induction problem completely. In the end, we introduce LMs with tutor, which demonstrates every single step of teaching. LMs with tutor is able to deliver 100% accuracy in situations of OOD and repeating symbols, shedding new insights on the boundary of large LMs in induction.

pdf bib
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding
Zekun Li | Wenxuan Zhou | Yao-Yi Chiang | Muhao Chen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism to encode distance and direction relations to capture geospatial context. In the experiment, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, which bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm.

pdf bib
ChatEdit: Towards Multi-turn Interactive Facial Image Editing via Dialogue
Xing Cui | Zekun Li | Pei Li | Yibo Hu | Hailin Shi | Chunshui Cao | Zhaofeng He
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This paper explores interactive facial image editing through dialogue and presents the ChatEdit benchmark dataset for evaluating image editing and conversation abilities in this context. ChatEdit is constructed from the CelebA-HQ dataset, incorporating annotated multi-turn dialogues corresponding to user editing requests on the images. The dataset is challenging, as it requires the system to dynamically track and edit images based on user requests, while generating appropriate natural language responses. To address these challenges, we propose a framework comprising a dialogue module for tracking user requests as well as generating responses, and an image editing module for editing images accordingly. Unlike previous approaches, our framework directly tracks the user request of the current turn from the entire dialogue history and edits the initial image instead of manipulating the output from the previous turn, mitigating error accumulation and attribute forgetting issues. Extensive experiments on the ChatEdit dataset demonstrate the superiority of our framework over previous methods and also improvement rooms, encouraging future research. We will release the code and data publicly to facilitate advancements in complex interactive facial image editing.

2022

pdf bib
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation
Zekun Li | Jina Kim | Yao-Yi Chiang | Muhao Chen
Findings of the Association for Computational Linguistics: EMNLP 2022

Named geographic entities (geo-entities for short) are the building blocks of many geographic datasets. Characterizing geo-entities is integral to various application domains, such as geo-intelligence and map comprehension, while a key challenge is to capture the spatial-varying context of an entity. We hypothesize that we shall know the characteristics of a geo-entity by its surrounding entities, similar to knowing word meanings by their linguistic context. Accordingly, we propose a novel spatial language model, SpaBERT, which provides a general-purpose geo-entity representation based on neighboring entities in geospatial data. SpaBERT extends BERT to capture linearized spatial context, while incorporating a spatial coordinate embedding mechanism to preserve spatial relations of entities in the 2-dimensional space. SpaBERT is pretrained with masked language modeling and masked entity prediction tasks to learn spatial dependencies. We apply SpaBERT to two downstream tasks: geo-entity typing and geo-entity linking. Compared with the existing language models that do not use spatial context, SpaBERT shows significant performance improvement on both tasks. We also analyze the entity representation from SpaBERT in various settings and the effect of spatial coordinate embedding.

pdf bib
Controllable Dialogue Simulation with In-context Learning
Zekun Li | Wenhu Chen | Shiyang Li | Hong Wang | Jing Qian | Xifeng Yan
Findings of the Association for Computational Linguistics: EMNLP 2022

Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose Dialogic, a novel dialogue simulation method based on large language model in-context learning to automate dataset creation. Seeded with a few annotated dialogues, Dialogic automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and annotations in a controllable way. Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement and parameter update and is thus much more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset demonstrate that training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues under the challenging low-resource settings, with as few as 85 dialogues as a seed. When the full training set is given, our method can still serve as an effective data augmentation method to further improve performance. Human evaluation results also show that our simulated dialogues have near-human fluency and annotation accuracy. The code and data are available at https://github.com/Leezekun/dialogic.