Yulei Sui


2024

pdf bib
NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries
Wei Zhao | Zhitao Hou | Siyuan Wu | Yan Gao | Haoyu Dong | Yao Wan | Hongyu Zhang | Yulei Sui | Haidong Zhang
Findings of the Association for Computational Linguistics: EACL 2024

Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis. However, crafting formulas on spreadsheets remains a tedious and error-prone task for many end-users, particularly when dealing with complex operations. To alleviate the burden associated with writing spreadsheet formulas, this paper introduces a novel benchmark task called NL2Formula, with the aim to generate executable formulas that are grounded on a spreadsheet table, given a Natural Language (NL) query as input. To accomplish this, we construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions. We realize the NL2Formula task by providing a sequence-to-sequence baseline implementation called fCoder. Experimental results validate the effectiveness of fCoder, demonstrating its superior performance compared to the baseline models. Furthermore, we also compare fCoder with an initial GPT-3.5 model (i.e., text-davinci-003). Lastly, through in-depth error analysis, we identify potential challenges in the NL2Formula task and advocate for further investigation.

pdf bib
Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback
Zhangqian Bi | Yao Wan | Zheng Wang | Hongyu Zhang | Batu Guan | Fangxin Lu | Zili Zhang | Yulei Sui | Hai Jin | Xuanhua Shi
Findings of the Association for Computational Linguistics ACL 2024

Large Language Models (LLMs) have shown remarkable progress in automated code generation. Yet, LLM-generated code may contain errors in API usage, class, data structure, or missing project-specific information. As much of this project-specific context cannot fit into the prompts of LLMs, we must find ways to allow the model to explore the project-level code context. We present CoCoGen, a new code generation approach that uses compiler feedback to improve the LLM-generated code. CoCoGen first leverages static analysis to identify mismatches between the generated code and the project’s context. It then iteratively aligns and fixes the identified errors using information extracted from the code repository. We integrate CoCoGen with two representative LLMs, i.e., GPT-3.5-Turbo and Code Llama (13B), and apply it to Python code generation. Experimental results show that CoCoGen significantly improves the vanilla LLMs by over 80% in generating code dependent on the project context and consistently outperforms the existing retrieval-based code generation baselines.

2023

pdf bib
Uncovering Limitations in Text-to-Image Generation: A Contrastive Approach with Structured Semantic Alignment
Qianyu Feng | Yulei Sui | Hongyu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023

Despite significant advancements in text-to-image generation models, they still face challenges when it comes to producing highly detailed or complex images based on textual descriptions. In order to explore these limitations, we propose a Structured Semantic Alignment (SSA) method for evaluating text-to-image generation models. SSA focuses on learning structured semantic embeddings across different modalities and aligning them in a joint space. The method employs the following steps to achieve its objective: (i) Generating mutated prompts by substituting words with semantically equivalent or nonequivalent alternatives while preserving the original syntax; (ii) Representing the sentence structure through parsing trees obtained via syntax parsing; (iii) Learning fine-grained structured embeddings that project semantic features from different modalities into a shared embedding space; (iv) Evaluating the semantic consistency between the structured text embeddings and the corresponding visual embeddings. Through experiments conducted on various benchmarks, we have demonstrated that SSA offers improved measurement of semantic consistency of text-to-image generation models. Additionally, it unveils a wide range of generation errors including under-generation, incorrect constituency, incorrect dependency, and semantic confusion. By uncovering these biases and limitations embedded within the models, our proposed method provides valuable insights into their shortcomings when applied to real-world scenarios.

2021

pdf bib
Disentangled Code Representation Learning for Multiple Programming Languages
Jingfeng Zhang | Haiwen Hong | Yin Zhang | Yao Wan | Ye Liu | Yulei Sui
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Fix-Filter-Fix: Intuitively Connect Any Models for Effective Bug Fixing
Haiwen Hong | Jingfeng Zhang | Yin Zhang | Yao Wan | Yulei Sui
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Locating and fixing bugs is a time-consuming task. Most neural machine translation (NMT) based approaches for automatically bug fixing lack generality and do not make full use of the rich information in the source code. In NMT-based bug fixing, we find some predicted code identical to the input buggy code (called unchanged fix) in NMT-based approaches due to high similarity between buggy and fixed code (e.g., the difference may only appear in one particular line). Obviously, unchanged fix is not the correct fix because it is the same as the buggy code that needs to be fixed. Based on these, we propose an intuitive yet effective general framework (called Fix-Filter-Fix or Fˆ3) for bug fixing. Fˆ3 connects models with our filter mechanism to filter out the last model’s unchanged fix to the next. We propose an Fˆ3 theory that can quantitatively and accurately calculate the Fˆ3 lifting effect. To evaluate, we implement the Seq2Seq Transformer (ST) and the AST2Seq Transformer (AT) to form some basic Fˆ3 instances, called Fˆ3_ST+AT and Fˆ3_AT+ST. Comparing them with single model approaches and many model connection baselines across four datasets validates the effectiveness and generality of Fˆ3 and corroborates our findings and methodology.