Zhiyang Teng


2024

pdf bib
Exploring Self-supervised Logic-enhanced Training for Large Language Models
Fangkai Jiao | Zhiyang Teng | Bosheng Ding | Zhengyuan Liu | Nancy Chen | Shafiq Joty
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Traditional attempts to enhance the logical reasoning abilities of language models often rely on supervised fine-tuning, limiting their generalization to new tasks or domains. Large Language Models (LLMs), with their capacity to condense vast knowledge, can effectively tackle many tasks. Yet, our experiments reveal a gap in their performance on logical reasoning benchmarks when compared to state-of-the-art fine-tuning based models. To bridge this gap, we present LogicLLM, a first-of-its-kind, fully self-supervised framework for integrating logical reasoning capabilities into LLMs and activating them via in-context learning. We apply this to two LLM series, FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 33 billion. LogicLLM demonstrates its effectiveness through successful improvements on two logical reasoning benchmarks (ReClor and LogiQA-v2). Additionally, LogicLLM based on FLAN-T5-11B attains results comparable to ChatGPT, and evaluations with LLaMA-based models on three language understanding benchmarks (RACE, MMLU and Big-Bench-Hard) confirm that the improvements come without compromising the model’s general language understanding capabilities.

2023

pdf bib
LogiCoT: Logical Chain-of-Thought Instruction Tuning
Hanmeng Liu | Zhiyang Teng | Leyang Cui | Chaoli Zhang | Qiji Zhou | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023

Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing. However, they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models logical reasoning and elicits general reasoning skills.
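
The harvesting procedure is only summarized above; as a rough illustration (not the released LogiCoT pipeline), a rationale can be collected by prompting GPT-4 for step-by-step reasoning over a seed instance and pairing the instruction with the generated output. The prompt wording, data fields, and the `harvest_rationale` helper below are assumptions.

```python
# Minimal sketch (not the LogiCoT release): prompt GPT-4 for a chain-of-thought
# rationale over a seed logical-reasoning instance, then store it as an
# instruction-tuning example. Prompt wording and data fields are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_instances = [
    {
        "context": "All metals conduct electricity. Copper is a metal.",
        "question": "Does copper conduct electricity?",
    },
]

def harvest_rationale(instance, model="gpt-4"):
    prompt = (
        "Solve the following problem. Explain your reasoning step by step, "
        "then state the final answer.\n\n"
        f"Context: {instance['context']}\n"
        f"Question: {instance['question']}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    rationale = response.choices[0].message.content
    # Pair the instruction with the generated rationale as the target output.
    return {"instruction": prompt, "output": rationale}

dataset = [harvest_rationale(x) for x in seed_instances]
```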

pdf bib
How Well Do Text Embedding Models Understand Syntax?
Yan Zhang | Zhaopeng Feng | Zhiyang Teng | Zuozhu Liu | Haizhou Li
Findings of the Association for Computational Linguistics: EMNLP 2023

Text embedding models have significantly contributed to advancements in natural language processing by adeptly capturing semantic properties of textual data. However, the ability of these models to generalize across a wide range of syntactic contexts remains under-explored. In this paper, we first develop an evaluation set, named SR, to scrutinize the syntax-understanding capability of text embedding models from two crucial syntactic aspects: Structural heuristics and Relational understanding among concepts, aspects in which previous studies have revealed performance gaps. Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges, and such ineffectiveness becomes even more apparent when evaluated against existing benchmark datasets. Furthermore, we conduct rigorous analysis to unearth the factors that lead to such limitations and examine why previous evaluations fail to detect such ineffectiveness. Lastly, we propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios. This study highlights the hurdles associated with syntactic generalization and provides pragmatic guidance for boosting model performance across varied syntactic contexts.
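
The SR evaluation set itself is not reproduced here; the snippet below is a generic sketch of this style of probe, checking whether an off-the-shelf embedding model ranks a structure-changing paraphrase above a lexically similar sentence with swapped semantic roles. The model name and example sentences are illustrative assumptions, not items from SR.

```python
# Generic illustration (not the SR evaluation set itself): probe whether an
# embedding model ranks a meaning-preserving but structurally different
# paraphrase above a lexically similar sentence whose meaning differs.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

anchor     = "The dog that chased the cat barked."
paraphrase = "The dog barked after chasing the cat."   # same meaning, new structure
distractor = "The cat that chased the dog barked."     # similar words, roles swapped

emb = model.encode([anchor, paraphrase, distractor], convert_to_tensor=True)
sim_para = util.cos_sim(emb[0], emb[1]).item()
sim_dist = util.cos_sim(emb[0], emb[2]).item()

print(f"anchor~paraphrase: {sim_para:.3f}  anchor~distractor: {sim_dist:.3f}")
# A syntax-aware model should score the paraphrase higher than the distractor.
```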

pdf bib
Non-Autoregressive Document-Level Machine Translation
Guangsheng Bao | Zhiyang Teng | Hao Zhou | Jianhao Yan | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023

Non-autoregressive translation (NAT) models achieve performance comparable to, and speed superior to, autoregressive translation (AT) models in sentence-level machine translation (MT). However, their abilities remain unexplored in document-level MT, hindering their usage in real scenarios. In this paper, we conduct a comprehensive examination of typical NAT models in the context of document-level MT and further propose a simple but effective design of sentence alignment between source and target. Experiments show that NAT models achieve high acceleration on documents and that sentence alignment significantly enhances their performance. However, current NAT models still have a significant performance gap compared to their AT counterparts. Further investigation reveals that NAT models suffer more from the multi-modality and misalignment issues in the context of document-level MT, and that current NAT models struggle with exploiting document context and handling discourse phenomena. We delve into these challenges and provide our code at https://github.com/baoguangsheng/nat-on-doc.

pdf bib
YATO: Yet Another deep learning based Text analysis Open toolkit
Zeqiang Wang | Yile Wang | Jiageng Wu | Zhiyang Teng | Jie Yang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We introduce YATO, an open-source, easy-to-use toolkit for text analysis with deep learning. Different from existing heavily engineered toolkits and platforms, YATO is lightweight and user-friendly for researchers from cross-disciplinary areas. Designed in a hierarchical structure, YATO supports free combinations of three types of widely used features including 1) traditional neural networks (CNN, RNN, etc.); 2) pre-trained language models (BERT, RoBERTa, ELECTRA, etc.); and 3) user-customized neural features via a simple configurable file. Benefiting from the advantages of flexibility and ease of use, YATO can facilitate fast reproduction and refinement of state-of-the-art NLP models, and promote the cross-disciplinary applications of NLP techniques. The code, examples, and documentation are publicly available at https://github.com/jiesutd/YATO. A demo video is also available at https://www.youtube.com/playlist?list=PLJ0mhzMcRuDUlTkzBfAftOqiJRxYTTjXH.

pdf bib
Target-Side Augmentation for Document-Level Machine Translation
Guangsheng Bao | Zhiyang Teng | Yue Zhang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Document-level machine translation faces the challenge of data sparsity due to its long input length and a small amount of training data, increasing the risk of learning spurious patterns. To address this challenge, we propose a target-side augmentation method, introducing a data augmentation (DA) model to generate many potential translations for each source document. Trained on this wider range of translations, an MT model can learn a smoothed distribution, thereby reducing the risk of data sparsity. We demonstrate that the DA model, which estimates the posterior distribution, largely improves MT performance, outperforming the previous best system by 2.30 s-BLEU on News and achieving new state-of-the-art results on the News and Europarl benchmarks.
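
As a rough sketch of the recipe (not the paper's exact implementation), any trained seq2seq model can play the role of the DA model: sample several candidate translations per source and add them as extra training targets, optionally weighted by the DA model's posterior. The stand-in model name and sampling settings below are assumptions.

```python
# Illustrative recipe (not the paper's exact implementation): use a trained
# seq2seq model as the data-augmentation (DA) model to sample several candidate
# translations per source, then add them as extra training targets.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

da_name = "Helsinki-NLP/opus-mt-en-de"          # stand-in DA model; any seq2seq works
tok = AutoTokenizer.from_pretrained(da_name)
da_model = AutoModelForSeq2SeqLM.from_pretrained(da_name).eval()

def augment(source_text, num_samples=4):
    inputs = tok(source_text, return_tensors="pt")
    with torch.no_grad():
        out = da_model.generate(
            **inputs,
            do_sample=True,            # sampling yields diverse candidate targets
            top_p=0.9,
            num_return_sequences=num_samples,
            max_new_tokens=128,
        )
    # Each decoded candidate becomes an extra (source, target) training pair;
    # the paper additionally weights the pairs using the DA model's posterior.
    return tok.batch_decode(out, skip_special_tokens=True)

for translation in augment("The committee approved the proposal."):
    print(translation)
```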

pdf bib
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis
Xuming Hu | Zhijiang Guo | Zhiyang Teng | Irwin King | Philip S. Yu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Multimodal relation extraction (MRE) is the task of identifying the semantic relationship between two entities based on the context of a sentence-image pair. Existing retrieval-augmented approaches have mainly focused on modeling the retrieved textual knowledge, which may not be sufficient for accurately identifying complex relations. To improve prediction, this research proposes to retrieve textual and visual evidence based on the object, the sentence, and the whole image. We further develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning within and across modalities. Extensive experiments and analyses show that the proposed method is able to effectively select and compare evidence across modalities and significantly outperforms state-of-the-art models.
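
The full retrieval-and-synthesis pipeline is not shown here; the snippet below only illustrates one plausible cross-modal retrieval step, ranking candidate textual evidence against an image with CLIP similarity. The model choice, placeholder image, and candidate sentences are assumptions rather than the paper's setup.

```python
# Illustrative cross-modal retrieval step (not the paper's full pipeline): rank
# candidate textual evidence against an image query using CLIP similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), color="white")   # placeholder query image
candidates = [
    "A politician shakes hands with a company executive.",
    "A football player celebrates a goal.",
]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
scores = out.logits_per_image.softmax(dim=-1)         # image-to-text relevance
print(dict(zip(candidates, scores.squeeze(0).tolist())))
```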

pdf bib
Token-level Fitting Issues of Seq2seq Models
Guangsheng Bao | Zhiyang Teng | Yue Zhang
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)

2022

pdf bib
Discrete Opinion Tree Induction for Aspect-based Sentiment Analysis
Chenhua Chen | Zhiyang Teng | Zhongqing Wang | Yue Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dependency trees have been intensively used with graph neural networks for aspect-based sentiment classification. Though effective, such methods rely on external dependency parsers, which can be unavailable for low-resource languages or perform poorly in low-resource domains. In addition, dependency trees are not optimized for aspect-based sentiment classification. In this paper, we propose an aspect-specific and language-agnostic discrete latent opinion tree model as an alternative structure to explicit dependency trees. To ease the learning of complicated structured latent variables, we build a connection between aspect-to-context attention scores and syntactic distances, inducing trees from the attention scores. Results on six English benchmarks and one Chinese dataset show that our model can achieve competitive performance and interpretability.
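
The exact induction algorithm is not reproduced here; the sketch below illustrates the general idea of distance-based tree induction, treating low aspect-to-context attention as large syntactic distance and recursively splitting the sentence at the largest distance. The toy attention scores and the distance formula are assumptions.

```python
# Minimal sketch of distance-based tree induction (illustrative, not the paper's
# exact algorithm): treat low aspect-to-context attention as large syntactic
# distance and recursively split the sentence at the largest distance.
def induce_tree(tokens, distances):
    """tokens: n words; distances: n-1 split scores between adjacent words."""
    if len(tokens) == 1:
        return tokens[0]
    split = max(range(len(distances)), key=lambda i: distances[i])
    left = induce_tree(tokens[: split + 1], distances[:split])
    right = induce_tree(tokens[split + 1 :], distances[split + 1 :])
    return (left, right)

tokens = ["the", "food", "was", "great", "but", "service", "was", "slow"]
# Hypothetical aspect-to-context attention for the aspect "food" (higher = closer).
attention = [0.05, 0.40, 0.15, 0.20, 0.02, 0.08, 0.05, 0.05]
# Distance between adjacent words: lower joint attention -> larger distance.
distances = [1.0 - (attention[i] + attention[i + 1]) for i in range(len(tokens) - 1)]
print(induce_tree(tokens, distances))
```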

2021

pdf bib
Solving Aspect Category Sentiment Analysis as a Text Generation Task
Jian Liu | Zhiyang Teng | Leyang Cui | Hanmeng Liu | Yue Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Aspect category sentiment analysis (ACSA) has attracted increasing research attention. The dominant methods make use of pre-trained language models by learning effective aspect-category-specific representations and adding task-specific output layers on top of the pre-trained representations. We consider a more direct way of making use of pre-trained language models, casting ACSA tasks as natural language generation tasks and using natural language sentences to represent the output. Our method allows more direct use of pre-trained knowledge in seq2seq language models by directly following the task setting during pre-training. Experiments on several benchmarks show that our method gives the best reported results, with large advantages in few-shot and zero-shot settings.
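
As a minimal sketch of the generation-based formulation (with illustrative templates and model choice, not the paper's exact ones), each polarity can be verbalized as a natural language sentence and scored with a seq2seq language model; the highest-scoring verbalization gives the prediction.

```python
# Minimal sketch of casting ACSA as generation: score each candidate target
# sentence with a seq2seq LM and pick the most probable polarity verbalization.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

review = "The waiter ignored us all night, though the pasta was lovely."
category = "service"
candidates = {
    "positive": f"The sentiment of the {category} category is positive.",
    "negative": f"The sentiment of the {category} category is negative.",
    "neutral":  f"The sentiment of the {category} category is neutral.",
}

def target_logprob(source, target):
    enc = tok(source, return_tensors="pt")
    labels = tok(target, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**enc, labels=labels).loss      # mean token-level NLL
    return -loss.item() * labels.size(1)             # total target log-likelihood

scores = {pol: target_logprob(review, sent) for pol, sent in candidates.items()}
print(max(scores, key=scores.get))
```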

pdf bib
G-Transformer for Document-Level Machine Translation
Guangsheng Bao | Yue Zhang | Zhiyang Teng | Boxing Chen | Weihua Luo
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Document-level MT models are still far from satisfactory. Existing work extends the translation unit from a single sentence to multiple sentences. However, studies show that when the translation unit is further enlarged to a whole document, supervised training of the Transformer can fail. In this paper, we find that such failure is not caused by overfitting, but by the model getting stuck in local minima during training. Our analysis shows that the increased complexity of target-to-source attention is a reason for the failure. As a solution, we propose G-Transformer, which introduces a locality assumption as an inductive bias into the Transformer, reducing the hypothesis space of the attention from target to source. Experiments show that G-Transformer converges faster and more stably than the Transformer, achieving new state-of-the-art BLEU scores in both non-pretraining and pre-training settings on three benchmark datasets.
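
A minimal sketch of the locality idea follows, showing only the core group-attention mask built from per-token sentence ids; the full G-Transformer additionally keeps global attention heads and applies such masks inside the Transformer layers. Tensor names and sizes are illustrative.

```python
# Sketch of a locality (group) attention mask: each target token may attend
# only to source tokens from its aligned sentence. Group ids are per-token
# sentence indices.
import torch

def group_attention_mask(src_groups, tgt_groups):
    """src_groups: (S,) sentence id per source token; tgt_groups: (T,) per target token.
    Returns a (T, S) boolean mask, True where attention is allowed."""
    return tgt_groups.unsqueeze(1) == src_groups.unsqueeze(0)

src_groups = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])   # 3 source sentences
tgt_groups = torch.tensor([0, 0, 1, 1, 1, 2, 2])      # aligned target sentences
mask = group_attention_mask(src_groups, tgt_groups)

scores = torch.randn(tgt_groups.size(0), src_groups.size(0))   # raw attention logits
scores = scores.masked_fill(~mask, float("-inf"))              # block other sentences
attn = torch.softmax(scores, dim=-1)
print(attn.shape, mask.int())
```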

2020

pdf bib
Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation
Yan Zhang | Zhijiang Guo | Zhiyang Teng | Wei Lu | Shay B. Cohen | Zuozhu Liu | Lidong Bing
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

AMR-to-text generation is used to transduce Abstract Meaning Representation (AMR) structures into text. A key challenge in this task is to efficiently learn effective graph representations. Previously, Graph Convolutional Networks (GCNs) were used to encode input AMRs; however, vanilla GCNs are not able to capture non-local information, as they follow a local (first-order) information aggregation scheme. To account for this, larger and deeper GCN models are required to capture more complex interactions. In this paper, we introduce a dynamic fusion mechanism, proposing Lightweight Dynamic Graph Convolutional Networks (LDGCNs) that capture richer non-local interactions by synthesizing higher-order information from the input graphs. We further develop two novel parameter-saving strategies based on group graph convolutions and weight-tied convolutions to reduce memory usage and model complexity. With the help of these strategies, we are able to train a model with fewer parameters while maintaining the model capacity. Experiments demonstrate that LDGCNs outperform state-of-the-art models on two benchmark datasets for AMR-to-text generation with significantly fewer parameters.
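
A rough sketch of fusing higher-order neighborhood information with tied weights is given below; the paper's dynamic fusion mechanism and group graph convolutions are more elaborate, so the layer shown is an illustrative simplification with assumed dimensions.

```python
# Rough sketch of fusing higher-order neighborhood information with a single
# tied weight matrix (illustrative simplification of the LDGCN idea).
import torch
import torch.nn as nn

class HigherOrderGCNLayer(nn.Module):
    def __init__(self, dim, max_order=3):
        super().__init__()
        self.max_order = max_order
        self.tied_linear = nn.Linear(dim, dim)            # shared across all orders
        self.gate = nn.Linear(dim * max_order, max_order)

    def forward(self, h, adj):
        """h: (n, dim) node states; adj: (n, n) normalized adjacency."""
        hops, a = [], adj
        for _ in range(self.max_order):
            hops.append(self.tied_linear(a @ h))          # k-hop aggregation, tied weights
            a = a @ adj                                   # move to the next power of adj
        weights = torch.softmax(self.gate(torch.cat(hops, dim=-1)), dim=-1)
        fused = sum(w.unsqueeze(-1) * hop for w, hop in zip(weights.unbind(-1), hops))
        return torch.relu(fused)

n, dim = 5, 16
adj = torch.softmax(torch.randn(n, n), dim=-1)            # stand-in normalized graph
print(HigherOrderGCNLayer(dim)(torch.randn(n, dim), adj).shape)
```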

pdf bib
Inducing Target-Specific Latent Structures for Aspect Sentiment Classification
Chenhua Chen | Zhiyang Teng | Yue Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aspect-level sentiment analysis aims to recognize the sentiment polarity of an aspect or a target in a comment. Recently, graph convolutional networks based on linguistic dependency trees have been studied for this task. However, the dependency parsing accuracy on commercial product comments or tweets might be unsatisfactory. To tackle this problem, we associate linguistic dependency trees with automatically induced aspect-specific graphs. We propose gating mechanisms to dynamically combine information from word dependency graphs and latent graphs learned by self-attention networks. Our model can complement supervised syntactic features with latent semantic dependencies. Experimental results on five benchmarks show the effectiveness of our proposed latent models, which give significantly better results than models that do not use latent graphs.
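
A minimal sketch of the gating idea follows, combining node states computed over the dependency graph with states from a latent self-attention-induced graph via a learned element-wise gate; module and variable names are assumptions.

```python
# Minimal sketch of gated fusion of two graph views: dependency-graph states
# and latent-graph states are mixed with a learned per-dimension gate.
import torch
import torch.nn as nn

class GatedGraphFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_dep, h_latent):
        """h_dep / h_latent: (n, dim) node states from the two graph views."""
        g = torch.sigmoid(self.gate(torch.cat([h_dep, h_latent], dim=-1)))
        return g * h_dep + (1.0 - g) * h_latent           # dynamic per-dimension mix

n, dim = 7, 32
h_dep = torch.randn(n, dim)        # e.g., GCN output over the dependency tree
h_latent = torch.randn(n, dim)     # e.g., self-attention-induced latent graph output
print(GatedGraphFusion(dim)(h_dep, h_latent).shape)
```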

2019

pdf bib
Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning
Zhijiang Guo | Yan Zhang | Zhiyang Teng | Wei Lu
Transactions of the Association for Computational Linguistics, Volume 7

We focus on graph-to-sequence learning, which can be framed as transducing graph structures to sequences for text generation. To capture structural information associated with graphs, we investigate the problem of encoding graphs using graph convolutional networks (GCNs). Unlike various existing approaches where shallow architectures were used for capturing local structural information only, we introduce a dense connection strategy, proposing a novel Densely Connected Graph Convolutional Network (DCGCN). Such a deep architecture is able to integrate both local and non-local features to learn a better structural representation of a graph. Our model outperforms the state-of-the-art neural models significantly on AMR-to-text generation and syntax-based neural machine translation.
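
The sketch below illustrates the dense connection idea in a small GCN block, where each sub-layer receives the concatenation of the block input and all earlier sub-layer outputs; the actual DCGCN blocks are deeper and include further refinements, so the dimensions and depth here are assumptions.

```python
# Sketch of dense connections inside a GCN block: every sub-layer sees the
# concatenation of the block input and all earlier sub-layer outputs.
import torch
import torch.nn as nn

class DenselyConnectedGCNBlock(nn.Module):
    def __init__(self, dim, num_sublayers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(dim * (i + 1), dim) for i in range(num_sublayers)
        )

    def forward(self, h, adj):
        """h: (n, dim) node states; adj: (n, n) normalized adjacency."""
        features = [h]
        for layer in self.layers:
            x = torch.cat(features, dim=-1)               # dense connections
            features.append(torch.relu(adj @ layer(x)))   # graph convolution step
        return features[-1]

n, dim = 6, 24
adj = torch.softmax(torch.randn(n, n), dim=-1)            # stand-in normalized graph
print(DenselyConnectedGCNBlock(dim)(torch.randn(n, dim), adj).shape)
```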

2018

pdf bib
Two Local Models for Neural Constituent Parsing
Zhiyang Teng | Yue Zhang
Proceedings of the 27th International Conference on Computational Linguistics

Non-local features have been exploited by syntactic parsers for capturing dependencies between output substructures. Such features have been a key to the success of state-of-the-art statistical parsers. With the rise of deep learning, however, it has been shown that local output decisions can give highly competitive accuracies, thanks to the power of dense neural input representations that embody global syntactic information. We investigate two conceptually simple local neural models for constituent parsing, which make local decisions over constituent spans and CFG rules, respectively. Consistent with previous findings along this line, our best model gives highly competitive results, achieving labeled bracketing F1 scores of 92.4% on PTB and 87.3% on CTB 5.1.
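
A minimal sketch of the span-level local model is shown below: each span is represented by its boundary states and its constituent label is classified independently. The architectural details (boundary concatenation, MLP sizes) are assumptions, and the paper also studies a CFG-rule-level local model not shown here.

```python
# Minimal sketch of a span-level local model for constituent parsing: score
# each span's label independently from its boundary word representations.
import torch
import torch.nn as nn

class LocalSpanClassifier(nn.Module):
    def __init__(self, dim, num_labels):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, num_labels))

    def forward(self, states, i, j):
        """states: (n, dim) contextual word states; (i, j) an inclusive span."""
        span_repr = torch.cat([states[i], states[j]], dim=-1)
        return self.scorer(span_repr)                     # independent label scores

n, dim, num_labels = 9, 48, 5
states = torch.randn(n, dim)                              # e.g., BiLSTM outputs
logits = LocalSpanClassifier(dim, num_labels)(states, 2, 5)
print(logits.softmax(dim=-1))
```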

2017

pdf bib
Head-Lexicalized Bidirectional Tree LSTMs
Zhiyang Teng | Yue Zhang
Transactions of the Association for Computational Linguistics, Volume 5

Sequential LSTMs have been extended to model tree structures, giving competitive results for a number of tasks. Existing methods model constituent trees by bottom-up combinations of constituent nodes, making direct use of input word information only for leaf nodes. This is different from sequential LSTMs, which contain references to input words for each node. In this paper, we propose a method for automatic head lexicalization for tree-structured LSTMs, propagating head words from leaf nodes to every constituent node. In addition, enabled by head lexicalization, we build a tree LSTM in the top-down direction, which corresponds in structure to a bidirectional sequential LSTM. Experiments show that both extensions give better representations of tree structures. Our final model gives the best results on the Stanford Sentiment Treebank and highly competitive results on the TREC question type classification task.
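
A heavily simplified sketch of automatic head lexicalization at one binary node follows: a learned gate softly selects which child's head word to propagate upward alongside the bottom-up composition. The gating form and composition function are assumptions, not the paper's exact tree LSTM equations.

```python
# Rough sketch of head lexicalization at a binary tree node: a learned gate
# softly chooses which child's head word to pass up with the composed state.
import torch
import torch.nn as nn

class HeadLexNode(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.head_gate = nn.Linear(2 * dim, 1)
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, h_left, h_right, head_left, head_right):
        """h_*: child hidden states; head_*: child head-word embeddings, all (dim,)."""
        z = torch.sigmoid(self.head_gate(torch.cat([h_left, h_right], dim=-1)))
        head = z * head_left + (1.0 - z) * head_right     # propagated head word
        h = torch.tanh(self.compose(torch.cat([h_left, h_right], dim=-1))) + head
        return h, head

dim = 20
node = HeadLexNode(dim)
h, head = node(torch.randn(dim), torch.randn(dim), torch.randn(dim), torch.randn(dim))
print(h.shape, head.shape)
```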

2016

pdf bib
Expectation-Regulated Neural Model for Event Mention Extraction
Ching-Yun Chang | Zhiyang Teng | Yue Zhang
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Exploiting Mutual Benefits between Syntax and Semantic Roles using Neural Network
Peng Shi | Zhiyang Teng | Yue Zhang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Context-Sensitive Lexicon Features for Neural Sentiment Analysis
Zhiyang Teng | Duy-Tin Vo | Yue Zhang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
LibN3L: A Lightweight Package for Neural NLP
Meishan Zhang | Jie Yang | Zhiyang Teng | Yue Zhang
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a lightweight machine learning tool for NLP research. The package supports operations on both discrete and dense vectors, facilitating the implementation of linear models as well as neural models. It provides several basic layers, mainly aimed at single-layer linear and non-linear transformations, with which linear models and simple neural models can be implemented conveniently. The package also integrates several complex layers composed of these basic layers, such as RNN, Attention Pooling, LSTM, and gated RNN, which can be used to implement deep neural models directly.

pdf bib
Measuring the Information Content of Financial News
Ching-Yun Chang | Yue Zhang | Zhiyang Teng | Zahn Bozanic | Bin Ke
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Measuring the information content of news text is useful for investment decision making, since news can influence the intrinsic values of companies. We propose a model to automatically measure the information content of news text, trained using news and the corresponding cumulative abnormal returns of listed companies. Existing methods in the finance literature exploit sentiment signal features, which are limited in that they do not consider factors such as events. We address this issue by leveraging deep neural models to extract rich semantic features from news text. In particular, a novel tree-structured LSTM is used to find target-specific representations of news text given syntax structures. Empirical results show that the neural models can outperform sentiment-based models, demonstrating the effectiveness of recent NLP technology advances for computational finance.

2013

pdf bib
Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation
Guosheng Ben | Deyi Xiong | Zhiyang Teng | Yajuan Lü | Qun Liu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)