Liwen Zhang


2023

Improving Zero-shot Multilingual Neural Machine Translation by Leveraging Cross-lingual Consistency Regularization
Pengzhi Gao | Liwen Zhang | Zhongjun He | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: ACL 2023

Multilingual neural machine translation (NMT) models have a promising capability for zero-shot translation: they can directly translate between language pairs unseen during training. For good transfer from supervised directions to zero-shot directions, a multilingual NMT model is expected to learn universal representations across languages. This paper introduces a cross-lingual consistency regularization, CrossConST, to bridge the representation gap among different languages and boost zero-shot translation performance. Theoretical analysis shows that CrossConST implicitly maximizes the probability distribution for zero-shot translation, and experimental results on both low-resource and high-resource benchmarks show that CrossConST consistently improves translation performance. Further analysis shows that CrossConST closes the sentence representation gap and better aligns the representation space. Given its universality and simplicity, we believe CrossConST can serve as a strong baseline for future multilingual NMT research.
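
The regularizer admits a compact implementation. Below is a minimal PyTorch sketch, assuming the loss combines standard cross-entropy with a KL term that ties the decoder's output distribution given the target sentence as encoder input to its distribution given the source; the model callable, tensor shapes, and KL direction here are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def crossconst_loss(model, src, tgt, tgt_in, tgt_out, alpha=1.0, pad_id=0):
        # Supervised pass: decode the target conditioned on the source.
        logits_src = model(src, tgt_in)                    # (batch, len, vocab)
        ce = F.cross_entropy(logits_src.transpose(1, 2), tgt_out,
                             ignore_index=pad_id)
        # Consistency pass: feed the *target* sentence as encoder input, so
        # the model "translates" the target into itself, and penalize the
        # divergence between the two output distributions.
        logits_tgt = model(tgt, tgt_in)
        kl = F.kl_div(F.log_softmax(logits_tgt, dim=-1),
                      F.softmax(logits_src, dim=-1),
                      reduction="batchmean")
        return ce + alpha * kl

    # Smoke test with a stand-in "model" that returns random logits.
    B, T, V = 2, 5, 100
    dummy = lambda enc_in, dec_in: torch.randn(B, T, V)
    ids = torch.randint(1, V, (B, T))
    print(crossconst_loss(dummy, ids, ids, ids, ids))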

Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization
Pengzhi Gao | Liwen Zhang | Zhongjun He | Hua Wu | Haifeng Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Multilingual sentence representations are the foundation for similarity-based bitext mining, which is crucial for scaling multilingual neural machine translation (NMT) systems to more languages. In this paper, we introduce MuSR: a one-for-all Multilingual Sentence Representation model that supports 223 languages. Leveraging billions of English-centric parallel sentences, we train a multilingual Transformer encoder, coupled with an auxiliary Transformer decoder, by adopting a multilingual NMT framework with CrossConST, a cross-lingual consistency regularization technique proposed in Gao et al. (2023). Experimental results on multilingual similarity search and bitext mining tasks show the effectiveness of our approach. Specifically, MuSR achieves superior performance over LASER3 (Heffernan et al., 2022), which consists of 148 independent multilingual sentence encoders.
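
The downstream use of such a model is straightforward. Here is a minimal numpy sketch of similarity-based bitext mining over sentence embeddings, assuming (hypothetically) that MuSR-style representations are obtained by pooling encoder states and compared with cosine similarity; the pooling choice and the acceptance threshold are illustrative, not taken from the paper.

    import numpy as np

    def mean_pool(states, mask):
        # states: (len, dim) encoder outputs; mask: (len,) 1 for real tokens.
        mask = mask[:, None].astype(states.dtype)
        return (states * mask).sum(0) / mask.sum()

    def mine_bitext(src_embs, tgt_embs, threshold=0.8):
        # L2-normalize so dot products are cosine similarities.
        src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
        tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
        sims = src @ tgt.T                          # (n_src, n_tgt)
        best = sims.argmax(axis=1)                  # nearest target per source
        return [(i, int(j), float(sims[i, j]))
                for i, j in enumerate(best) if sims[i, j] >= threshold]

    # Toy usage with random "embeddings".
    rng = np.random.default_rng(0)
    print(mine_bitext(rng.normal(size=(4, 16)), rng.normal(size=(6, 16)),
                      threshold=0.0))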

2022

SHARP: Search-Based Adversarial Attack for Structured Prediction
Liwen Zhang | Zixia Jia | Wenjuan Han | Zilong Zheng | Kewei Tu
Findings of the Association for Computational Linguistics: NAACL 2022

Adversarial attacks on structured prediction models face several challenges, such as the difficulty of perturbing discrete words, degraded sentence quality, and the sensitivity of structured outputs to small input perturbations. In this work, we introduce SHARP, a new attack method that formulates the black-box adversarial attack as a search-based optimization problem with a specially designed objective function that accounts for sentence fluency, meaning preservation, and attack effectiveness. We also analyze and compare three search strategies: beam search, Metropolis-Hastings sampling, and hybrid search. We demonstrate the effectiveness of our attack strategies on two challenging structured prediction tasks: POS tagging and dependency parsing. Through automatic and human evaluations, we show that our method mounts more potent attacks than prior approaches. Moreover, the generated adversarial examples can successfully boost the robustness and performance of the victim model via adversarial training.
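
As a concrete illustration, here is a minimal sketch of the beam-search variant under stated assumptions: candidates come from single-word substitutions, and the objective is a single callable that would, in practice, weight fluency, meaning preservation, and attack effectiveness. The candidate generator and scorer are hypothetical placeholders, not the paper's actual components.

    def beam_attack(sentence, substitutes, objective, beam_size=5, steps=3):
        """sentence: list of tokens; substitutes: token -> candidate tokens;
        objective: candidate tokens -> float (higher = better attack)."""
        beam = [(objective(sentence), sentence)]
        for _ in range(steps):
            expanded = []
            for _, cand in beam:
                for i, tok in enumerate(cand):
                    for sub in substitutes.get(tok, []):
                        new = cand[:i] + [sub] + cand[i + 1:]
                        expanded.append((objective(new), new))
            # Keep the top-scoring candidates for the next round.
            beam = sorted(beam + expanded, key=lambda x: -x[0])[:beam_size]
        return beam[0]

    # Toy objective standing in for the weighted fluency / similarity /
    # attack-success combination described above.
    toy_objective = lambda cand: sum(len(t) for t in cand) * 0.01
    print(beam_attack("the cat sat".split(),
                      {"cat": ["feline", "dog"]}, toy_objective))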

2021

Adapting Unsupervised Syntactic Parsing Methodology for Discourse Dependency Parsing
Liwen Zhang | Ge Wang | Wenjuan Han | Kewei Tu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

One of the main bottlenecks in developing discourse dependency parsers is the lack of annotated training data. A potential solution is to exploit abundant unlabeled data with unsupervised techniques, but there has so far been little research on unsupervised discourse dependency parsing. Fortunately, unsupervised syntactic dependency parsing has been studied for decades and can potentially be adapted for discourse parsing. In this paper, we propose a simple yet effective method to adapt unsupervised syntactic dependency parsing methodology to unsupervised discourse dependency parsing, and we apply it to two state-of-the-art unsupervised syntactic dependency parsers. Experimental results demonstrate that our adaptation is effective. Moreover, we extend the adapted methods to the semi-supervised and supervised settings and, surprisingly, find that they outperform previous methods specifically designed for supervised discourse parsing. Further analysis shows that our adaptations are superior not only in parsing accuracy but also in time and space efficiency.
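
A minimal sketch of the adaptation idea, under the assumption that it amounts to treating elementary discourse units (EDUs) as the "tokens" an unsupervised dependency parser operates on; the segmenter and parser below are stand-in callables, not the paper's models.

    def parse_discourse(document, segment_edus, unsup_dep_parser):
        # Treat each EDU as a single "word" and reuse the syntactic machinery.
        edus = segment_edus(document)           # list of EDU strings
        heads = unsup_dep_parser(edus)          # head index per EDU (-1 = root)
        return list(zip(edus, heads))

    # Toy stand-ins: split on "." and attach everything to the first EDU.
    doc = "It rained. We stayed in. We read books."
    segs = lambda d: [s.strip() + "." for s in d.split(".") if s.strip()]
    parser = lambda edus: [-1] + [0] * (len(edus) - 1)
    print(parse_discourse(doc, segs, parser))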

Generalized Supervised Attention for Text Generation
Yixian Liu | Liwen Zhang | Xinyu Zhang | Yong Jiang | Yue Zhang | Kewei Tu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Adversarial Attack and Defense of Structured Prediction Models
Wenjuan Han | Liwen Zhang | Yong Jiang | Kewei Tu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Building effective adversarial attackers and elaborating countermeasures against adversarial attacks in natural language processing (NLP) have attracted a lot of research in recent years. However, most existing approaches focus on classification problems. In this paper, we investigate attacks and defenses for structured prediction tasks in NLP. Besides the difficulty of perturbing discrete words and the sentence fluency problem faced by attackers in any NLP task, attackers of structured prediction models face a specific challenge: structured outputs are sensitive to small perturbations in the input. To address these problems, we propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model with feedback from multiple reference models of the same structured prediction task. Based on the proposed attack, we further reinforce the victim model with adversarial training, making its predictions more robust and accurate. We evaluate the proposed framework on dependency parsing and part-of-speech tagging. Automatic and human evaluations show that our framework succeeds both in attacking state-of-the-art structured prediction models and in boosting them with adversarial training.
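
A minimal sketch of the feedback signal such a framework might use, assuming the attacker is rewarded when the victim's structured output diverges from the consensus of the reference models while the perturbed input stays fluent; the scoring scheme and weighting are hypothetical placeholders, not the paper's formulation.

    def attack_feedback(victim_out, reference_outs, fluency_score, w=0.5):
        """victim_out / each reference out: list of per-token labels or heads.
        Returns a scalar reward for the seq2seq attacker."""
        # Attack effectiveness: fraction of positions where the victim
        # disagrees with the majority vote of the reference models.
        n = len(victim_out)
        disagree = 0
        for i in range(n):
            votes = [ref[i] for ref in reference_outs]
            majority = max(set(votes), key=votes.count)
            disagree += victim_out[i] != majority
        return (disagree / n) + w * fluency_score

    # Toy usage: the victim flips one POS tag relative to two references.
    victim = ["DET", "NOUN", "NOUN"]
    refs = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]
    print(attack_feedback(victim, refs, fluency_score=0.9))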

2019

Latent Variable Sentiment Grammar
Liwen Zhang | Kewei Tu | Yue Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural models have been investigated for sentiment classification over constituent trees. They learn phrase composition automatically by encoding tree structures but do not explicitly model sentiment composition, which requires encoding sentiment class labels. To this end, we investigate two formalisms with deep sentiment representations that capture sentiment subtype expressions via latent variables and Gaussian mixture vectors, respectively. Experiments on the Stanford Sentiment Treebank (SST) show the effectiveness of sentiment grammars over vanilla neural encoders. Using ELMo embeddings, our method achieves the best results on this benchmark.
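
In equation form, the latent-variable idea can be sketched as follows, with notation assumed here rather than copied from the paper: each coarse sentiment class c is refined into latent subtypes that are marginalized out at prediction time,

    p(c \mid t) \;=\; \sum_{s=1}^{S_c} p(c, s \mid t),

where t is the constituent (sub)tree, c ranges over sentiment classes, and s over the S_c latent subtypes of class c. The Gaussian-mixture variant replaces the discrete sum over s with an integral over continuous subtype vectors.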

2018

Gaussian Mixture Latent Vector Grammars
Yanpeng Zhao | Liwen Zhang | Kewei Tu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce Latent Vector Grammars (LVeGs), a new framework that extends latent variable grammars such that each nonterminal symbol is associated with a continuous vector space representing the set of (infinitely many) subtypes of the nonterminal. We show that previous models such as latent variable grammars and compositional vector grammars can be interpreted as special cases of LVeGs. We then present Gaussian Mixture LVeGs (GM-LVeGs), a new special case of LVeGs that uses Gaussian mixtures to formulate the weights of production rules over subtypes of nonterminals. A major advantage of using Gaussian mixtures is that the partition function and the expectations of subtype rules can be computed using an extension of the inside-outside algorithm, which enables efficient inference and learning. We apply GM-LVeGs to part-of-speech tagging and constituency parsing and show that GM-LVeGs can achieve competitive accuracies.
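
The key object can be written compactly. A sketch of a GM-LVeG weight for a binary production A → BC, with subtype vectors a, b, c for the three nonterminals (notation assumed):

    W_{A \to B\,C}(\mathbf{a}, \mathbf{b}, \mathbf{c})
      \;=\; \sum_{k=1}^{K} \pi_k\,
      \mathcal{N}\big([\mathbf{a}; \mathbf{b}; \mathbf{c}] \,\big|\,
      \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\big),

where the π_k are nonnegative mixture weights. Because products and marginals of Gaussian mixtures are again Gaussian mixtures, the inside-outside recursions stay in closed form, which is what makes the partition function and the expectations of subtype rules tractable.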