Shuai Yuan


2024

How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
Fei Yuan | Shuai Yuan | Zhiyong Wu | Lei Li
Findings of the Association for Computational Linguistics ACL 2024

Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations in other languages. What is an LLM’s multilingual capability when it is trained on only certain languages? The underlying mechanism remains unclear. This study examines the multilingual capability of LLMs from the vocabulary-sharing perspective through an exhaustive analysis across 101 languages. By investigating the performance gap before and after embedding fine-tuning, we identify four distinct quadrants. Delving into each quadrant, we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and that the multilingual performance of LLMs can be significantly improved based on the attributes of each quadrant.
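
A minimal sketch of the kind of embedding-only fine-tuning the abstract refers to, assuming a Hugging Face LLaMA-style causal LM; the model name, toy batch, and hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: fine-tune only the input embeddings of a LLaMA-style model,
# freezing all other parameters. Model name and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze everything, then unfreeze the token-embedding matrix.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

optimizer = torch.optim.AdamW([model.get_input_embeddings().weight], lr=1e-4)

# One illustrative training step on a toy batch in a non-English target language.
batch = tokenizer(["Ein Beispielsatz in der Zielsprache."], return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Comparing task performance before and after such an embedding-only pass is what yields the per-language gap that the paper groups into quadrants.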

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
Fangzhi Xu | Zhiyong Wu | Qiushi Sun | Siyu Ren | Fei Yuan | Shuai Yuan | Qika Lin | Yu Qiao | Jun Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formulas). Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disregards the synergies among different symbolic families and overlooks the need for a balanced mixture of natural and symbolic data. In this work, we tackle these challenges from both a data and a framework perspective and introduce the Symbol-LLM series of models. First, we curated a data collection consisting of 34 tasks and incorporating 20 distinct symbolic families, intending to capture the interrelations and foster synergies between symbols. Then, a two-stage tuning framework injects symbolic knowledge without loss of general ability. Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performance of the Symbol-LLM series models.
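
A schematic sketch of the two-stage data recipe described above: first tune on symbolic data alone, then on a balanced symbolic/natural-language mixture. The toy samples, mixing ratio, and the train() stub are placeholders, not the paper's actual corpus or training procedure.

```python
# Stage 1: inject symbolic knowledge; Stage 2: mix with NL data to retain generality.
import random

def train(model_state, examples):
    """Stand-in for a supervised fine-tuning loop."""
    model_state["seen"].extend(examples)
    return model_state

# Toy stand-ins for the symbolic (e.g., logic, SMILES) and NL instruction corpora.
symbolic_data = [("FOL", "forall x. P(x) -> Q(x)"), ("SMILES", "CCO")]
nl_data = [("NL", "Summarize the paragraph."), ("NL", "Translate to French.")]

model_state = {"seen": []}

# Stage 1: symbolic data only.
model_state = train(model_state, symbolic_data)

# Stage 2: balanced mixture of symbolic and NL data.
mixture = symbolic_data + random.sample(nl_data, k=min(len(nl_data), len(symbolic_data)))
random.shuffle(mixture)
model_state = train(model_state, mixture)
```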

2023

基于FLAT的农业病虫害命名实体识别(Named Entity Recognition of Agricultural Pests and Diseases based on FLAT)
Yi Ren (任义) | Jie Shen (沈洁) | Shuai Yuan (袁帅)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

To address the problems that word embeddings in traditional named entity recognition methods cannot represent polysemy and that character-word fusion models extract features with insufficient accuracy, this paper proposes an interactive feature fusion model based on FLAT. The model first obtains character and word vectors through external lexicon matching; after BERT pre-training, a designed interactive feature fusion module fully exploits the dependencies between characters and words. In addition, adversarial training is introduced to improve the robustness of the model. A special relative position encoding is then used to feed the data into the self-attention mechanism, and finally a CRF produces the globally optimal label sequence. On an agricultural pests and diseases dataset, the model achieves precision, recall, and F1 of 93.76%, 92.14%, and 92.94%, respectively.
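
A minimal sketch of one common way adversarial training on the embedding layer is implemented (FGM-style gradient perturbation). The abstract only states that adversarial training is used; the FGM formulation, the toy tagger standing in for the FLAT model, and all hyperparameters here are assumptions for illustration.

```python
# FGM-style adversarial training on embeddings for a toy sequence tagger.
import torch
import torch.nn as nn

class TinyTagger(nn.Module):
    """Toy character-level tagger standing in for the FLAT-based model."""
    def __init__(self, vocab_size=100, dim=32, num_tags=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.fc = nn.Linear(dim, num_tags)

    def forward(self, x):
        return self.fc(self.emb(x))

model = TinyTagger()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randint(0, 100, (2, 7))   # toy batch: 2 sentences, 7 characters each
y = torch.randint(0, 5, (2, 7))     # toy gold tags
epsilon = 1.0                       # perturbation size (assumed)

# 1) Normal forward/backward pass.
loss = loss_fn(model(x).view(-1, 5), y.view(-1))
loss.backward()

# 2) FGM: perturb embeddings along the gradient direction, re-run, accumulate grads.
emb = model.emb.weight
backup = emb.data.clone()
norm = torch.norm(emb.grad)
if norm != 0:
    emb.data.add_(epsilon * emb.grad / norm)
adv_loss = loss_fn(model(x).view(-1, 5), y.view(-1))
adv_loss.backward()

# 3) Restore the original embeddings, then update with the accumulated gradients.
emb.data = backup
optimizer.step()
optimizer.zero_grad()
```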