Construction of CFSP Model Based on Non-Finetuning Large Language Model
Huang Fugeng
|
Guo Zhongbin
|
Li Wenting
|
Cheng Haibo
“Chinese Frame Semantic Parsing (CFSP) is an important task in Chinese Natural Language Processing (NLP). Its goal is to extract the frame semantic structure of a sentence and thereby achieve a deep understanding of the events or situations the sentence describes. This paper studies how a Large Language Model (LLM) can perform this reasoning through prompt engineering alone, without fine-tuning, and completes the three subtasks of Chinese Frame Semantic Parsing: frame identification, argument identification, and role identification. We propose a Retrieval-Augmented Generation (RAG) method for target words and construct a more refined few-shot sample method. We achieved second place on the B leaderboard of the open track in the “CCL2024-Eval: The Second Chinese Frame Semantic Parsing” competition.”
Application of Entity Classification Model Based on Different Position Embedding in Chinese Frame Semantic Parsing
Zhou Huirong
|
Tian Sujie
|
Li Junbo
|
Yuan Xiao
“This paper addresses three subtasks of Chinese Frame Semantic Parsing based on the BERT and RoBERTa pre-trained models: Frame Identification, Argument Identification, and Role Identification. In the Frame Identification task, we utilize the BERT PLM with Rotary Positional Encoding for the semantic frame classification task. For the Argument Identification task, we employ the RoBERTa PLM with T5 position encoding for extraction tasks. In the Role Identification task, we use the RoBERTa PLM with ALiBi position encoding for the classification task. Ultimately, our approach achieved a score of 71.41 in the closed track of the B leaderboard, securing fourth place and validating the effectiveness of our method.”
Leveraging LLMs for Chinese Frame Semantic Parsing
Liu Yahui
|
Gong Chen
|
Zhang Min
“We participate in the open track of the Chinese frame semantic parsing (CFSP) task, i.e., CCL24-Eval Task 1, and our submission ranks first. Frame semantic parsing (FSP) is an important task in Natural Language Processing that aims to extract frame semantic structures from sentences. It can be divided into three subtasks: Frame Identification (FI), Argument Identification (AI), and Role Identification (RI). In this paper, we use the LLM Gemini 1.0 to evaluate the three subtasks of CFSP, and present the techniques and strategies we employed to enhance subtask performance. For FI, we leverage mapping and similarity strategies to minimize the candidate frames for each target word, which reduces the difficulty the LLM faces in identifying the appropriate frame. For the AI and RI subtasks, we use the results from small models as auxiliary information and apply data augmentation, self-training, and model ensemble techniques to these small models to further improve performance.”
Chinese Frame Semantic Parsing Evaluation
Yang Peiyuan
|
Li Juncai
|
Yan Zhichao
|
Su Xuefeng
|
Li Ru
“Chinese Frame-semantic Parsing (CFSP) aims to extract fine-grained frame-semantic structures from texts, which can provide fine-grained semantic information for natural language understanding models and enhance their semantic representation abilities. Building on the CCL-23 CFSP evaluation task, we introduce construction grammar to expand the targets, the basic units that activate frames in texts, from word-style to construction-style, and publish a more challenging CFSP evaluation task in CCL-2024. The evaluation dataset consists of 22,000 annotated examples involving nearly 695 frames. The evaluation task is divided into three subtasks (frame identification, argument identification, and role identification) and two tracks (closed track and open track). The evaluation task has attracted wide attention from both industry and academia, with a total of 1988 participating teams. As for the evaluation results, the team from China University of Petroleum won first place in the closed track with a final score of 71.34, while the team from Suzhou University won first place in the open track with a final score of 48.77. In this article, we report the key information about the evaluation task, including key concepts, the evaluation dataset, the top-3 results, and the corresponding methods. More information about this task can be found on the website of the CCL-2024 CFSP evaluation task.”
Chinese Parataxis Graph Semantic Parsing Based on Fine-Tuning Multiple Large Language Models
Li Rang (李让)
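“The Chinese Parataxis Graph annotates the relations between sentence constituents hierarchically and can effectively represent the deep semantic structure of Chinese. Traditional methods struggle to build feature representations for the special components of the Chinese Parataxis Graph, while the recent rapid improvement of large language models offers a new approach to complex natural language processing tasks. In this task, we apply LoRA fine-tuning to large models in a prompt-response format, so that the model directly generates a formatted sequence of Chinese Parataxis Graph triples from the input. We extensively test seven mainstream large models from different development teams and with different parameter scales, and evaluate how the base model, parameter scale, quantized training, and other factors affect the performance of the fine-tuned model. Experiments show that our method far outperforms dependency-based models, reaching F1 scores of 0.6956 on the test set and 0.7206 on the blind test set, and ranking first in this evaluation.”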
Chinese Parataxis Graph (CPG) Parsing Based on Large Language Models
Sun YueYi
|
Wang Yuxuan
“This paper presents the work submitted to the Evaluation Workshop of the 23rd China National Conference on Computational Linguistics (CCL24-Eval), focusing on the Chinese Parataxis Graph (CPG) Parsing task. CPG represents Chinese natural language hierarchically through relational triplets, providing a consistent representation for linguistic units of varying levels. Our approach uses large-scale language models with full fine-tuning, achieving an F1 of 71.6% in the contest and 74.76% after the contest. Furthermore, after the contest our team proposed a combined model that integrates multiple LoRA fine-tuned medium-scale models. This approach reduces time and space consumption while keeping performance on the CPG construction task relatively high.”
Research on a Relation-Extraction-Based Method for Chinese Parataxis Graph Semantic Parsing
Huo Hongying (霍虹颖)
|
Huang Shaoping (黄少平)
|
Liu Pengyuan (刘鹏远)
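“The parataxis graph is an event-centered, single-rooted directed semantic representation graph with significant value for semantic computing and its applications. The CCL 2024 Chinese Parataxis Graph semantic parsing evaluation task poses several difficulties: the graph is a single-rooted directed graph, it contains implicit event words, and its semantic relation types are so rich that the set of relation labels becomes very large. To overcome these difficulties, this paper proposes a method that converts the task into relation extraction. The method first expands the labels into forward labels and reverse labels. Second, it expands the input by adding the implicit event words to it, so that implicit event words need not be predicted separately. Finally, it splits the task into relation extraction without implicit event words and relation extraction with implicit event words. Experimental results show that the method achieves an F1 of 64.44% on the official blind test set, exceeding the baseline model by 33.41%, which demonstrates its effectiveness.”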
Chinese Parataxis Graph Semantic Parsing Based on Sample Design Engineering and Large Language Model Fine-Tuning
Si Han (司函)
|
Luo Zhiyong (罗智勇)
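“This paper describes the system we submitted to the Chinese Parataxis Graph semantic parsing evaluation at the 23rd China National Conference on Computational Linguistics. The Chinese Parataxis Graph (CPG) is an event-centered semantic representation graph that represents linguistic units of different levels in a consistent way, and it is a semantic representation method that is both general and extensible. Given the strong performance of large language models on semantic parsing tasks, we fine-tune the Llama3-Chinese-8B-Instruct model with LoRA so that it can generate structured parataxis graph triples, and we use Sample Design Engineering (SDE) techniques to design the fine-tuning samples. In addition, we fine-tune separately on different label categories to explore how the model's predictive ability differs across semantic labels. Our system reaches an F1 of 0.6461 on the evaluation set released by the task and ranks third in this evaluation.”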
Chinese Parataxis Graph Semantic Parsing Evaluation
Guo Mengxi (郭梦溪)
|
Li Meng (李梦)
|
Jin Zeying (靳泽莹)
|
Wu Xiaojing (吴晓靖)
|
Rao Gaoqi (饶高琦)
|
Tang Gongbo (唐共波)
|
Xun Endong (荀恩东)
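“The Chinese Parataxis Graph is a Chinese semantic representation method proposed in recent years. This is the first semantic analysis evaluation based on parataxis graph theory. It aims to explore semantic computing methods oriented to that theory and to assess the semantic analysis ability of machines. Fourteen teams registered and seven ultimately submitted results. Five of them submitted technical reports and models, all of which were successfully reproduced. Within the evaluation deadline, the best-performing team obtained an F1 of 72.06% using LoRA fine-tuning of a large language model. Of the five teams that submitted technical reports, four used large language model fine-tuning, which to some extent indicates the current trend of technical development.”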
Spatial Semantic Understanding Based on Parameter-Efficient Fine-Tuning and Semi-Supervised Learning
Li Chenyang (李晨阳)
|
Zhang Long (张龙)
|
Zheng Qiusheng (郑秋生)
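“This paper describes the model we submitted to the Fourth Chinese Spatial Semantic Understanding Evaluation at the 23rd China National Conference on Computational Linguistics. The task aims to test the level of machines' Chinese semantic understanding. Existing research shows that machines still lag well behind the human average in this respect. In recent years, generative large language models have shown excellent generation and generalization abilities in natural language processing tasks. In this evaluation, we apply efficient fine-tuning to the Qwen1.5-7b model to carry out spatial semantic reasoning end to end, and combine prompt optimization with semi-supervised learning to improve reasoning performance. Experimental results show that our model achieves leading results on this task.”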
Chinese Spatial Semantic Evaluation Based on Large Language Models
Huo Shitu (霍世图)
|
Wang Yujun (王钰君)
|
Wu Tongjie (吴童杰)
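“The task studied here asks large models to perform entity recognition, role recognition, anomaly recognition, information reasoning, and synonym recognition, so as to comprehensively evaluate their spatial semantic understanding. We use three prompting strategies (plain prompts, workflow prompts, and chain of thought) to probe this ability, and find that ERNIE-4 performs best with 1-shot plain prompts. Our method ranked sixth, with an overall accuracy of 56.20%.”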
Chinese Spatial Semantic Understanding Based on In-Context Learning and Chain-of-Thought Strategies
Wang Shiquan (王士权)
|
Fu Weiwei (付薇薇)
|
Fang Ruiyu (方瑞玉)
|
Li Mengxiang (李孟祥)
|
He Zhongjiang (何忠江)
|
Li Yongxiang (李永翔)
|
Song Shuangyong (宋双永)
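“This technical report details our team's methods and results in the Fourth Chinese Spatial Semantic Understanding Evaluation (SpaCE2024). SpaCE2024 aims to comprehensively test machines' understanding of Chinese spatial semantics through five tasks: spatial information entity recognition, spatial information role recognition, spatial information anomaly recognition, spatial orientation reasoning, and recognition of synonymous spatial expressions with different forms. Our team combines carefully designed prompts with fine-tuning to elicit the spatial semantic understanding of large language models, and builds an efficient spatial semantic understanding system. In the final evaluation, our accuracy was 0.8947 on entity recognition questions, 0.9364 on role recognition questions, 0.8480 on anomaly recognition questions, 0.3471 on orientation reasoning questions, and 0.5631 on synonymous expression recognition questions. Our overall accuracy on the test set was 0.6024, ranking first.”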
Spatial Semantic Understanding Based on In-Context Learning
Wu Hongyan (武洪艳)
|
Lin Nankai (林楠铠)
|
Ceng Peijian (曾培健)
|
Zheng Weixiong (郑伟雄)
|
Jiang Shengyi (蒋盛益)
|
Yang Aimin (阳爱民)
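“The spatial semantic understanding task aims to enable language models to accurately parse and understand the spatial relations between objects described in text, an ability that is essential for deep natural language understanding and for supporting complex spatial reasoning. This paper explores the effectiveness of in-context learning strategies for large models on the spatial semantic understanding task, and proposes a sample selection strategy based on option similarity and on similarity in spatial semantic understanding ability. We combine in-context learning with efficient fine-tuning to fine-tune open-source models and improve their spatial semantic understanding. In addition, we try combining the capabilities of open-source and closed-source models to handle different types of samples. Experimental results show that these strategies effectively improve the performance of large models on the spatial semantic understanding task.”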
The Fourth Evaluation on Chinese Spatial Cognition
Xiao Liming
|
Hu Nan
|
Zhan Weidong
|
Qin Yuhang
|
Deng Sirui
|
Sun Chunhui
|
Cai Qixu
|
Li Nan
“The Fourth Chinese Spatial Cognition Evaluation Task (SpaCE 2024) presents the first comprehensive Chinese benchmark to assess spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs). It comprises five subtasks in the form of multiple-choice questions: (1) identifying spatial semantic roles; (2) retrieving spatial referents; (3) detecting spatial semantic anomalies; (4) recognizing synonymous spatial expression with different forms; (5) conducting spatial position reasoning. In addition to proposing new tasks, SpaCE 2024 applied a rule-based method to generate high-quality synthetic data with difficulty levels for the reasoning task. 12 teams submitted their models and results, and the top-performing team attained an accuracy of 60.24%, suggesting that there is still significant room for current LLMs to improve, especially in tasks requiring high spatial cognitive processing.”
Evaluating and Enhancing Large Language Models for Chinese Abstract Meaning Representation Parsing
Chen Rongbo (陈荣波)
|
Pei Zhenwu (裴振武)
|
Bai Xuefeng (白雪峰)
|
Chen Kehai (陈科海)
|
Zhang Min (张民)
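“This paper describes the system we submitted to the Chinese Abstract Meaning Representation parsing evaluation at the 23rd China National Conference on Computational Linguistics. Chinese Abstract Meaning Representation (CAMR) represents the semantics of a Chinese sentence as a single-rooted, traversable directed acyclic graph. Our system adopts large language models as its solution. We first systematically evaluate the performance of current Chinese large language models on the AMR parsing task, and on that basis use a graph fusion algorithm to integrate the predictions of the better-performing models into the final predicted CAMR graph. Experimental results show that (1) existing large models already have some few-shot Chinese AMR parsing ability; (2) an AMR parsing system based on fine-tuned Chinese large models outperforms the previous best systems; and (3) the graph fusion algorithm further improves the performance of LLM-based CAMR parsing systems.”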
A Mixture-of-LoRA-Experts Framework for Chinese Abstract Meaning Representation Parsing
Wu Zihao (吴梓浩)
|
Yin Hua (尹华)
|
Gao Ziqian (高子千)
|
Zhang Jiajia (张佳佳)
|
Ji Yuelei (季跃蕾)
|
Tang Kuntian (唐堃添)
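“This paper describes the system we submitted to the Chinese Abstract Meaning Representation parsing evaluation at the 23rd China National Conference on Computational Linguistics. Abstract Meaning Representation (AMR) models a sentence as a directed acyclic graph, with semantic concepts as nodes and relation labels as edges, to represent the meaning of the sentence. Inspired by research on AMR parsing that incorporates syntactic information, we propose a CAMR parsing framework built on a mixture of LoRA (Low-Rank Adaptation) experts. The framework consists of a base CAMR parser fine-tuned from a large language model, four sentence-type experts, and one Ancient Chinese LoRA expert model. The proposed framework achieved the best results on all three evaluation datasets.”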
A Two-stage Generative Chinese AMR Parsing Method Based on Large Language Models
Shen Zizhuo
|
Shao Yanqiu
|
Li Wei
“The purpose of the CAMR task is to convert natural language into a formalized semantic representation in the form of a graph structure. Due to the complexity of the AMR graph structure, traditional automatic AMR parsing methods often require the design of complex models and strategies. Thanks to the powerful generative capabilities of LLMs, adopting an autoregressive generative approach to AMR parsing has many advantages, such as simple modeling and strong extensibility. To further explore LLM-based generative AMR parsing, we design a two-stage automatic AMR parsing method based on LLMs for this CAMR evaluation. Specifically, we design two pipeline subtasks, alignment-aware node generation and relationship-aware node generation, to reduce the difficulty of understanding and generation for the LLM. Additionally, to boost the system’s transferability, we incorporate a retrieval-augmented strategy during both the training and inference phases. The experimental results show that the proposed method achieved promising results in this evaluation.”
The Fourth Chinese Abstract Meaning Representation Parsing Evaluation
Xu Zhixing
|
Zhang Yixuan
|
Li Bin
|
Zhou Junsheng
|
Qu Weiguang
“Abstract Meaning Representation has become a key research area in sentence-level semantic parsing within natural language processing. Substantial progress has been achieved in various NLP tasks using AMR. This paper presents the fourth Chinese Abstract Meaning Representation parsing evaluation, held during the technical evaluation task workshop at CCL 2024. The evaluation also introduced a new test set comprising Ancient Chinese sentences. Results indicated decent performance, with the top team achieving an F1 of 0.8382 in the open modality, surpassing the previous record at CoNLL 2020 by 3.30 percentage points under the MRP metric. However, current large language models perform poorly in AMR parsing of Ancient Chinese, highlighting the need for effective training strategies. The complex syntax and semantics of Ancient Chinese pose significant challenges. Additionally, optimizing transfer learning techniques to better apply knowledge from Chinese Mandarin to Ancient Chinese parsing is crucial. Only through continuous innovation and collaboration can significant advancements in both Ancient Chinese and Chinese Mandarin AMR parsing be achieved.”
Classical Chinese Event Extraction Based on Combining Large and Small Models with Semi-Supervised Self-Training
Fu Weiwei (付薇薇)
|
Wang Shiquan (王士权)
|
Fang Ruiyu (方瑞玉)
|
Li Mengxiang (李孟祥)
|
He Zhongjiang (何忠江)
|
Li Yongxiang (李永翔)
|
Song Shuangyong (宋双永)
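“This paper describes the system submitted by team “TeleAI” to the CCL2024 Classical Chinese Historical Event Type Extraction evaluation task (CHED2024). The task aims to automatically identify event triggers and event types in ancient texts, where event type classification is divided into coarse-grained and fine-grained classification. To improve extraction performance, we combine large and small models and adopt a semi-supervised self-training method. In the final evaluation, we scored 0.763 on trigger identification, 0.842 on coarse-grained event type classification, and 0.779 on fine-grained event type classification, with an overall score of 0.791, ranking first on every individual task and on the overall score.”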
Multi-Model Classical Chinese Event Trigger Word Recognition Driven by Incremental Pre-training
Lin Litao
|
Wu Mengcheng
|
Shen Xueying
|
Zhou Jiaxin
|
Ou Shiyan
“This paper addresses the task of identifying and classifying historical event trigger words in Classical Chinese, utilizing both small-scale and large-scale language models. Specifically, we selected the small-scale language model GujiBERT for intelligent processing of classical texts, and the large-scale language model Xunzi-Qwen-14b. Both models underwent continued pretraining and fine-tuning, resulting in GujiBERT-CHED-mlm and Xunzi-Qwen-14b-CHED. For the small-scale language model, we used a BiLSTM as the feature extraction module and a CRF as the decoding module, employing a sequence labeling paradigm to complete the evaluation experiments. For the large-scale language model, we optimized the prompt templates and used a sequence-to-sequence paradigm for evaluation experiments. Our experiments revealed that GujiBERT-BiLSTM-CRF achieved the best performance across all tasks, ranking fourth in overall performance among all participating teams. The large-scale language model demonstrated good semantic understanding abilities, reaching a preliminary usable level. Future research should focus on enhancing its ability to produce standardized outputs.”
Classical Chinese Historical Event Detection Based on Incremental Pre-training and External Knowledge
Kang Wenjun (康文军)
|
Zuo Jiali (左家莉)
|
Hu Yiyu (胡益裕)
|
Wang Mingwen (王明文)
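“The Classical Chinese historical event detection task aims to identify event triggers and event types in text. Traditional pipeline methods are prone to cascading error propagation, and most event detection methods rely only on sentence-level information. To address these problems, this paper proposes EIGC, a joint extraction model that combines external information with a global correspondence matrix to extract triggers and event types accurately. In addition, we compile a dataset of Ancient Chinese documents, including the Twenty-Four Histories, totaling about 970,000 Ancient Chinese texts, and use it for incremental pre-training of BERT-Ancient-Chinese. The proposed model reaches an overall F1 of 76.2% across the three tasks, which verifies the effectiveness of the method.”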
Classical Chinese Historical Event Detection Evaluation
Feng Zhenbing
|
Li Wei
|
Shao Yanqiu
“Event detection involves identifying and extracting event information from natural language texts. The complex syntax and semantics of Classical Chinese, coupled with its limited usage, pose significant challenges for information extraction tasks on classical Chinese texts. At the 23rd China National Conference on Computational Linguistics (CCL 2024), we launched an evaluation task focused on the extraction of historical events from Classical Chinese. We used our constructed Classical Chinese Historical Event Logical Schema to identify event triggers and classify event types. The evaluation utilized the Classical Chinese Historical Event Detection Dataset (CHED), annotated from The Twenty-Four Histories corpus, with the aim of enhancing event extraction technologies and advancing the digital study of classical Chinese historical texts. The evaluation included two subtasks and attracted 28 teams, with 15 teams submitting valid results. In the subtask of trigger identification, the best-performing system achieved an Exact match score of 63.6%. In the subtasks of coarse-grained and fine-grained event type classification, the top systems achieved F1-scores of 84.5% and 81.4%, respectively.”
A Unified Multi-Task Learning Model for Chinese Essay Rhetoric Recognition and Component Extraction
Fang Qin
|
Zhang Zheng
|
Wang Yifan
|
Peng Xian
“In this paper, we present our system for CCL24-Eval Task 6: Chinese Essay Rhetoric Recognition and Understanding (CERRU). The CERRU task aims to identify and understand the use of rhetoric in student writing. The evaluation sets three tracks that examine the recognition of rhetorical form, the recognition of rhetorical content, and the extraction of rhetorical components. Considering the potential correlation among the tracks, we employ a unified multi-task learning architecture that exploits the inherent interactions among the related tasks to improve overall performance and to complete all three tracks with a single model. Specifically, the framework consists of four sub-tasks: rhetorical device recognition, rhetorical form recognition, rhetorical content recognition, and rhetorical component extraction. The first three are treated as multi-label classification tasks, and the last is treated as an entity recognition task. The four tasks leverage potential information transfer to achieve fusion learning, and they are integrated into a unified model through parameter sharing. In the final evaluation results, our system ranked fourth with a total score of 60.14, verifying the effectiveness of our approach.”
Rhetoric Recognition and Understanding in Primary and Secondary School Essays
Zhao Liang (赵亮)
|
Wu Weixuan (武伟轩)
|
Yu Hao (余浩)
|
Lu Wenbin (鲁文斌)
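“This technical report presents a solution to the CCL 2024 evaluation task on rhetoric recognition and understanding in primary and secondary school essays. For primary and secondary school students, rhetorical devices are not only a core component of reading comprehension and writing skills but also an indispensable element of good literary works. Recognizing and understanding the use of rhetoric in student essays can help students improve their written expression and guide them toward higher-quality narration and description. Rhetoric recognition is currently a relatively difficult task in natural language understanding, because it requires a large amount of human prior knowledge and the boundaries between different rhetorical devices are often blurred. We use LoRA to directly fine-tune a pre-trained large language model based on qwen-chat-7B to recognize rhetoric categories. Our main technical contributions are as follows. First, we construct multiple training examples from the same input and output data to improve performance. Second, we judge rhetoric hierarchically: we first predict the coarse rhetoric category and then feed that category back as input to predict its sub-category. Third, for the rhetorical component extraction task, we have the model output the extracted text directly and then search for its position in the original text, instead of having the model output index offsets directly.”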
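The last step in the abstract above, generating the component text and then recovering its position by searching the original sentence, can be illustrated with a minimal sketch. This is not the authors' code: the function name `locate_span`, the half-open `(start, end)` convention, and the first-match policy are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def locate_span(text: str, component: str, start_from: int = 0) -> Optional[Tuple[int, int]]:
    """Map a generated component string back to character offsets in `text`.

    Returns a half-open (start, end) pair for the first match at or after
    `start_from`, or None when the string does not occur verbatim
    (for example, if the model paraphrased instead of copying).
    """
    if not component:
        return None
    begin = text.find(component, start_from)
    if begin == -1:
        return None
    return begin, begin + len(component)


sentence = "月亮像一个大玉盘"
span = locate_span(sentence, "大玉盘")
print(span)  # (5, 8)
```

A search of this kind keeps the offsets consistent with the source text by construction, at the cost of returning nothing when the generated string is not an exact substring.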
Essay Rhetoric Recognition and Understanding Using Synthetic Data and Model Ensemble Enhanced Large Language Models
Song Jinwang
|
Zan Hongying
|
Zhang Kunli
“Natural language processing technology has been widely applied in the field of education. Essay writing serves as a crucial method for evaluating students’ language skills and logical thinking abilities. Rhetoric, an essential component of essays, is also a key reference for assessing writing quality. In the era of large language models (LLMs), applying LLMs to the automatic classification and extraction of rhetorical devices is of significant importance. In this paper, we fine-tune LLMs with specific instructions to adapt them to the tasks of recognizing and extracting rhetorical devices in essays. To further enhance the performance of LLMs, we experimented with multi-task fine-tuning and expanded the training dataset with synthetic data. Additionally, we explored a model ensemble approach based on label re-inference. Our method achieved a score of 66.29 in Task 6 of the CCL 2024 Eval, Chinese Essay Rhetoric Recognition and Understanding (CERRU), securing first place.”
Evaluation of Rhetoric Recognition and Understanding in Primary and Secondary School Essays Based on Deep Learning Models
Li Chenyang (李晨阳)
|
Zhang Long (张龙)
|
Zheng Qiusheng (郑秋生)
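“For primary and secondary school students, rhetorical devices are central to reading and writing skills and a key element of good literary works. However, recognizing and understanding the use of rhetoric in student essays requires a great deal of manual effort, which poses a challenge for teachers in essay assessment and instruction. Recent research has begun to use computer technology to review essays automatically, and the use of rhetoric is an important part of that assessment. This paper describes the method we used in the evaluation of rhetoric recognition and understanding in primary and secondary school essays at the 23rd China National Conference on Computational Linguistics. For the different tasks in this evaluation, we use traditional classification models and large models respectively, and then apply pseudo-labeling, data augmentation, and related methods to improve model performance. Experimental results show that our method achieves relatively advanced results.”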
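A Chinese Rhetoric Recognition and Understanding Method Based on Collaborative Decision-Making of Large and Small Models Guided by Human Thinking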
Wang Wen (王雯)
|
Tang Siyi (汤思怡)
|
Yu Dong (于东)
|
Liu Pengyuan (刘鹏远)
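“CCL24-Eval Task 6 proposes a multi-level, fine-grained task of rhetoric recognition and understanding in primary and secondary school essays. In view of the characteristics of the task, this paper proposes a Chinese rhetoric recognition and understanding method in which large and small models make decisions collaboratively under the guidance of human thinking. Following the way humans approach rhetoric recognition and understanding, the method redefines the order of the tasks and selects a large or a small language model for each step, so that each step reaches a local optimum and the local optima together yield the best result on the overall task. The results show that the proposed method recognizes and understands rhetoric effectively, improving on the baseline by 13.54, 4.03, and 57.11 on the three tracks respectively.”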
Chinese Essay Rhetoric Recognition and Understanding (CERRU)
Liu Nuowei
|
Chen Xinhao
|
Ren Yupei
|
Lan Man
|
Bai Xiaopeng
|
Wu Yuanbin
|
Mao Shaoguang
|
Xia Yan
“Rhetoric is fundamental to the reading comprehension and writing skills of primary and middle school students. However, current work recognizes only single coarse-grained categories or fine-grained categories in isolation. In this paper, we propose CCL24-Eval Task 6: Chinese Essay Rhetoric Recognition and Understanding (CERRU), consisting of 3 tracks: (1) Fine-grained Form-level Categories Recognition, (2) Fine-grained Content-level Categories Recognition and (3) Rhetorical Component Extraction. A total of 32 teams registered to participate in CERRU and 9 teams submitted evaluation results, with 7 of these teams achieving an overall score that surpassed the baseline.”
Assessing Essay Fluency with Large Language Models
Wu Haihong
|
Ao Chang
|
Ni Shiwen
“With the development of education and the widespread use of the internet, the scale of essay evaluation has increased, making the cost and efficiency of manual grading a significant challenge. To address this, the Twenty-third China National Conference on Computational Linguistics (CCL2024) established an evaluation contest for essay fluency. The competition has three tracks corresponding to three sub-tasks. This paper conducts a detailed analysis of the different tasks, employing the BERT model as well as the recent and popular Qwen large language models to address these sub-tasks. As a result, our overall scores for the three tasks reached 37.26, 42.48, and 47.64.”
Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation
Zhang Jingshen
|
Yang Xiangyu
|
Su Xinkai
|
Chen Xinglu
|
Huang Tianyou
|
Qiu Xinying
“This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pretraining and used an NSP-based strategy with Symmetric Cross Entropy loss to capture context and mitigate long dependencies. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.”
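The abstract above names a Symmetric Cross Entropy loss without giving its form. As a rough reminder of the general idea, the loss is usually written as a weighted sum of ordinary cross-entropy and a reverse cross-entropy term in which log 0 is replaced by a negative constant. The sketch below follows that general definition for a single example with a one-hot label; the weights `alpha` and `beta`, the constant `log_zero`, and the function name are illustrative assumptions, not values or code from the paper.

```python
import math
from typing import Sequence


def symmetric_cross_entropy(
    probs: Sequence[float],
    target: int,
    alpha: float = 1.0,      # weight of the standard CE term (assumed value)
    beta: float = 1.0,       # weight of the reverse CE term (assumed value)
    log_zero: float = -4.0,  # stand-in for log(0) in the reverse term (assumed value)
    eps: float = 1e-7,
) -> float:
    """Symmetric cross-entropy for one example with a one-hot label."""
    # Standard cross-entropy: -log p(target), clipped for numerical safety.
    ce = -math.log(max(probs[target], eps))
    # Reverse cross-entropy: -sum_k p_k * log q_k with q one-hot.
    # log q_target = log 1 = 0, and log q_k = log_zero for every other class.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != target)
    return alpha * ce + beta * rce


print(round(symmetric_cross_entropy([0.5, 0.5], target=0), 4))  # 2.6931
```

The reverse term grows with the probability mass placed on non-target classes but is bounded, which is why this family of losses is often chosen when labels (here, pseudo-labels) may be noisy.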
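Research on Automated Methods for Grammatical Error Detection, Erroneous Sentence Rewriting, and Fluency Rating in Primary and Secondary School Essays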
Tian Wei (田巍)
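“This study aims to improve the quality and efficiency of grading and revising primary and secondary school essays by introducing advanced natural language processing models for erroneous sentence detection, correction, and fluency scoring, with a separate model built for each of the three tasks. For Task 1, we propose a grammatical error substitution method for data augmentation and then identify error types with a UTC-based model. For Task 2, we combine a pre-trained BART model with the SynGEC strategy for text correction, making full use of BART's generation ability and SynGEC's strengths in grammatical error correction. For Task 3, we rate essay fluency with a TextRCNN-NEZHA model, building a classifier that integrates semantic information. In the evaluation, the proposed methods ranked first on Task 1 and Task 2 and second on Task 3, showing that they can effectively identify error types, correct erroneous sentences in essays, and produce reasonable fluency ratings.”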
Prompting GPT-4 for Chinese Essay Fluency Evaluation
Zhang Dan
|
Hoang Thuong
|
Zhu Ye
“This report presents the methodology and results of using GPT-4 for CCL24-Eval Task 7, Chinese Essay Fluency Evaluation (CEFE). The task is divided into three tracks: Identification of Error Sentence Types, Rewriting Error Sentences, and Essay Fluency Rating. We employed few-shot prompt engineering to guide GPT-4 in performing this task. Our approach integrated fine-grained error analysis with advanced NLP techniques to provide detailed, actionable feedback for students and teachers. Despite some successes, particularly in generating semantically similar and syntactically relevant corrections, our analysis revealed significant challenges, especially in multi-label classification and the accurate identification of error types. The report discusses these findings and suggests areas for further improvement.”
An Essay Fluency Evaluation Method Based on Large-Model Data Augmentation
Peng Qianwen (彭倩雯)
|
Gao Yanzipeng (高延子鹏)
|
Li Xiaoqing (李晓青)
|
Min Fanke (闵凡珂)
|
Li Mingrui (李明锐)
|
Wang Zhichun (王志春)
|
Liu Tianyun (刘天昀)
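“CCL2024-Eval Task 7 is Chinese Essay Fluency Evaluation (CEFE) for primary and secondary school students. The task defines three important and challenging problems: identifying the types of erroneous sentences in primary and secondary school essays, rewriting those erroneous sentences, and rating essay fluency. Our team took part in all three subtasks of Task 7 and obtained scores of 45.19, 43.90, and 45.84 respectively. This report describes in detail the technical methods we used on the three subtasks and analyzes the evaluation results.”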
Chinese Essay Fluency Evaluation (CEFE) Task
Zhuang Xinlin
|
Shen Xinshu
|
Wu Hongyi
|
Lan Man
|
Bai Xiaopeng
|
Wu Yuanbin
|
Zhou Aimin
|
Mao Shaoguang
“This paper presents a detailed review of Task 7 in the CCL24-Eval: the second Chinese Essay Fluency Evaluation (CEFE). The task aims to identify fine-grained grammatical errors that impair readability and coherence in essays authored by Chinese primary and secondary school students, evaluate the essays’ fluency levels, and recommend corrections to improve their written fluency. The evaluation comprises three tracks: (1) Coarse-grained and fine-grained error identification; (2) Error sentence rewriting; and (3) Essay Fluency Level Recognition. We garnered 29 completed registrations, resulting in 180 submissions from 10 dedicated teams. The paper discusses the submissions and analyzes the results from all participating teams.”
A Two-stage Prompt-Based Strategy for CRMUS Track 1
Chen Mosha
“Large Language Models (LLMs) have sparked a new trend in Natural Language Processing, and an increasing number of researchers have recognized the potential of using LLMs to unify diverse NLP tasks in a text-generation framework. To explore the potential of LLMs in the children’s stories domain, CCL2024 has released the Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS) task. This paper presents a straightforward yet effective two-stage prompt-based strategy for CRMUS Track 1. In the first stage, we use the same prompt to obtain responses from GPT-4, ERNIE-4, and Qwen-Max. In the second stage, we apply a voting mechanism to the results of the first stage. For records with inconsistent outcomes, we query GPT-4 for secondary confirmation to determine the final result. Experimental results indicate that our method achieved an average score of 79.27, securing first place in the closed domain among ten participating teams and thereby demonstrating the effectiveness of our approach.”
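The two-stage decision rule in the abstract above can be sketched in a few lines. The abstract does not say exactly when a record counts as inconsistent, so this sketch assumes by default that anything short of unanimous agreement triggers the secondary confirmation; the function name, the `min_agree` parameter, and the `confirm` callback (a stand-in for the second query to a model) are all hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def two_stage_answer(
    first_stage: Sequence[str],
    confirm: Callable[[], str],
    min_agree: Optional[int] = None,
) -> str:
    """Pick the final answer for one record.

    first_stage: answers returned by the first-stage models for the same prompt.
    confirm:     hypothetical callback that runs the secondary confirmation query.
    min_agree:   votes the top answer needs to be accepted without confirmation;
                 defaults to unanimity (an assumption, not stated in the abstract).
    """
    if not first_stage:
        raise ValueError("need at least one first-stage answer")
    needed = len(first_stage) if min_agree is None else min_agree
    answer, votes = Counter(first_stage).most_common(1)[0]
    if votes >= needed:
        return answer
    # Inconsistent outcome: defer to the secondary confirmation.
    return confirm()


print(two_stage_answer(["B", "B", "B"], confirm=lambda: "A"))  # B
print(two_stage_answer(["B", "B", "C"], confirm=lambda: "A"))  # A
```

Passing `min_agree=2` would turn the rule into a plain majority vote that consults the confirmation step only when all three answers differ.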
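Research on Commonsense Reasoning and Moral Understanding in Children’s Stories Based on Instruction Fine-Tuning and Data Augmentation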
Yu Bohan (于博涵)
|
Li Yunlong (李云龙)
|
Liu Tao (刘涛)
|
Zheng Aoze (郑傲泽)
|
Zhang Kunli (张坤丽)
|
Zan Hongying (昝红英)
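“Although existing language models perform well on natural language processing tasks, there is still room for improvement in deep semantic understanding and commonsense reasoning. This study tests model performance on the Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS) dataset to explore how to strengthen models on complex tasks. In Track 2 of this task, we run zero-shot inference with several open-source large models of up to 7B parameters (such as Qwen and InternLM), select the best-performing model, and apply LoRA-based instruction fine-tuning to improve it. In addition, we analyze and augment the dataset. The results show that, with an effective instruction format and tuned LoRA fine-tuning parameters, model accuracy on commonsense reasoning and moral understanding improves significantly. We ultimately took first place in Track 2 of this task, with an Acc score of 74.38 on the task's evaluation metric, a relatively advanced level.”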
Exploring Faithful and Informative Commonsense Reasoning and Moral Understanding in Children’s Stories
Wang Zimu
|
Wang Yuqi
|
Han Nijia
|
Chen Qi
|
Zhang Haiyang
|
Pan Yushan
|
Wang Qiufeng
|
Wang Wei
“Commonsense reasoning and moral understanding are crucial tasks in artificial intelligence (AI) and natural language processing (NLP). However, existing research often falls short in terms of faithfulness and informativeness during the reasoning process. We propose a novel framework for performing commonsense reasoning and moral understanding using large language models (LLMs), involving constructing guided prompts by incorporating relevant knowledge for commonsense reasoning and extracting facts from stories for moral understanding. We conduct extensive experiments on the Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS) dataset with widely recognised LLMs under both zero-shot and fine-tuning settings, demonstrating the effectiveness of our proposed method. Furthermore, we analyse the adaptability of different LLMs in extracting facts for moral understanding performance.”
Prompt Construction Based on Prompt Engineering and Chain of Thought
Luo Yun (罗允)
|
Feng Yi (冯毅)
|
Jing Liping (景丽萍)
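“The evaluation task on commonsense reasoning and moral understanding in children's stories aims to assess, through two tasks and from multiple angles, the commonsense reasoning and story comprehension abilities of Chinese pre-trained language models and large language models. It examines both a model's store of commonsense knowledge and its deep understanding of text, and is therefore highly challenging. With the development of large language models, their strong instruction-following ability has markedly improved the efficiency and effectiveness of natural language processing tasks. However, this also places higher demands on prompt design, because prompt quality directly affects model behavior and the accuracy of predictions. Designing effective prompts has therefore become especially important: it requires not only understanding the specific requirements of the task but also a deep knowledge of language models and the ability to use them flexibly. For the two tasks of Track 1 of this evaluation, this paper proposes a prompt construction method based on prompt engineering. First, we propose a general prompt construction framework that integrates prompt engineering and chain of thought. Then, we adapt the corresponding prompt templates to each specific task. Finally, we use these prompts with language models to predict the results. In this evaluation, our method ranked third under the closed-data condition of Track 1, which verifies its effectiveness and shows its potential for natural language understanding applications.”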
Evaluation of Commonsense Reasoning and Moral Understanding in Children’s Stories
Yan Guohang
|
Liang Feihao
|
Guo Yaxin
|
Tan Hongye
|
Li Ru
|
Zhang Hu
“This paper provides a comprehensive review of CCL24-Eval Task 8: Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS). The task comprises two sub-tasks, which aim to assess the commonsense reasoning and implicit meaning comprehension capabilities of Large Language Models (LLMs). We have received registration forms from 33 teams, 15 of which submitted final results that exceeded the baseline score. We present the results of the top 5 teams and our analysis of these results.”
Bridging the Gap between Authentic and Answer-Guided Images for Chinese Vision-Language Understanding Enhancement
Wang Feiyu
|
Guo Wenyu
|
Yu Dong
|
Kang Chen
|
Liu Pengyuan
“The objective of the Chinese Vision-Language Understanding Evaluation (CVLUE) is to comprehensively assess the performance of Chinese vision-language multimodal pre-trained models in multimodal modeling and understanding across four tasks: Image-Text Retrieval, Visual Question Answering, Visual Grounding, and Visual Dialog. To enhance the models’ performance across these multimodal tasks, this paper proposes a multimodal information understanding enhancement method based on answer-guided images. First, we propose task-specific methods for answer-guided image generation. Second, the authentic and answer-guided images are each fed into the model for multimodal fine-tuning. Finally, training objectives are set for the different tasks to minimize the gap between the answer-guided images and the authentic images, so that the answer-guided images supervise the results produced from the authentic images. The experimental results demonstrate the effectiveness of the proposed method.”
Chinese Vision-Language Understanding Evaluation
Wang Jiangkuo
|
Zheng Linwei
|
Chen Kehai
|
Bai Xuefeng
|
Zhang Min
“This paper introduces the systems we submitted to the Chinese Vision-Language Understanding Evaluation task at the 23rd Chinese Computational Linguistics Conference. In this competition, we used the X2-VLM and CCLM models to participate in subtasks including image-text retrieval, visual grounding, visual dialogue, and visual question answering. Additionally, we employed other models to assess performance on certain subtasks. We optimized our models and successfully applied them to these different tasks.”
Chinese Vision-Language Multimodal Understanding Evaluation
Wang Yuxuan (王宇轩)
|
Liu Yijun (刘议骏)
|
Wan Zhiguo (万志国)
|
Che Wanxiang (车万翔)
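“The Chinese vision-language multimodal understanding evaluation task aims to assess, from multiple angles, the multimodal modeling and understanding abilities of Chinese vision-language pre-trained models. The task comprises five subtasks: image retrieval, text retrieval, visual question answering, visual grounding, and visual dialogue, and the final score is computed from the scores on all five. This paper first introduces the background and motivation of the task, and then presents the task description, evaluation metrics, competition results, and participating methods. Eleven teams registered for the task, and three of them submitted results.”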
The vivo Sign Language Avatar Translation System
He Junyuan (何俊远)
|
Liu Xin (刘鑫)
|
Yang Murong (杨牧融)
|
Li Xiaolong (李小龙)
|
Huang Xuming (黄旭铭)
|
Teng Fei (滕飞)
|
Chen Xiaoxin (陈晓昕)
|
Fu Fan (付凡)
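“This paper describes the system we submitted to the sign language avatar translation quality evaluation at the 23rd China National Conference on Computational Linguistics. The evaluation task assesses the naturalness and accuracy with which a sign language avatar translates Chinese into Chinese Sign Language. The system described here first uses a sign language translation algorithm to translate Chinese text into sign language text. It then applies a motion fusion algorithm to synthesize the sign language motion units corresponding to that text into natural, complete avatar motions, while a face-driving algorithm blends non-manual elements such as mouth shapes and facial expressions into the synthesis, yielding a sign language avatar with micro-expressions and synchronized lip movements. We obtained an overall score of 3.513 on the official human evaluation set for sign language avatar translation quality, ranking first in the task.”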
A Sign Language Avatar System Combining LLMs and 3D Animation Technology
Yang Yang (杨阳)
|
Zhang Ying (张颖)
|
Huang Kaiyu (黄锴宇)
|
Xu Jinan (徐金安)
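“Sign Language Translation (SLT) systems are an important assistive technology that gives deaf and hard-of-hearing people an effective way to communicate with others. However, traditional sign language translation systems suffer from poor accuracy and fluency. This paper proposes a sign language translation system that combines a Large Language Model (LLM) with 3D animation technology, aiming to overcome these limitations and improve translation accuracy and fluency. We describe the design and implementation of the system in detail, including prompt design, data processing methods, and the implementation of the sign language avatar translation system. Experimental results show that the LLM-based approach can generate fairly natural and accurate sign language translations. Under both standard evaluation and human evaluation, the system completes the sign language translation task well in most cases and outperforms traditional methods. This work provides a useful reference for further improving sign language translation systems.”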
Translation Quality Evaluation of Sign Language Avatar
Zhao Yuan
|
Zhang Ruiquan
|
Yao Dengfeng
|
Chen Yidong
“Sign Language Avatar technology aims to create virtual agents capable of communicating with deaf individuals through sign language, similar to the text dialogue agent ChatGPT but focusing on sign language communication. Challenges in sign language production include limited dataset sizes, information loss due to reliance on intermediate representations, and insufficient realism in generated actions. In this event, we particularly focus on the ability of the Sign Language Avatar to translate spoken language text into sign language that is easily understood by deaf individuals. As the first sign language avatar event held by the China National Conference on Computational Linguistics (CCL), this event attracted wide attention from both industry and academia, with 14 teams registering and 10 of them submitting their system interfaces on time. We provided a dataset consisting of 1074 text-video parallel sentence pairs for training, and the evaluation team comprised proficient Chinese sign language users and professional sign language translators. The scoring method employed a comprehensive evaluation based on multiple metrics, focusing primarily on sign language grammar accuracy, naturalness, readability, and cultural adaptability. The final scores, which combine performance across these four aspects, showed that four teams demonstrated good readability, with Vivo Mobile Communication Co., Ltd. ranking first with a score of 3.513 (out of a full score of 5), leading the baseline model by 1.394 points. According to the analysis of the results, most teams used the traditional method of converting text into gloss sequences before generating sign language. Additionally, some teams experimented with emerging methods, including gloss-free end-to-end training and Large Language Model (LLM) prompt learning, which also achieved promising results.
We anticipate that this event will promote the development of sign language avatar technology and provide higher-quality communication tools for the deaf community. For more information on this task, please visit the website of the CCL24-Eval: Translation Quality Evaluation of Sign Language Avatar Task.”