Integrating Structural Semantic Knowledge for Enhanced Information Extraction Pre-training

Xiaoyang Yi, Yuru Bao, Jian Zhang, Yifang Qin, Faxin Lin


Abstract
Information Extraction (IE), which aims to extract structured information from unstructured natural language text, can benefit significantly from pre-trained language models. However, existing pre-training methods focus solely on exploiting textual knowledge and rely extensively on large-scale annotated datasets, which is labor-intensive and thus limits the scalability and versatility of the resulting models. To address these issues, we propose SKIE, a novel pre-training framework tailored for IE that integrates structural semantic knowledge via contrastive learning, effectively alleviating the annotation burden. Specifically, SKIE utilizes Abstract Meaning Representation (AMR) as a low-cost supervision source to boost model performance without human intervention. By enhancing the topology of AMR graphs, SKIE derives high-quality cohesive subgraphs as additional training samples, providing diverse multi-level structural semantic knowledge. Furthermore, SKIE refines the graph encoder to better capture cohesive information and edge relation information, thereby improving the pre-training efficacy. Extensive experimental results demonstrate that SKIE outperforms state-of-the-art baselines across multiple IE tasks and shows exceptional performance in few-shot and zero-shot settings.
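
The abstract describes contrastive pre-training that aligns text with AMR (sub)graphs, but this page does not include the authors' code. As a rough illustration of the general kind of objective such a framework could use, the sketch below implements a generic symmetric InfoNCE contrastive loss in PyTorch, treating each sentence embedding and the embedding of its AMR subgraph as a positive pair and all other in-batch pairs as negatives. The function name, loss form, and temperature value are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(text_emb: torch.Tensor,
                  graph_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning sentence embeddings with
    embeddings of their AMR subgraphs (hypothetical sketch, not SKIE's
    exact objective). Row i of each tensor is a positive pair; every
    other row in the batch serves as an in-batch negative."""
    text_emb = F.normalize(text_emb, dim=-1)
    graph_emb = F.normalize(graph_emb, dim=-1)
    # (B, B) cosine-similarity matrix scaled by temperature.
    logits = text_emb @ graph_emb.T / temperature
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Average the text-to-graph and graph-to-text directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Toy usage: a batch of 4 sentence/subgraph embedding pairs of dimension 256,
# as would be produced by a text encoder and a graph encoder respectively.
if __name__ == "__main__":
    text = torch.randn(4, 256)
    graph = torch.randn(4, 256)
    print(info_nce_loss(text, graph).item())
```

In this formulation, the multi-level cohesive subgraphs the paper mentions would simply supply additional positive graph views per sentence; the loss itself needs no manual annotation, which matches the abstract's claim of low-cost supervision.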
Anthology ID:
2024.emnlp-main.129
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2156–2171
URL:
https://aclanthology.org/2024.emnlp-main.129/
DOI:
10.18653/v1/2024.emnlp-main.129
Cite (ACL):
Xiaoyang Yi, Yuru Bao, Jian Zhang, Yifang Qin, and Faxin Lin. 2024. Integrating Structural Semantic Knowledge for Enhanced Information Extraction Pre-training. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2156–2171, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Integrating Structural Semantic Knowledge for Enhanced Information Extraction Pre-training (Yi et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.129.pdf