Full-Stack Information Extraction System for Cybersecurity Intelligence

Youngja Park, Taesung Lee


Abstract
Due to rapidly growing cyber-attacks and security vulnerabilities, many reports on cyber-threat intelligence (CTI) are being published daily. While these reports can help security analysts to understand ongoing cyber threats, the overwhelming amount of information makes it difficult to digest the information in a timely manner. This paper presents SecIE, an industrial-strength full-stack information extraction (IE) system for the security domain. SecIE can extract a large number of security entities, relations and the temporal information of the relations, which is critical for cyber-threat investigations. Our evaluation with 133 labeled threat reports containing 108,021 tokens shows that SecIE achieves over 92% F1-score for entity extraction and about 70% F1-score for relation extraction. We also showcase how SecIE can be used for downstream security applications.
Anthology ID:
2022.emnlp-industry.54
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
531–539
URL:
https://aclanthology.org/2022.emnlp-industry.54
DOI:
10.18653/v1/2022.emnlp-industry.54
Cite (ACL):
Youngja Park and Taesung Lee. 2022. Full-Stack Information Extraction System for Cybersecurity Intelligence. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 531–539, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Full-Stack Information Extraction System for Cybersecurity Intelligence (Park & Lee, EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-industry.54.pdf