Title-based Extractive Summarization via MRC Framework

Hongjin Kim, Jai-Eun Kim, Harksoo Kim


Abstract
Existing studies on extractive summarization have primarily focused on scoring and selecting summary sentences independently. However, such models are limited to sentence-level extraction and tend to select highly generalized sentences while overlooking the overall content of a document. To better account for the semantics of a document, we introduce a novel machine reading comprehension (MRC) framework for extractive summarization (MRCSum) that uses the document title as the query. This framework enables MRCSum to consider the semantic coherence of summary sentences and their relevance to the overall content. When a title is not available, we generate a title-like query that is expected to serve the same role as a title. This title-like query consists of the document's topic and keywords, which convey its main topic or theme. We conduct experiments in both Korean and English, evaluating MRCSum on datasets with both long and short summaries. Our results demonstrate the effectiveness of MRCSum for extractive summarization, showing that it generates concise and informative summaries with or without explicit titles. Furthermore, MRCSum outperforms existing models by capturing the essence of the document content and producing more coherent summaries.
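The query-then-extract idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the encoder, the input layout, the scoring head, or how topics and keywords are obtained, so everything below is an assumption for illustration only: the function names are hypothetical, the topic is supplied by the caller, keywords come from a simple term-frequency heuristic, the BERT-style `[CLS]`/`[SEP]` packing is a guess at a typical MRC input, and `overlap_score` is a lexical stand-in for the trained MRC model that would actually score sentences.

```python
import re
from collections import Counter

# Hypothetical stopword list for the toy keyword extractor below.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for",
             "with", "by", "as", "at", "its", "it", "was", "were", "be"}


def tokenize(text):
    """Lowercase word tokenizer (assumption: a simple regex split)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_like_query(topic, sentences, num_keywords=3):
    """Build a hypothetical title-like query of the form 'topic: kw1, kw2, ...'.

    Keywords are the most frequent non-stopword tokens; this frequency
    heuristic is a placeholder, not the paper's keyword extractor.
    """
    counts = Counter(
        tok for s in sentences for tok in tokenize(s) if tok not in STOPWORDS
    )
    keywords = [w for w, _ in counts.most_common(num_keywords)]
    return f"{topic}: " + ", ".join(keywords)


def build_mrc_input(query, sentences, cls="[CLS]", sep="[SEP]"):
    """Pack the query and every document sentence into one MRC-style string.

    The BERT-like [CLS]/[SEP] layout is an assumption for illustration.
    """
    return " ".join([cls, query, sep] + [f"{s} {sep}" for s in sentences])


def overlap_score(query, sentence):
    """Stand-in relevance score: fraction of query tokens found in the sentence.

    A trained MRC encoder would replace this lexical heuristic.
    """
    q = set(tokenize(query)) - STOPWORDS
    s = set(tokenize(sentence))
    return len(q & s) / len(q) if q else 0.0


def extract_summary(query, sentences, k=2):
    """Select the k highest-scoring sentences, then restore document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: overlap_score(query, sentences[i]),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = [
        "The city council approved a new budget for public transit.",
        "Weather on Monday was mild and sunny.",
        "The transit budget adds funding for buses and rail lines.",
        "Local bakeries reported strong holiday sales.",
    ]
    query = title_like_query("Transit budget", doc)
    print(query)  # Transit budget: budget, transit, city
    print(build_mrc_input(query, doc[:1]))
    print(extract_summary(query, doc, k=2))  # first and third sentences
```

Scoring each sentence against a single query, rather than in isolation, is what lets the title (or title-like query) steer selection toward sentences tied to the document's main theme; restoring document order afterward keeps the extracted summary readable.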
Anthology ID:
2024.lrec-main.1406
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
16175–16186
URL:
https://aclanthology.org/2024.lrec-main.1406
Cite (ACL):
Hongjin Kim, Jai-Eun Kim, and Harksoo Kim. 2024. Title-based Extractive Summarization via MRC Framework. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16175–16186, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Title-based Extractive Summarization via MRC Framework (Kim et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1406.pdf