A Joint Model for Document Segmentation and Segment Labeling

Joe Barrow, Rajiv Jain, Vlad Morariu, Varun Manjunatha, Douglas Oard, Philip Resnik


Abstract
Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. Where previous work on text segmentation considers the tasks of document segmentation and segment labeling separately, we show that the tasks contain complementary information and are best addressed jointly. We introduce Segment Pooling LSTM (S-LSTM), which is capable of jointly segmenting a document and labeling segments. In support of joint training, we develop a method for teaching the model to recover from errors by aligning the predicted and ground truth segments. We show that S-LSTM reduces segmentation error by 30% on average, while also improving segment labeling.
Anthology ID:
2020.acl-main.29
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
313–322
URL:
https://aclanthology.org/2020.acl-main.29
DOI:
10.18653/v1/2020.acl-main.29
Cite (ACL):
Joe Barrow, Rajiv Jain, Vlad Morariu, Varun Manjunatha, Douglas Oard, and Philip Resnik. 2020. A Joint Model for Document Segmentation and Segment Labeling. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 313–322, Online. Association for Computational Linguistics.
Cite (Informal):
A Joint Model for Document Segmentation and Segment Labeling (Barrow et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.29.pdf
Video:
http://slideslive.com/38929073