Divide-and-Conquer Text Simplification by Scalable Data Enhancement

Sanqiang Zhao, Rui Meng, Hui Su, Daqing He


Abstract
Text simplification is the task of reducing the complexity of a text while retaining its original meaning. It can help people with low literacy skills or language impairments, such as children and individuals with dyslexia or aphasia, read and understand complicated materials. Substitution, deletion, reordering, and splitting are normally considered the four core operations of text simplification, so an ideal model should be able to apply each of them appropriately when simplifying a text. However, by examining the degree to which each operation is applied in different datasets, we observe a salient discrepancy between human annotation and the existing data that is widely used to train simplification models. To alleviate this discrepancy, we propose an unsupervised data construction method that distills each simplifying operation into data via different automatic data enhancement measures. The empirical results demonstrate that the resulting dataset, SimSim, helps models achieve better performance by performing all operations properly.
Anthology ID:
2022.tsar-1.15
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
166–172
URL:
https://aclanthology.org/2022.tsar-1.15
DOI:
10.18653/v1/2022.tsar-1.15
Cite (ACL):
Sanqiang Zhao, Rui Meng, Hui Su, and Daqing He. 2022. Divide-and-Conquer Text Simplification by Scalable Data Enhancement. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 166–172, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
Divide-and-Conquer Text Simplification by Scalable Data Enhancement (Zhao et al., TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.15.pdf
Video:
https://aclanthology.org/2022.tsar-1.15.mp4