Simplification by Lexical Deletion

Matthew Shardlow, Piotr Przybyła


Abstract
Lexical simplification traditionally focuses on the replacement of tokens with simpler alternatives. However, in some cases the goal of this task (simplifying the form while preserving the meaning) may be better served by removing a word rather than replacing it. In fact, we show that existing datasets rely heavily on the deletion operation. We propose supervised and unsupervised solutions for lexical deletion based on classification, end-to-end simplification systems and custom language models. We contribute a new silver-standard corpus of lexical deletions (called SimpleDelete), which we mine from Simple English Wikipedia edit histories and use to evaluate approaches to detecting superfluous words. The results show that even unsupervised approaches (TerseBERT) can achieve good performance in this new task. Deletion is one part of the wider lexical simplification puzzle, which we show can be isolated and investigated.
Anthology ID:
2023.tsar-1.5
Volume:
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Sanja Štajner, Horacio Saggio, Matthew Shardlow, Fernando Alva-Manchego
Venues:
TSAR | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
44–50
URL:
https://aclanthology.org/2023.tsar-1.5
Cite (ACL):
Matthew Shardlow and Piotr Przybyła. 2023. Simplification by Lexical Deletion. In Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability, pages 44–50, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Simplification by Lexical Deletion (Shardlow & Przybyła, TSAR-WS 2023)
PDF:
https://aclanthology.org/2023.tsar-1.5.pdf