Predictive Chemistry Augmented with Text Retrieval

Yujie Qian, Zhening Li, Zhengkai Tu, Connor Coley, Regina Barzilay


Abstract
This paper focuses on using natural language descriptions to enhance predictive models in chemistry. Conventionally, chemoinformatics models are trained with extensive structured data manually extracted from the literature. In this paper, we introduce TextReact, a novel method that directly augments predictive chemistry with texts retrieved from the literature. TextReact retrieves text descriptions relevant to a given chemical reaction, and then aligns them with the molecular representation of the reaction. This alignment is enhanced via an auxiliary masked language modeling (LM) objective incorporated in the predictor training. We empirically validate the framework on two chemistry tasks: reaction condition recommendation and one-step retrosynthesis. By leveraging text retrieval, TextReact significantly outperforms state-of-the-art chemoinformatics models trained solely on molecular data.
Anthology ID:
2023.emnlp-main.784
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12731–12745
URL:
https://aclanthology.org/2023.emnlp-main.784
DOI:
10.18653/v1/2023.emnlp-main.784
Cite (ACL):
Yujie Qian, Zhening Li, Zhengkai Tu, Connor Coley, and Regina Barzilay. 2023. Predictive Chemistry Augmented with Text Retrieval. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12731–12745, Singapore. Association for Computational Linguistics.
Cite (Informal):
Predictive Chemistry Augmented with Text Retrieval (Qian et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.784.pdf
Video:
https://aclanthology.org/2023.emnlp-main.784.mp4