Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao


Abstract
Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating textual inputs into numerical representations, capturing the semantics of the text. These models excel in applications like dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, offers in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Text Embedding Benchmark (MTEB). Furthermore, to increase the model’s awareness of grammatical negation, we construct a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.
Anthology ID:
2023.nlposs-1.2
Volume:
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Liling Tan, Dmitrijs Milajevs, Geeticka Chauhan, Jeremy Gwinnup, Elijah Rippeth
Venues:
NLPOSS | WS
Publisher:
Association for Computational Linguistics
Pages:
8–18
URL:
https://aclanthology.org/2023.nlposs-1.2/
DOI:
10.18653/v1/2023.nlposs-1.2
Cite (ACL):
Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, and Han Xiao. 2023. Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 8–18, Singapore. Association for Computational Linguistics.
Cite (Informal):
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models (Günther et al., NLPOSS 2023)
PDF:
https://aclanthology.org/2023.nlposs-1.2.pdf
Video:
https://aclanthology.org/2023.nlposs-1.2.mp4