2024
Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels
Honglei Zhuang | Zhen Qin | Kai Hui | Junru Wu | Le Yan | Xuanhui Wang | Michael Bendersky
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Zero-shot text rankers powered by recent LLMs achieve remarkable ranking performance by simply prompting. Existing prompts for pointwise LLM rankers mostly ask the model to choose from binary relevance labels like “Yes” and “No”. However, the lack of intermediate relevance label options may cause the LLM to provide noisy or biased answers for documents that are partially relevant to the query. We propose to incorporate fine-grained relevance labels into the prompt for LLM rankers, enabling them to better differentiate among documents with different levels of relevance to the query and thus derive a more accurate ranking. We study two variants of the prompt template, coupled with different numbers of relevance levels. Our experiments on 8 BEIR data sets show that adding fine-grained relevance labels significantly improves the performance of LLM rankers.
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Zhen Qin | Rolf Jagerman | Kai Hui | Honglei Zhuang | Junru Wu | Le Yan | Jiaming Shen | Tianqi Liu | Jialu Liu | Donald Metzler | Xuanhui Wang | Michael Bendersky
Findings of the Association for Computational Linguistics: NAACL 2024
Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these challenging ranking formulations. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs. On TREC-DL 2019 and 2020, PRP based on the Flan-UL2 model with 20B parameters performs favorably with the previous best approach in the literature, which is based on the blackbox commercial GPT-4 that has 50x (estimated) model size, while outperforming other LLM-based solutions, such as InstructGPT which has 175B parameters, by over 10% for all ranking metrics. By using the same prompt template on seven BEIR tasks, PRP outperforms supervised baselines and outperforms the blackbox commercial ChatGPT solution by 4.2% and pointwise LLM-based solutions by more than 10% on average NDCG@10. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity.
2023
PaRaDe: Passage Ranking using Demonstrations with LLMs
Andrew Drozdov | Honglei Zhuang | Zhuyun Dai | Zhen Qin | Razieh Rahimi | Xuanhui Wang | Dana Alon | Mohit Iyyer | Andrew McCallum | Donald Metzler | Kai Hui
Findings of the Association for Computational Linguistics: EMNLP 2023
Recent studies show that large language models (LLMs) can be instructed to effectively perform zero-shot passage re-ranking, in which the results of a first stage retrieval method, such as BM25, are rated and reordered to improve relevance. In this work, we improve LLM-based re-ranking by algorithmically selecting few-shot demonstrations to include in the prompt. Our analysis investigates the conditions where demonstrations are most helpful, and shows that adding even one demonstration is significantly beneficial. We propose a novel demonstration selection strategy based on difficulty rather than the commonly used semantic similarity. Furthermore, we find that demonstrations helpful for ranking are also effective at question generation. We hope our work will spur more principled research into question generation and passage ranking.
2006
Language Model Information Retrieval with Document Expansion
Tao Tao | Xuanhui Wang | Qiaozhu Mei | ChengXiang Zhai
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference