When documents are very long, BM25 fails

Yuanhua Lv, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namely BM25L, which "shifts" the term frequency normalization formula to boost scores of very long documents. Our experiments show that BM25L, with the same computation cost, is more effective and robust than the standard BM25.
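The abstract describes BM25L only as a "shift" of BM25's term-frequency normalization. Below is a minimal sketch that assumes the commonly cited form of BM25L, which is not reproduced in this record. A shift parameter δ is added to the length-normalized term frequency c' = tf / (1 - b + b·|D|/avgdl) before BM25's saturation function (k1 + 1)·x / (k1 + x) is applied. The function names and defaults (k1 = 1.2, b = 0.75, δ = 0.5) are illustrative choices, not values taken from the paper record. Only the TF component is compared, since IDF is identical in both functions.

```python
"""Compare the TF components of BM25 and (assumed form of) BM25L. Illustrative sketch."""


def normalized_tf(tf: float, doc_len: float, avg_doc_len: float, b: float = 0.75) -> float:
    """Length-normalized term frequency c' = tf / (1 - b + b * |D| / avgdl)."""
    return tf / (1.0 - b + b * doc_len / avg_doc_len)


def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Standard BM25 TF component, rewritten in terms of c': (k1 + 1) c' / (k1 + c')."""
    c = normalized_tf(tf, doc_len, avg_doc_len, b)
    return (k1 + 1.0) * c / (k1 + c)


def bm25l_tf(tf: float, doc_len: float, avg_doc_len: float,
             k1: float = 1.2, b: float = 0.75, delta: float = 0.5) -> float:
    """Assumed BM25L TF component: shift c' by delta, then saturate as in BM25.

    The shift applies only when the term occurs (tf > 0); otherwise the
    contribution is 0, exactly as in BM25.
    """
    if tf <= 0:
        return 0.0
    c = normalized_tf(tf, doc_len, avg_doc_len, b) + delta  # the "shift"
    return (k1 + 1.0) * c / (k1 + c)


if __name__ == "__main__":
    avgdl = 500.0
    # One occurrence of a query term in an average, long, and very long document.
    for doc_len in (500.0, 2_500.0, 10_000.0):
        print(f"|D|={doc_len:>8.0f}  BM25={bm25_tf(1, doc_len, avgdl):.3f}  "
              f"BM25L={bm25l_tf(1, doc_len, avgdl):.3f}")
```

When a term occurs, c' + δ ≥ δ. Under this assumed form, the BM25L component therefore never drops below (k1 + 1)·δ / (k1 + δ), which is about 0.65 with these defaults, no matter how long the document is. The standard BM25 component instead tends toward 0 as |D| grows. Both functions cost the same to compute, which matches the abstract's claim. In practice δ would be tuned on held-out queries.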

Original language: English (US)
Title of host publication: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 1103-1104
Number of pages: 2
ISBN (Print): 9781450309349
State: Published - Jan 1 2011
Event: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China
Duration: Jul 24 2011 to Jul 28 2011

Publication series

Name: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011
Country: China
City: Beijing
Period: 7/24/11 to 7/28/11

Keywords

  • BM25
  • BM25L
  • Term frequency
  • Very long documents

ASJC Scopus subject areas

  • Information Systems

