A log-logistic model-based interpretation of TF normalization of BM25

Yuanhua Lv, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The effectiveness of the BM25 retrieval function is mainly due to its sub-linear term frequency (TF) normalization component, which is controlled by a parameter k1. Although BM25 was derived from the classic probabilistic retrieval model, it has so far been unclear how to interpret k1 probabilistically, which makes the setting of this parameter hard to optimize. In this paper, we provide a novel probabilistic interpretation of the BM25 TF normalization and its parameter k1, based on a log-logistic model for the probability of seeing a document in the collection with a given level of TF. The proposed interpretation allows us to derive different approaches to estimating k1 based solely on the current collection, without requiring any training data, thus effectively eliminating one free parameter from BM25. Our experimental results show that the proposed approaches can accurately predict the optimal k1 without training data and achieve retrieval performance better than or comparable to that of a well-tuned BM25 in which k1 is optimized on training data.
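As a rough illustration of the connection the abstract describes, the sketch below compares the standard BM25 TF component, tf·(k1+1)/(tf+k1), with the cumulative distribution function of a log-logistic distribution. With shape 1 and scale k1 the two differ only by the constant factor (k1+1), so k1 plays the role of the distribution's median. This is a mathematical identity, not a reproduction of the paper's estimation methods; the function names, the omission of document length normalization, and the value k1 = 1.2 (a commonly used default) are assumptions made for the example.

```python
def bm25_tf(tf: float, k1: float) -> float:
    """BM25 TF component without document length normalization (assumed b = 0)."""
    return tf * (k1 + 1.0) / (tf + k1)


def log_logistic_cdf(x: float, scale: float, shape: float = 1.0) -> float:
    """CDF of the log-logistic distribution: x^shape / (scale^shape + x^shape)."""
    if x <= 0:
        return 0.0
    return x ** shape / (scale ** shape + x ** shape)


k1 = 1.2  # illustrative value only; not taken from the source

for tf in (1, 2, 5, 10, 100):
    direct = bm25_tf(tf, k1)
    # With shape = 1 and scale = k1, the CDF is tf / (tf + k1).
    via_cdf = (k1 + 1.0) * log_logistic_cdf(tf, scale=k1)
    print(tf, round(direct, 4), round(via_cdf, 4))
```

The printed columns agree for every tf, and the score saturates toward k1 + 1 as tf grows, which is the sub-linear behaviour the abstract refers to. Under this reading, a raw TF equal to k1 sits at the 50th percentile of the assumed distribution.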

Original language: English (US)
Title of host publication: Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012, Proceedings
Pages: 244-255
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-28997-2_21
State: Published - Apr 27 2012
Event: 34th European Conference on Information Retrieval, ECIR 2012 - Barcelona, Spain
Duration: Apr 1 2012 to Apr 5 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7224 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 34th European Conference on Information Retrieval, ECIR 2012
Country: Spain
City: Barcelona
Period: 4/1/12 to 4/5/12

Keywords

  • BM25
  • automatic parameter tuning
  • log-logistic model
  • percentile term frequency normalization
  • term frequency

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)


  • Cite this

    Lv, Y., & Zhai, C. (2012). A log-logistic model-based interpretation of TF normalization of BM25. In Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012, Proceedings (pp. 244-255). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7224 LNCS). https://doi.org/10.1007/978-3-642-28997-2_21