2010/048 - Modèles de RI fondés sur l'information (Information-based IR models)
Stéphane Clinchant, Eric Gaussier
CIFED 2010, Colloque International Francophone sur l'Ecrit et le Document, 18-20 mars 2010, Sousse, Tunisie <BR> The full paper is also available at <a href=http://mrim.imag.fr/eric.gaussier/coria-iso.pdf>Link1</a> and <a href=http://dn.revuesonline.com/>Link2</a>
In this paper, we first present an analytical view of heuristic retrieval constraints, which yields simple tests to determine whether a retrieval function satisfies these constraints. We then review empirical findings on word frequency distributions and the central role played by burstiness in this context. This leads us to propose a formal definition of burstiness that can be used to characterize probability distributions with respect to this phenomenon. We then introduce the family of information-based IR models, which naturally captures heuristic retrieval constraints when the underlying probability distribution is bursty, and propose a new IR model within this family, based on the log-logistic distribution. The experiments we conduct on three different collections illustrate the good behavior of the log-logistic IR model.
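To make the idea of an information-based score more concrete, here is a minimal Python sketch of one commonly cited form of a log-logistic model. The abstract gives no formulas, so everything below is an assumption made for illustration: the length normalization `t = tf * log(1 + c * avg_doc_len / doc_len)`, the per-term parameter `lam = doc_freq / n_docs`, the shape parameter fixed to 1, and all function and parameter names. It should not be read as the authors' exact implementation.

```python
import math


def log_logistic_survival(t, lam):
    """P(T >= t) for a log-logistic distribution with shape 1 and scale lam.

    Assumption for illustration: the shape parameter is fixed to 1.
    """
    return lam / (t + lam)


def lgd_score(query_tf, doc_tf, doc_len, avg_doc_len, doc_freq, n_docs, c=1.0):
    """Score one document as the sum, over matching query terms, of
    qtf * -log P(T >= t), i.e. the information carried by the normalized
    term frequency t under the (assumed) log-logistic distribution.

    All parameter names are hypothetical; c is an assumed smoothing constant.
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue  # terms absent from the document contribute nothing
        # Assumed length normalization of the raw term frequency.
        t = tf * math.log(1.0 + c * avg_doc_len / doc_len)
        # Assumed collection parameter: fraction of documents containing the term.
        lam = df / n_docs
        score += qtf * -math.log(log_logistic_survival(t, lam))
    return score


if __name__ == "__main__":
    query = {"burstiness": 1, "retrieval": 1}
    doc = {"burstiness": 3, "retrieval": 1, "model": 4}
    dfs = {"burstiness": 20, "retrieval": 400, "model": 900}
    print(lgd_score(query, doc, doc_len=120, avg_doc_len=100,
                    doc_freq=dfs, n_docs=1000))
```

Under these assumptions the conditional probability P(T >= t + eps | T >= t) = (t + lam) / (t + eps + lam) increases with t, which is the kind of behavior a burstiness criterion on the distribution is meant to capture: once a word has occurred often, further occurrences become less surprising.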