Bayesian extension to the language model for ad hoc information retrieval
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Retrieval models
Pages: 4 - 9  
Year of Publication: 2003
ISBN:1-58113-646-3
Authors
Hugo Zaragoza  Microsoft Research, Cambridge, U.K.
Djoerd Hiemstra  University of Twente, The Netherlands
Michael Tipping  Microsoft Research, Cambridge, U.K.
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860439

ABSTRACT

We propose a Bayesian extension to the ad-hoc Language Model. Many smoothed estimators used for the multinomial query model in ad-hoc Language Models (including Laplace and Bayes-smoothing) are approximations to the Bayesian predictive distribution. In this paper we derive the full predictive distribution in a form amenable to implementation by classical IR models, and then compare it to other currently used estimators. In our experiments the proposed model outperforms Bayes-smoothing, and its combination with linear interpolation smoothing outperforms all other estimators.
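
The relationship between the smoothed point estimators and the predictive distribution can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that "Bayes-smoothing" means the usual Dirichlet-prior estimate with prior mass mu spread according to collection probabilities, and that the predictive distribution is the standard Dirichlet-multinomial form. All function and parameter names (doc_counts, coll_prob, mu, lam) are hypothetical.

```python
import math
from collections import Counter


def laplace_smoothed(term, doc_counts, doc_len, vocab_size):
    """Laplace (add-one) point estimate of P(term | document)."""
    return (doc_counts.get(term, 0) + 1) / (doc_len + vocab_size)


def bayes_smoothed(term, doc_counts, doc_len, coll_prob, mu):
    """Dirichlet-prior ("Bayes") smoothed point estimate of P(term | document)."""
    return (doc_counts.get(term, 0) + mu * coll_prob[term]) / (doc_len + mu)


def linear_interpolation(term, doc_counts, doc_len, coll_prob, lam):
    """Linear interpolation of the document and collection estimates."""
    return (1 - lam) * doc_counts.get(term, 0) / doc_len + lam * coll_prob[term]


def log_predictive(query_terms, doc_counts, doc_len, coll_prob, mu):
    """Log probability of the query term sequence under a Dirichlet-multinomial
    predictive distribution (assumed form, not copied from the paper).

    The posterior Dirichlet parameter of term w is
    alpha_w = doc_counts[w] + mu * coll_prob[w], and the parameters sum to
    doc_len + mu when coll_prob sums to 1 over the vocabulary.
    Requires mu > 0 and coll_prob[w] > 0 for every query term.
    """
    total = doc_len + mu
    logp = math.lgamma(total) - math.lgamma(total + len(query_terms))
    for term, q_count in Counter(query_terms).items():
        alpha = doc_counts.get(term, 0) + mu * coll_prob[term]
        logp += math.lgamma(alpha + q_count) - math.lgamma(alpha)
    return logp
```

Under these assumptions, the predictive probability of a one-term query equals the Bayes-smoothed estimate exactly. The two differ for longer queries, because the predictive form updates its counts after each query term instead of reusing a single fixed estimate.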


