Parsimonious language models for information retrieval
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Sheffield, United Kingdom
SESSION: Language models
Pages: 178 - 185  
Year of Publication: 2004
ISBN: 1-58113-881-4
Authors
Djoerd Hiemstra  University of Twente, Enschede, The Netherlands
Stephen Robertson  Microsoft Research, Cambridge, U.K.
Hugo Zaragoza  Microsoft Research, Cambridge, U.K.
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1008992.1009025

ABSTRACT

We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process: 1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.
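The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of one way a model with fewer non-zero parameters could be obtained. It assumes an EM-style mixture of a document model and a fixed background (collection) model, in which probability mass already explained by the background is moved away from the document model and small probabilities are pruned to zero. The function name `parsimonious_model`, the parameters `lam`, `threshold`, and `iterations`, and all numbers in the example are hypothetical and not taken from the paper.

```python
from collections import Counter


def parsimonious_model(doc_tokens, background, lam=0.1, threshold=1e-3, iterations=50):
    """Estimate a sparse document model P(t|D) against a fixed background model.

    background: dict mapping term -> P(t|C); assumed > 0 for every document term.
    lam:        mixture weight on the document model (hypothetical default).
    threshold:  probabilities below this value are pruned (hypothetical default).
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Start from the maximum-likelihood estimate of the document model.
    p_doc = {t: c / total for t, c in tf.items()}

    for _ in range(iterations):
        # E-step: expected number of occurrences of each term that the
        # document model (rather than the background model) accounts for.
        expected = {
            t: tf[t] * lam * p / ((1 - lam) * background[t] + lam * p)
            for t, p in p_doc.items()
        }
        norm = sum(expected.values())
        # M-step: renormalise the expected counts into probabilities.
        updated = {t: e / norm for t, e in expected.items()}
        # Prune small probabilities, then renormalise the surviving terms.
        kept = {t: p for t, p in updated.items() if p >= threshold}
        if not kept:
            break  # nothing survives pruning; keep the previous estimate
        kept_norm = sum(kept.values())
        p_doc = {t: p / kept_norm for t, p in kept.items()}
    return p_doc


# Toy example with made-up background probabilities.
doc = "the the the the the of of retrieval model parsimonious".split()
background = {
    "the": 0.5,
    "of": 0.3,
    "model": 0.02,
    "retrieval": 0.01,
    "parsimonious": 0.001,
}
model = parsimonious_model(doc, background)
for term, prob in sorted(model.items(), key=lambda item: -item[1]):
    print(f"{term}\t{prob:.3f}")
```

In this toy setting, terms that the background model already explains well (here the function word "the") tend to lose their probability mass and are pruned, while terms that are rare in the background keep it. That is the sense in which such a model needs fewer non-zero parameters than a standard smoothed estimate.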


