ABSTRACT
We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between the levels of language models (for example, document and collection models) that are typically combined for smoothing. As a result, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process: 1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models but that still perform at least as well as the standard approaches.
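To make "fewer (non-zero) parameters" concrete, here is a minimal sketch of one way a parsimonious document model can be estimated. It assumes an EM-style procedure: the document is treated as a mixture `lam * P(t|D) + (1 - lam) * P(t|C)` with a fixed collection model `P(t|C)`, and terms whose document probability drops below a threshold are pruned to zero. The function name `parsimonious_model`, its parameters, the fixed iteration count, and the default values are hypothetical choices for illustration, not the authors' exact implementation.

```python
from collections import Counter


def parsimonious_model(doc_tokens, collection_prob, lam=0.1, threshold=1e-4, iterations=50):
    """Sketch: estimate a sparse document model P(t|D) against a fixed background.

    doc_tokens: list of terms in the document.
    collection_prob: dict term -> P(t|C); assumed to be > 0 for every document term.
    lam: (assumed) weight of the document model in lam*P(t|D) + (1-lam)*P(t|C).
    threshold: (assumed) terms with P(t|D) below this value are pruned to zero.
    iterations: fixed number of EM rounds (no convergence test, for simplicity).
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Start from the maximum likelihood estimate tf(t, D) / |D|.
    p_doc = {t: c / total for t, c in tf.items()}
    for _ in range(iterations):
        # E-step: expected count of term t attributed to the document model
        # rather than to the collection model.
        expected = {}
        for t, p in p_doc.items():
            doc_part = lam * p
            expected[t] = tf[t] * doc_part / (doc_part + (1 - lam) * collection_prob[t])
        # M-step: renormalise the expected counts into a distribution.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}
        # Pruning: drop low-probability terms and renormalise the remainder.
        kept = {t: p for t, p in p_doc.items() if p >= threshold}
        norm = sum(kept.values())
        p_doc = {t: p / norm for t, p in kept.items()}
    return p_doc


if __name__ == "__main__":
    doc = ["the"] * 10 + ["parsimony"] * 2
    background = {"the": 0.5, "parsimony": 0.001}
    # "the" is frequent in the document but also in the collection, so EM
    # shifts probability mass towards the more specific term "parsimony".
    print(parsimonious_model(doc, background))
    # With a higher threshold, "the" is pruned and the model keeps one parameter.
    print(parsimonious_model(doc, background, threshold=0.2))
```

The design choice illustrated here is that terms which the collection model already explains well lose probability mass at each iteration, so a threshold can remove them entirely; this is what yields a smaller model than the standard maximum likelihood estimate, under the assumptions stated above.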