ABSTRACT
Many variants of language models have been proposed for information retrieval. Most existing models are based on the multinomial distribution and score documents by query likelihood computed from a probabilistic query generation model. In this paper, we propose and study a new family of query generation models based on the Poisson distribution. We show that in their simplest forms, the new family of models and the existing multinomial models are equivalent; with different smoothing methods, however, the two families behave differently. We show that the Poisson model has several advantages, including naturally accommodating per-term smoothing and modeling an accurate background more efficiently. We present several variants of the new model corresponding to different smoothing methods and evaluate them on four representative TREC test collections. The results show that while the basic models perform comparably, the Poisson model can outperform the multinomial model with per-term smoothing. The performance can be further improved with two-stage smoothing.