The role of variance in term weighting for probabilistic information retrieval
Source
Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
SESSION: Information retrieval models
Pages: 252-259
Year of Publication: 2002
ISBN: 1-58113-492-4
ABSTRACT
In probabilistic approaches to information retrieval, the occurrence of a query term in a document contributes to the probability that the document will be judged relevant. It is typically assumed that the weight assigned to a query term should be based on the expected value of that contribution. In this paper we show that the degree to which observable document features such as term frequencies are expected to vary is also important. By means of stochastic simulation, we show that increased variance results in degraded retrieval performance. We further show that by decreasing term weights in the presence of variance, this degradation can be reduced. Hence, probabilistic models of information retrieval must take into account not only the expected value of a query term's contribution but also the variance of document features.
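The effect described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' simulation; every modelling choice in it is an assumption made for illustration: two query-term features drawn from a standard normal distribution, relevance defined as the top decile of the noise-free score, Gaussian noise of standard deviation `sigma` added to the first feature only, and a shrinkage factor of `1 / (1 + sigma**2)` (the usual attenuation factor for a unit-variance feature). The names `simulate` and `precision_at_k` are hypothetical.

```python
import random


def precision_at_k(scores, relevant, k):
    """Fraction of the k highest-scoring documents that are relevant."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sum(1 for i in ranked[:k] if i in relevant) / k


def simulate(sigma, n_docs=20000, seed=0):
    """Return (naive_precision, shrunk_precision) for noise level sigma.

    Assumed toy model, not taken from the paper:
      - x1, x2: true term features, independent N(0, 1)
      - only x1 is observed with extra noise N(0, sigma^2)
      - relevant documents: top 10% by the noise-free score x1 + x2
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n_docs)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n_docs)]
    x1_obs = [v + rng.gauss(0.0, sigma) for v in x1]

    true_score = [a + b for a, b in zip(x1, x2)]
    k = n_docs // 10
    by_truth = sorted(range(n_docs), key=true_score.__getitem__, reverse=True)
    relevant = set(by_truth[:k])

    # Naive weighting: both terms keep weight 1 regardless of noise.
    naive = [a + b for a, b in zip(x1_obs, x2)]

    # Variance-aware weighting: shrink the weight of the noisy term.
    shrink = 1.0 / (1.0 + sigma ** 2)
    shrunk = [shrink * a + b for a, b in zip(x1_obs, x2)]

    return precision_at_k(naive, relevant, k), precision_at_k(shrunk, relevant, k)


if __name__ == "__main__":
    for sigma in (0.0, 1.0, 2.0):
        naive_p, shrunk_p = simulate(sigma)
        print(f"sigma={sigma}: naive={naive_p:.3f} shrunk={shrunk_p:.3f}")
```

Under these assumptions, precision with the naive weights falls as `sigma` grows, and down-weighting the noisy term recovers part of the loss, which mirrors the qualitative claim of the abstract rather than reproducing its experiments.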