The role of variance in term weighting for probabilistic information retrieval
Source: Conference on Information and Knowledge Management
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
SESSION: Information retrieval models
Pages: 252 - 259  
Year of Publication: 2002
ISBN:1-58113-492-4
Authors
Warren R. Greiff  The MITRE Corporation, Bedford, Massachusetts
William T. Morgan  The MITRE Corporation, Bedford, Massachusetts
Jay M. Ponte  The MITRE Corporation, Bedford, Massachusetts
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/584792.584836

ABSTRACT

In probabilistic approaches to information retrieval, the occurrence of a query term in a document contributes to the probability that the document will be judged relevant. It is typically assumed that the weight assigned to a query term should be based on the expected value of that contribution. In this paper we show that the degree to which observable document features such as term frequencies are expected to vary is also important. By means of stochastic simulation, we show that increased variance results in degraded retrieval performance. We further show that by decreasing term weights in the presence of variance, this degradation can be reduced. Hence, probabilistic models of information retrieval must take into account not only the expected value of a query term's contribution but also the variance of document features.
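
The abstract's two claims can be illustrated with a toy model. The sketch below is not the authors' simulation: it assumes each query-term feature is Gaussian, with mean `means[i]` in relevant documents and 0 in non-relevant ones, and standard deviation `sigmas[i]` in both. Documents are scored by a weighted sum of features, and quality is measured as the probability that a relevant document outscores a non-relevant one (AUC). The function names and all parameter values are hypothetical.

```python
import math
import random


def analytic_auc(weights, means, sigmas):
    """Closed-form AUC for the Gaussian toy model (an assumption, not the paper's model)."""
    # Expected score gap between a relevant and a non-relevant document.
    shift = sum(w * m for w, m in zip(weights, means))
    # Variance of a single document's score.
    var = sum((w * s) ** 2 for w, s in zip(weights, sigmas))
    d_prime = shift / math.sqrt(var)
    # The score difference has variance 2 * var, so AUC = Phi(d' / sqrt(2)),
    # which equals 0.5 * (1 + erf(d' / 2)).
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))


def simulated_auc(weights, means, sigmas, pairs=4000, seed=0):
    """Monte Carlo estimate of the same quantity, using random document pairs."""
    rng = random.Random(seed)

    def score(relevant):
        return sum(
            w * rng.gauss(m if relevant else 0.0, s)
            for w, m, s in zip(weights, means, sigmas)
        )

    wins = sum(score(True) > score(False) for _ in range(pairs))
    return wins / pairs


if __name__ == "__main__":
    means = [1.0, 1.0]         # both terms carry the same expected evidence
    low_var = [1.0, 1.0]       # baseline feature noise
    high_var = [1.0, 5.0]      # second term's feature is much noisier
    equal = [1.0, 1.0]         # weights based on expected value only
    shrunk = [1.0, 1.0 / 25]   # second weight scaled by mean / variance

    for label, w, s in [
        ("equal weights, low variance ", equal, low_var),
        ("equal weights, high variance", equal, high_var),
        ("shrunk weight, high variance", shrunk, high_var),
    ]:
        print(label, round(analytic_auc(w, means, s), 3),
              round(simulated_auc(w, means, s), 3))
```

Under these assumptions the analytic AUC falls from about 0.84 to about 0.61 when the second feature's standard deviation rises from 1 to 5, and recovers to about 0.76 when that term's weight is reduced, which matches the direction of the effect the abstract reports. The mean-over-variance weighting is the standard optimum for this Gaussian model and is used here only for illustration.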

