A frequency-based and a Poisson-based definition of the probability of being informative
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
Session: IR theory
Pages: 227-234
Year of Publication: 2003
ISBN: 1-58113-646-3
Author: Thomas Roelleke, Queen Mary University of London
Sponsor: ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860478

ABSTRACT

This paper reports on theoretical investigations into the assumptions underlying the inverse document frequency (idf). We show that an intuitive idf-based probability function for the probability of a term being informative assumes disjoint document events. By assuming documents to be independent rather than disjoint, we arrive at a Poisson-based probability of being informative. The framework is useful for understanding and choosing how parameters are estimated and combined in probabilistic retrieval models.
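
The contrast between disjoint and independent document events can be illustrated with a small Python sketch. It is not taken from the paper and does not reproduce the paper's definition of the probability of being informative. It only shows how the two assumptions change the probability that a term occurs in a collection. The function names, the uniform document prior 1/N, and the assumption that a term occurs with certainty in each of the n_t documents that contain it are all choices made here for illustration.

```python
import math


def occurrence_prob_disjoint(n_t: int, n_docs: int) -> float:
    """Disjoint document events: the per-document contributions simply add up.

    Each of the n_t documents containing the term contributes 1/n_docs,
    which gives the familiar relative document frequency n_t / n_docs.
    """
    return n_t / n_docs


def occurrence_prob_independent(n_t: int, n_docs: int) -> float:
    """Independent document events: the term occurs unless every one of the
    n_t contributions (each with probability 1/n_docs) fails."""
    return 1.0 - (1.0 - 1.0 / n_docs) ** n_t


def occurrence_prob_poisson(n_t: int, n_docs: int) -> float:
    """Poisson-style approximation of the independent case for large n_docs:
    (1 - 1/n_docs) ** n_t is close to exp(-n_t / n_docs)."""
    return 1.0 - math.exp(-n_t / n_docs)


def idf(n_t: int, n_docs: int) -> float:
    """Inverse document frequency as the negative log of the
    relative document frequency (natural logarithm assumed here)."""
    return -math.log(n_t / n_docs)


if __name__ == "__main__":
    n_docs = 1000  # hypothetical collection size
    for n_t in (1, 10, 100, 1000):
        print(
            n_t,
            occurrence_prob_disjoint(n_t, n_docs),
            occurrence_prob_independent(n_t, n_docs),
            occurrence_prob_poisson(n_t, n_docs),
            idf(n_t, n_docs),
        )
```

Under these assumptions the disjoint value is an upper bound on the independent value, and the exponential form tracks the independent value closely when the collection is large. For rare terms all three nearly coincide, while for frequent terms the disjoint value reaches 1 and the other two stay below it.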

