Why inverse document frequency?
Source: North American Chapter of the Association for Computational Linguistics
Second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies 2001
Pittsburgh, Pennsylvania
Pages: 1 - 8  
Year of Publication: 2001
Author
Kishore Papineni  IBM T.J. Watson Research Center, Yorktown Heights, NY
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
DOI Bookmark: 10.3115/1073336.1073340

ABSTRACT

Inverse Document Frequency (IDF) is a popular measure of a word's importance. The IDF invariably appears in a host of heuristic measures used in information retrieval. However, so far the IDF has itself been a heuristic. In this paper, we show IDF to be optimal in a principled sense. We show that IDF is the optimal weight of a word with respect to minimization of a Kullback-Leibler distance suitably generalized to nonnegative functions which need not be probability distributions. This optimization problem is closely related to the maximum entropy problem. We show that the IDF is the optimal weight associated with a word-feature in an information retrieval setting where we treat each document as the query that retrieves itself. That is, IDF is optimal for document self-retrieval.
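
For readers unfamiliar with the quantity the abstract discusses, the following is a minimal sketch of the commonly used textbook form of IDF, log(N / df), where N is the number of documents and df is the number of documents containing the word. It does not reproduce the paper's derivation. The function name `inverse_document_frequency`, the whitespace tokenization, and the toy documents are assumptions made purely for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {word: log(N / df)} for every word in the collection.

    N is the number of documents and df is the number of documents
    that contain the word at least once. Tokenization is a naive
    lowercase whitespace split (an assumption for this sketch).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each word counts at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


# Hypothetical toy collection, not taken from the paper.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
idf = inverse_document_frequency(docs)
```

In this toy collection a word that occurs in every document ("the") receives weight zero, while a word confined to a single document ("bird") receives the largest weight, which matches the intuition that rarer words are more useful for telling documents apart.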

