The feature quantity: an information theoretic perspective of Tfidf-like measures
Source: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 104 - 111  
Year of Publication: 2000
ISBN:1-58113-226-3
Author
Akiko Aizawa  National Institute of Informatics, 2-1-2 Hitotsubashi Chiyoda-ku, Tokyo 101-8430, Japan
Sponsors
Athens U of Econ & Business : Athens University of Economics and Business
Greek Com Soc : Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/345508.345556

ABSTRACT

The feature quantity, a quantitative representation of specificity introduced in this paper, is based on an information theoretic perspective of co-occurrence events between terms and documents. Mathematically, the feature quantity is defined as a product of probability and information, and maintains a good correspondence with the tfidf-like measures popularly used in today's IR systems. In this paper, we present a formal description of the feature quantity, as well as some illustrative examples of applying such a quantity to different types of information retrieval tasks: representative term selection and text categorization.
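The "product of probability and information" structure mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's formal definition, which the abstract does not state. The sketch assumes that the probability of a term is its relative frequency in the whole collection and that its information is approximated by the familiar idf factor log(N / df). The function name `probability_times_information` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def probability_times_information(docs):
    """Score each term as P(t) * log(N / df(t)).

    Assumptions (not taken from the paper):
      - P(t) is the term's share of all token occurrences in the collection.
      - log(N / df(t)) stands in for the term's information content.
    `docs` is a list of token lists, one per document.
    """
    n_docs = len(docs)
    term_freq = Counter()  # total occurrences of each term
    doc_freq = Counter()   # number of documents containing each term
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    total = sum(term_freq.values())
    return {
        term: (term_freq[term] / total) * math.log(n_docs / doc_freq[term])
        for term in term_freq
    }


# Toy collection: "a" appears in every document, so its idf-style factor is zero.
toy_docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
scores = probability_times_information(toy_docs)
```

Under these assumptions a term that occurs in every document scores zero regardless of how frequent it is, while a frequent term concentrated in few documents scores highest, which is the same qualitative behaviour as a tf-idf weight.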


