The feature quantity: an information theoretic perspective of Tfidf-like measures
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 104-111
Year of Publication: 2000
ISBN: 1-58113-226-3
Author
Akiko Aizawa
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Bibliometrics
Downloads (6 weeks): 11, Downloads (12 months): 80, Citation count: 11
ABSTRACT
The feature quantity, a quantitative representation of specificity introduced in this paper, is based on an information-theoretic perspective of co-occurrence events between terms and documents. Mathematically, the feature quantity is defined as a product of probability and information, and it corresponds well to the tf-idf-like measures widely used in today's IR systems. In this paper, we present a formal description of the feature quantity, together with illustrative examples of applying it to two information retrieval tasks: representative term selection and text categorization.
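The abstract does not give the formula itself, so the following is only a minimal sketch under stated assumptions, not the paper's definition. It assumes the feature quantity of a term t in a document d is a probability factor P(t,d), estimated as f(t,d)/F (F being the total number of term occurrences in the collection), multiplied by an information factor log(P(d|t)/P(d)), with P(d) = 1/N for N documents and P(d|t) = 1/df(t) for each of the df(t) documents containing t. Under these assumptions the information factor reduces to log(N/df(t)), an idf weight, so every value equals tf-idf scaled by 1/F, which illustrates the kind of correspondence the abstract describes. The names `feature_quantities` and `term_scores` are hypothetical, and ranking terms by their summed feature quantity is just one illustrative heuristic for representative term selection.

```python
import math
from collections import Counter


def feature_quantities(docs):
    """Hypothetical sketch: P(t,d) * log(P(d|t) / P(d)) for every term-document pair.

    Assumptions (not taken from the paper):
      P(t,d) = f(t,d) / F, where F is the total number of tokens in the collection;
      P(d)   = 1 / N for N documents;
      P(d|t) = 1 / df(t) for each document that contains t.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]          # f(t,d) for each document
    total = sum(sum(c.values()) for c in counts)     # F, total token count
    df = Counter()                                   # document frequency df(t)
    for c in counts:
        df.update(c.keys())

    fq = {}
    for d, c in enumerate(counts):
        for term, tf in c.items():
            p_td = tf / total                        # probability factor P(t,d)
            info = math.log(n_docs / df[term])       # log(P(d|t)/P(d)) = log(N/df(t))
            fq[(d, term)] = p_td * info
    return fq


def term_scores(fq):
    """Hypothetical term-level score: sum each term's feature quantities over all documents."""
    scores = Counter()
    for (_, term), value in fq.items():
        scores[term] += value
    return scores


if __name__ == "__main__":
    docs = [["the", "tfidf", "tfidf"], ["the", "entropy"], ["the", "entropy", "ir"]]
    fq = feature_quantities(docs)
    for term, score in term_scores(fq).most_common():
        print(f"{term}: {score:.4f}")
```

In this toy collection, "the" occurs in every document, so its information factor is log(3/3) = 0 and it receives a score of zero, while "tfidf", concentrated in a single document, ranks highest. Swapping in smoothed estimates for P(t,d) or P(d|t) would change the numbers but not the probability-times-information structure.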