ABSTRACT
The feature quantity, a quantitative representation of specificity introduced in this paper, is based on an information-theoretic perspective of co-occurrence events between terms and documents. Mathematically, the feature quantity is defined as a product of probability and information, and it corresponds well to the tfidf-like measures widely used in today's IR systems. In this paper, we present a formal description of the feature quantity, together with illustrative examples of applying it to two types of information retrieval tasks: representative term selection and text categorization.
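As a rough illustration of what "a product of probability and information" could look like, the sketch below scores each term by its occurrence probability multiplied by the Kullback-Leibler divergence between the document distribution given the term and the overall document distribution. The abstract does not state the formula, so this particular choice, the function name `probability_weighted_information`, and the token-list input format are all assumptions made for illustration, not the paper's definition.

```python
import math
from collections import Counter


def probability_weighted_information(docs):
    """Score each term by P(t) * KL(P(d|t) || P(d)), in bits.

    `docs` is a list of token lists. This is an illustrative reading of
    "probability times information", NOT the paper's own formula.
    """
    total = sum(len(doc) for doc in docs)
    if total == 0:
        return {}

    # P(d): share of all term occurrences that fall in each document.
    doc_prob = [len(doc) / total for doc in docs]

    # Per-document term counts and corpus-wide term counts.
    term_doc = [Counter(doc) for doc in docs]
    term_total = Counter()
    for counts in term_doc:
        term_total.update(counts)

    scores = {}
    for term, n_t in term_total.items():
        p_t = n_t / total  # probability factor (tf-like)
        info = 0.0         # information factor (idf-like)
        for j, counts in enumerate(term_doc):
            n_td = counts.get(term, 0)
            if n_td == 0:
                continue  # zero-probability events contribute nothing
            p_d_given_t = n_td / n_t
            info += p_d_given_t * math.log2(p_d_given_t / doc_prob[j])
        scores[term] = p_t * info
    return scores


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "c"]]
    for term, score in sorted(probability_weighted_information(corpus).items()):
        print(term, round(score, 3))
```

Under this assumed form, a term spread evenly across documents carries no information about which document it came from and scores zero, while a term concentrated in few documents scores higher. That mirrors the behavior of tfidf-like weights, with the probability factor playing the role of term frequency.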