Context-sensitive learning methods for text categorization
Source: ACM Transactions on Information Systems (TOIS)
Volume 17, Issue 2 (April 1999)
Pages: 141-173
Year of Publication: 1999
ISSN: 1046-8188
Authors
William W. Cohen, AT&T Labs
Yoram Singer, AT&T Labs
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/306686.306688

ABSTRACT

Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the “context” of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping-experts differ radically in many other respects: differences include different notions as to what constitutes a context, different ways of combining contexts to construct a classifier, different methods to search for a combination of contexts, and different criteria as to what contexts should be included in such a combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.
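
To make the idea of "context" concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a classifier made of rules, each a conjunction of words that must all be present, so that whether a word such as "bat" contributes depends on which other words occur with it. The names `Rule` and `classify`, the example rules, and the "sports" category are hypothetical.

```python
from typing import FrozenSet, List

# Assumption: a rule is a set of words that must ALL occur in the document.
Rule = FrozenSet[str]


def classify(document: str, rules: List[Rule]) -> bool:
    """Return True if any rule fires, i.e. all of its words occur in the document."""
    words = set(document.lower().split())
    return any(rule <= words for rule in rules)


# Hypothetical hand-written rules for a hypothetical "sports" category.
# "bat" counts only in the context of "baseball"; "pitcher" counts on its own.
rules = [frozenset({"bat", "baseball"}), frozenset({"pitcher"})]

print(classify("He swung the bat at the baseball game", rules))  # True
print(classify("A bat flew out of the cave", rules))  # False
```

In this sketch the word "bat" alone never triggers the category; only its co-occurrence with "baseball" does. The abstract notes that the two algorithms use different notions of context, so this rule form should be read as one possible illustration rather than a description of either method.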


