Error-driven generalist+experts (edge): a multi-stage ensemble framework for text categorization
Source
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
SESSION: KM: classification
Pages 83-92  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Jian Huang  Pennsylvania State University, University Park, PA, USA
Omid Madani  SRI International, Menlo Park, CA, USA
C. Lee Giles  Pennsylvania State University, University Park, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458097

ABSTRACT

We introduce a multi-stage ensemble framework, Error-Driven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a generalist, a classifier covering all classes, to deliver a reasonably accurate initial category ranking for a given instance. Edge then computes a confusion graph for the generalist and allocates learning resources to train experts on relatively small groups of classes that the generalist tends to confuse systematically with one another. When invoked on a given instance, the experts' votes yield a reranking of the classes, thereby correcting the errors of the generalist. Our evaluations show improved classification and ranking performance on several large-scale text categorization datasets. Edge is particularly efficient when the underlying learners are efficient. Our study of confusion graphs is also of independent interest.
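
The stages described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names, the error-count threshold `min_count`, the connected-components grouping, the `top_k` trigger for invoking experts, and the additive combination of votes are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def confusion_graph(true_labels, predicted_labels):
    """Count generalist errors as directed edges (true class, predicted class)."""
    edges = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t != p:
            edges[(t, p)] += 1
    return edges


def confused_groups(edges, min_count=2):
    """Group classes joined by edges with at least `min_count` errors.

    Assumption: groups are connected components of the thresholded graph,
    with edge direction ignored.
    """
    neighbors = defaultdict(set)
    for (t, p), count in edges.items():
        if count >= min_count:
            neighbors[t].add(p)
            neighbors[p].add(t)

    groups, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


def rerank(scores, experts, instance, top_k=2):
    """Rerank classes by adding expert votes to the generalist's scores.

    `scores` maps class -> generalist score. `experts` is a list of
    (group, vote_fn) pairs, where vote_fn(instance) returns class -> vote.
    Assumption: an expert is invoked only when one of the generalist's
    top_k classes belongs to that expert's group.
    """
    ranking = sorted(scores, key=scores.get, reverse=True)
    top = set(ranking[:top_k])
    combined = dict(scores)
    for group, vote_fn in experts:
        if top & group:
            for cls, vote in vote_fn(instance).items():
                if cls in group:
                    combined[cls] = combined.get(cls, 0.0) + vote
    # Highest combined score first; ties broken by class name.
    return sorted(combined, key=lambda c: (-combined[c], c))
```

In this sketch an expert can only reorder classes inside its own group, which mirrors the idea of correcting systematic confusions locally while leaving the rest of the generalist's ranking untouched.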


