Refined experts: improving classification in large taxonomies
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Classification and clustering
Pages 11-18  
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Paul N. Bennett  Microsoft Research, Redmond, WA, USA
Nam Nguyen  Cornell University, Ithaca, NY, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571946

ABSTRACT

While large-scale taxonomies--especially for web pages--have been in existence for some time, approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces in higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems--first by biasing the training distribution to reduce error propagation and second by propagating up "first-guess" expert information in a bottom-up manner before making a refined top-down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10--30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
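
The error-propagation problem named in the abstract can be illustrated with a minimal sketch. This is not the authors' method, nor their hierarchical SVM baseline: the taxonomy, the keyword sets, and the functions `score` and `classify_top_down` are all hypothetical, and a simple keyword count stands in for a trained classifier at each node. The sketch only shows that, under greedy top-down routing, a wrong choice at the top level makes the correct leaf unreachable.

```python
# Toy taxonomy: each internal node maps to its children.
# All node names are hypothetical and chosen only for illustration.
TAXONOMY = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["soccer", "tennis"],
}

# Hypothetical stand-in for per-node classifiers: one keyword set per node.
KEYWORDS = {
    "science": {"energy", "cell", "field"},
    "sports": {"match", "goal", "field"},
    "physics": {"energy", "field"},
    "biology": {"cell"},
    "soccer": {"goal", "field"},
    "tennis": {"match", "serve"},
}


def score(node, tokens):
    """Count how many tokens of the document appear in the node's keyword set."""
    return sum(1 for t in tokens if t in KEYWORDS[node])


def classify_top_down(tokens, node="root"):
    """Greedily follow the best-scoring child from the root down to a leaf."""
    path = []
    while node in TAXONOMY:  # stop once we reach a leaf
        node = max(TAXONOMY[node], key=lambda child: score(child, tokens))
        path.append(node)
    return path


# A document routed correctly at the top level reaches the intended leaf.
print(classify_top_down(["energy", "field", "cell"]))

# A physics document that happens to use sports vocabulary is sent to
# "sports" at the top level, so "physics" can no longer be reached.
print(classify_top_down(["field", "match", "goal", "energy"]))
```

In the second call the lower-level decision is reasonable given where the document was routed, yet the final label is still wrong. That unrecoverable early mistake is the failure mode the abstract calls error propagation.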


