| Refined experts: improving classification in large taxonomies |
| Full text |
Pdf
(1.81 MB)
|
Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
table of contents
Boston, MA, USA
SESSION: Classification and clustering
table of contents
Pages 11-18
Year of Publication: 2009
ISBN:978-1-60558-483-6
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 80, Downloads (12 Months): 257, Citation Count: 0
|
|
|
ABSTRACT
While large-scale taxonomies--especially for web pages--have been in existence for some time, approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces in higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems--first by biasing the training distribution to reduce error propagation and second by propagating up "first-guess" expert information in a bottom-up manner before making a refined top down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10--30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
C. M. Bishop and M. Svensén. Bayesian hierarchical mixtures of experts. In UAI '03, 2003.
|
 |
3
|
|
 |
4
|
|
| |
5
|
|
 |
6
|
Ofer Dekel , Joseph Keshet , Yoram Singer, Large margin hierarchical classification, Proceedings of the twenty-first international conference on Machine learning, p.27, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015374]
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
A. R. Klivans and A. A. Sherstov. Improved lower bounds for learning intersections of halfspaces. In COLT '06, 2006.
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
 |
15
|
Tie-Yan Liu , Yiming Yang , Hao Wan , Hua-Jun Zeng , Zheng Chen , Wei-Ying Ma, Support vector machines classification with a very large-scale taxonomy, ACM SIGKDD Explorations Newsletter, v.7 n.1, p.36-43, June 2005
[doi> 10.1145/1089815.1089821]
|
| |
16
|
|
 |
17
|
|
| |
18
|
Netscape Communication Corporation. Open directory project. http://www.dmoz.org.
|
| |
19
|
J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, 1999.
|
 |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
 |
25
|
|
 |
26
|
Benyu Zhang , Hua Li , Yi Liu , Lei Ji , Wensi Xi , Weiguo Fan , Zheng Chen , Wei-Ying Ma, Improving web search results using affinity graph, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
[doi> 10.1145/1076034.1076120]
|
|