Hierarchical document categorization with support vector machines
Source: Conference on Information and Knowledge Management (CIKM)
Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management
Washington, D.C., USA
Session: KM-1 (Knowledge Management): Clustering I
Pages: 78-87
Year of Publication: 2004
ISBN: 1-58113-874-1
Authors
Lijuan Cai, Brown University, Providence, RI
Thomas Hofmann, Brown University, Providence, RI
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: 10.1145/1031171.1031186 (http://doi.acm.org/10.1145/1031171.1031186)

ABSTRACT

Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques such as Support Vector Machines and related large-margin methods have been applied successfully to this task, even though they ignore the relationships between classes. In this paper, we propose a novel hierarchical classification method that generalizes Support Vector Machine learning and is based on discriminant functions structured to mirror the class hierarchy. Our method works with arbitrary taxonomies, which need not be singly connected, and can handle task-specific loss functions. All parameters are learned jointly by optimizing a common objective function that corresponds to a regularized upper bound on the empirical loss. We present experimental results on the WIPO-alpha patent collection that demonstrate the competitiveness of our approach.
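The abstract names three ingredients: discriminant functions that mirror the class hierarchy, a task-specific loss, and an objective that upper-bounds the empirical loss. The Python sketch below illustrates one way these could fit together. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that every taxonomy node z has its own weight vector and that a class is scored by summing the node scores over the class and its ancestors. It also assumes a loss equal to the size of the symmetric difference of the two classes' ancestor sets, which in a tree is the path length between them. The taxonomy, the weights, and every function name are hypothetical. The sketch only evaluates the bound and does not train. The paper learns all parameters jointly by optimizing the objective.

```python
from typing import Dict, List, Sequence, Set, Tuple

Weights = Dict[str, List[float]]

# Hypothetical toy taxonomy: each node maps to its parents, so a node may have
# several parents (a DAG). "root" has none; the leaves are the classes.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}
CLASSES = ["A1", "A2", "B1"]


def ancestors(node: str) -> Set[str]:
    """Return the node and all of its ancestors, excluding the parentless root."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        z = stack.pop()
        if z in seen or not PARENTS[z]:  # skip repeats and the root
            continue
        seen.add(z)
        stack.extend(PARENTS[z])
    return seen


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def score(w: Weights, x: Sequence[float], y: str) -> float:
    """F(x, y): sum of the node scores <w_z, x> over y and its ancestors z."""
    return sum(dot(w[z], x) for z in ancestors(y))


def predict(w: Weights, x: Sequence[float]) -> str:
    """Assign the class with the highest hierarchical score."""
    return max(CLASSES, key=lambda y: score(w, x, y))


def tree_loss(y: str, y_pred: str) -> int:
    """Nodes in the symmetric difference of the ancestor sets; in a tree this
    equals the number of edges on the path between y and y_pred."""
    return len(ancestors(y) ^ ancestors(y_pred))


def hinge_bound(w: Weights, x: Sequence[float], y: str) -> float:
    """Loss-rescaled hinge max_y' [loss(y, y') + F(x, y') - F(x, y)].
    y' = y contributes 0, and y' = predict(w, x) shows the value is at least
    tree_loss(y, predict(w, x)), so it upper-bounds the prediction's loss."""
    f_true = score(w, x, y)
    return max(tree_loss(y, yp) + score(w, x, yp) - f_true for yp in CLASSES)


def objective(w: Weights, data: List[Tuple[List[float], str]], lam: float) -> float:
    """L2 regularizer plus the average hinge bound over the training data."""
    reg = 0.5 * lam * sum(dot(v, v) for v in w.values())
    return reg + sum(hinge_bound(w, x, y) for x, y in data) / len(data)


# Hand-picked weights for illustration; a learner would fit these jointly.
w: Weights = {z: [0.0, 0.0] for z in PARENTS if z != "root"}
w["A"], w["A1"], w["B"] = [1.0, 0.0], [0.5, 0.0], [0.0, 1.0]
x = [1.0, 0.2]
print(predict(w, x))  # A1 (scores: A1 = 1.5, A2 = 1.0, B1 = 0.2)
```

In this sketch, sibling classes share their parent's weight vector. The rescaled hinge also demands a larger score margin against classes that lie farther away in the taxonomy. For example, `hinge_bound(w, x, "A2")` evaluates to about 3.2, which is at least the loss of 2 incurred by the prediction `A1`.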


