|
ABSTRACT
In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most of previous approaches which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Bernhard E. Boser , Isabelle M. Guyon , Vladimir N. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, p.144-152, July 27-29, 1992, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/130385.130401]
|
| |
3
|
|
| |
4
|
L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7: 200-217, 1967.
|
| |
5
|
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth & Brooks, 1984.
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2: 263-286, January 1995.
|
| |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
Thorsten Joachims. Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1998.
|
| |
20
|
S.S. Keerthi and E.G. Gilbert. Convergence of a generalized smo algorithm for svm classifier design. Technical Report CD-00-01, Control Division Dept. of Mechanical and Production Engineering National University of Singapore, 2000.
|
| |
21
|
C.-J. Lin. Stopping criteria of decomposition methods for support vector machines: a theoretical justification. Technical report, Depratment of Computer Science and Information Engineering, National Taiwan University, May 2001.
|
| |
22
|
|
| |
23
|
J.C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12, pages 547-553. MIT Press, 2000.
|
| |
24
|
|
| |
25
|
|
| |
26
|
B. Schölkopf. Support Vector Learning. PhD thesis, GMD First, 1997.
|
| |
27
|
|
| |
28
|
B. Schoelkopf , K. Sung , C. Burges , F. Girosi , P. Niyogi , T. Poggio , V. Vapnik, Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers, Massachusetts Institute of Technology, Cambridge, MA, 1996
|
| |
29
|
Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.
|
| |
30
|
J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, April 1999.
|
CITED BY 80
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yasemin Altun , Thomas Hofmann , Alexander J. Smola, Gaussian process classification for segmenting and annotating sequences, Proceedings of the twenty-first international conference on Machine learning, p.4, July 04-08, 2004, Banff, Alberta, Canada
|
|
|
|
|
|
Ioannis Tsochantaridis , Thomas Hofmann , Thorsten Joachims , Yasemin Altun, Support vector machine learning for interdependent and structured output spaces, Proceedings of the twenty-first international conference on Machine learning, p.104, July 04-08, 2004, Banff, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yonatan Amit , Michael Fink , Nathan Srebro , Shimon Ullman, Uncovering shared structures in multiclass classification, Proceedings of the 24th international conference on Machine learning, p.17-24, June 20-24, 2007, Corvalis, Oregon
|
|
|
C. Biagioli , E. Francesconi , A. Passerini , S. Montemagni , C. Soria, Automatic semantics extraction in law documents, Proceedings of the 10th international conference on Artificial intelligence and law, June 06-11, 2005, Bologna, Italy
|
|
|
Linli Xu , Dana Wilkinson , Finnegan Southey , Dale Schuurmans, Discriminative unsupervised learning of structured predictors, Proceedings of the 23rd international conference on Machine learning, p.1057-1064, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
Wing W. Y. Ng , Andres Dorado , Daniel S. Yeung , Witold Pedrycz , Ebroul Izquierdo, Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error, Pattern Recognition, v.40 n.1, p.19-32, January, 2007
|
|
|
|
|
|
|
|
|
Antoine Bordes , Léon Bottou , Patrick Gallinari , Jason Weston, Solving multiclass support vector machines with LaRank, Proceedings of the 24th international conference on Machine learning, p.89-96, June 20-24, 2007, Corvalis, Oregon
|
|
|
Jari Björne , Juho Heimonen , Filip Ginter , Antti Airola , Tapio Pahikkala , Tapio Salakoski, Extracting complex biological events with rich graph-based feature sets, Proceedings of the Workshop on BioNLP: Shared Task, June 05-05, 2009, Boulder, Colorado
|
|
|
|
|
|
|
|
|
R. Solera-Ureña , D. Martín-Iglesias , A. Gallardo-Antolín , C. Peláez-Moreno , F. Díaz-de-María, Robust ASR using Support Vector Machines, Speech Communication, v.49 n.4, p.253-267, April, 2007
|
|
|
César A. B. Castañón , Jane S. Fraga , Sandra Fernandez , Arthur Gruber , Luciano da F. Costa, Biological shape characterization for automatic image recognition and diagnosis of protozoan parasites of the genus Eimeria, Pattern Recognition, v.40 n.7, p.1899-1910, July, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
S. Sathiya Keerthi , S. Sundararajan , Kai-Wei Chang , Cho-Jui Hsieh , Chih-Jen Lin, A sequential dual method for large scale multi-class linear svms, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jun Zhu , Amr Ahmed , Eric P. Xing, MedLDA: maximum margin supervised topic models for regression and classification, Proceedings of the 26th Annual International Conference on Machine Learning, p.1257-1264, June 14-18, 2009, Montreal, Quebec, Canada
|
|
|
Liu Yang , Rong Jin , Jieping Ye, Online learning by ellipsoid method, Proceedings of the 26th Annual International Conference on Machine Learning, p.1153-1160, June 14-18, 2009, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
Nicolas Usunier , David Buffoni , Patrick Gallinari, Ranking with ordered weighted pairwise classification, Proceedings of the 26th Annual International Conference on Machine Learning, p.1057-1064, June 14-18, 2009, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|