ACM Home Page
Please provide us with feedback. Feedback
A training algorithm for optimal margin classifiers
Full text PdfPdf (1.00 MB)
Source Annual Workshop on Computational Learning Theory archive
Proceedings of the fifth annual workshop on Computational learning theory table of contents
Pittsburgh, Pennsylvania, United States
Pages: 144 - 152  
Year of Publication: 1992
ISBN:0-89791-497-X
Authors
Bernhard E. Boser  EECS Department, University of California, Berkeley, CA
Isabelle M. Guyon  AT&T Bell Laboratories, 50 Fremont Street, 6th Floor, San Francisco, CA
Vladimir N. Vapnik  AT&T Bell Laboratories, Crawford Corner Road, Holmdel, NJ
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 64,   Downloads (12 Months): 692,   Citation Count: 276
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/130385.130401
What is a DOI?

ABSTRACT

A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
ABR64
M.A. Aizerman, E.M. Braverman, and L.I. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837, 1964.
 
BH89
 
BL88
D.S. Broomhead and D. Lowe. Multivariate functional interpolation and adaptive networks. Complex Systems, 2:321 - 355, 1988.
 
CBD+90
 
CH53
R. Courant and D. Hilbert. Methods of mathematical physics. Interscience, New York, 1953.
 
DH73
R.O. Duda and P.E. Hart. Pattern Classification And Scene Analysis. Wiley and Son, 1973.
 
GBD92
 
GPP+89
I. Guyon, I. Poujaud, L. Personnaz, G. Dreyfus, J. Denker, and Y. LeCun. Comparing different neural network architectures for classifying handwritten digits. In Proc. Int. Joint Conf. Neural Networks. Int. Joint Conference on Neural Networks, 1989.
 
GVB+92
Isabelle Guyon, Vladimir Vapnik, Bernhard Boser, Leon Bottou, and Sara Solla. Structural risk minimization for character recognition. In David S. Touretzky, editor, Neural Information Processing Systems, volume 4. Morgan Kaufmann Publishers, San Mateo, CA, 1992. To appear.
 
HLW88
David Haussler, Nick Littlestone, and Manfred Warmuth. Predicting 0,1-functions on randomly drawn points. In Proceedings of the 29th Annual Symposium on the Foundations of Computer Science, pages 100-109. IEEE, 1988.
 
KM87
W. Krauth and M. Mezard. Learning algorithms with optimal stability in neural networks. J. Phys. A: Math. gen., 20:L745, 1987.
 
Loo72
F.A. Lootsma, editor. Numerical Methods for Non-linear Optimization. Academic Press, London, 1972.
 
Lue84
David Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 1984.
 
Mac92
D. MacKay. A practical bayesian framework for backprop networks. In David S. Touretzky, editor, Neural Information Processing Systems, volume 4. Morgan Kaufmann Publishers, San Mateo, CA, 1992. To appear.
 
MD89
J. Moody and C. Darken. Fast learning in networks of locally tuned processing units. Neural Computation, i (2):281 - 294, 1989.
 
MGB+92
N. Matte, I. Guyon, L. Bottou, J. Denker, and V. Vapnik. Computer-aided cleaning of large databases for character recognition. In Digest ICPR. ICPR, Amsterdam, August 1992.
 
Moo92
J. Moody. Generalization, weight decay, and architecture selection for nonlinear learning systems. In David S. Touretzky, editor, Neural Information Processing Systems, volume 4. Morgan Kaufrnann Publishers, San Mateo, CA, 1992. To appear.
 
NY83
A.S. Nemirovsky and D. Do Yudin. Problem Complexzty and Method Efficiency in Optimization. Wiley, New York, 1983.
 
Omo91
 
PG90
T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to nmltilayer networks. Science, 247:978 - 982, February 1990.
 
Pog75
T. Poggio. On optimal nonlinear associative recall. Biol. Cybernetics, Vol. 19:201-209, 1975.
 
Ros62
F. Rosenblatt. Princzples of neurodynamics. Spartan Books, New York, 1962.
 
SLD92
P. Simard, Y. LeCun, and J. Denker. Tangent prop--a formalism for specifying selected invariances in an adaptive network. In David S. Touretzky, editor, Neural Information Processing Systems, volume 4. Morgan Kaufmann Publishers, San Mateo, CA, 1992. To appear.
 
TLS89
N. Tishby, E. Levin, and S. A. Solla. Consistent inference of probabilities in layered networks: Predictions and generalization. In Proceedings of the International Joint Conference on Neural Networks, Washington DC, 1989.
 
Vap82
 
VC74
V.N. Vapnik and A.Ya. Chervonenkis. The theory of pattern recognition. Nauka, Moscow, 1974.

CITED BY  276

Collaborative Colleagues:
Bernhard E. Boser: colleagues
Isabelle M. Guyon: colleagues
Vladimir N. Vapnik: colleagues