ACM Home Page
Please provide us with feedback. Feedback
On a theory of learning with similarity functions
Full text PdfPdf (190 KB)
Source ACM International Conference Proceeding Series; Vol. 148 archive
Proceedings of the 23rd international conference on Machine learning table of contents
Pittsburgh, Pennsylvania
Pages: 73 - 80  
Year of Publication: 2006
ISBN:1-59593-383-2
Authors
Maria-Florina Balcan  Carnegie Mellon University, Pittsburgh, PA
Avrim Blum  Carnegie Mellon University, Pittsburgh, PA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 65,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1143844.1143854
What is a DOI?

ABSTRACT

Kernel functions have become an extremely popular tool in machine learning, with an attractive theory as well. This theory views a kernel as implicitly mapping data points into a possibly very high dimensional space, and describes a kernel function as being good for a given learning problem if data is separable by a large margin in that implicit space. However, while quite elegant, this theory does not directly correspond to one's intuition of a good kernel as a good similarity function. Furthermore, it may be difficult for a domain expert to use the theory to help design an appropriate kernel for the learning task at hand since the implicit mapping may not be easy to calculate. Finally, the requirement of positive semi-definiteness may rule out the most natural pairwise similarity functions for the given problem domain.In this work we develop an alternative, more general theory of learning with similarity functions (i.e., sufficient conditions for a similarity function to allow one to learn well) that does not require reference to implicit spaces, and does not require the function to be positive semi-definite (or even symmetric). Our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be a good similarity function under our definition (though with some loss in the parameters). In this way, we provide the first steps towards a theory of kernels that describes the effectiveness of a given kernel function in terms of natural similarity-based properties.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Balcan, M.-F., Blum, A., & Vempala, S. (2004). On kernels, margins and low-dimensional mappings. International Conference on Algorithmic Learning Theory.
 
3
 
4
Cristianini, N., Shawe-Taylor, J., Elisseeff, A., & Kandola, J. (2001). On kernel target alignment. Advances in Neural Information Processing Systems.
 
5
 
6
Herbrich, R. (2002). Learning kernel classifiers. MIT Press.
 
7
 
8
 
9
 
10
 
11
 
12
Muller, K. R., Mika, S., Ratsch, G., Tsuda, K., & Scholkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12, 181--201.
 
13
Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. Proc. Symposium on the Mathematical Theory of Automata.
 
14
Scholkopf, B., Tsuda, K., & Vert, J.-P. (2004). Kernel methods in computational biology. MIT Press.
 
15
 
16
Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Trans. Information Theory, 44, 1926--1940.
 
17
 
18
 
19
Vapnik, V. N. (1998). Statistical learning theory. Wiley & Sons.


Collaborative Colleagues:
Maria-Florina Balcan: colleagues
Avrim Blum: colleagues