A scalable modular convex solver for regularized risk minimization
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
San Jose, California, USA
SESSION: Research track papers
Pages: 727 - 736  
Year of Publication: 2007
ISBN: 978-1-59593-609-7
Authors
Choon Hui Teo  National ICT Australia, Canberra, Australia
Alex Smola  National ICT Australia, Canberra, Australia
S. V. N. Vishwanathan  National ICT Australia, Canberra, Australia
Quoc Viet Le  National ICT Australia / Max Planck Institute for Biological Cybernetics
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1281192.1281270

ABSTRACT

A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Logistic Regression, Conditional Random Fields (CRFs), and Lasso, among others. This paper describes the theory and implementation of a highly scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as l1 and l2 penalties. At present, our solver implements 20 different estimation problems, can be easily extended, scales to millions of observations, and is up to 10 times faster than specialized solvers for many applications. The open source code is freely available as part of the ELEFANT toolbox.
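
To make the idea of a modular regularized risk minimizer concrete, here is a minimal sketch in Python. It assumes the objective has the form (average loss over the data) + lam * regularizer(w), and it uses plain subgradient descent as a stand-in optimizer. That optimizer is an assumption chosen for brevity, not the solver the paper describes. All function names and parameters (hinge_loss, l1_regularizer, minimize_regularized_risk, lam, eta) are hypothetical and are not part of the ELEFANT toolbox.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Loss modules: each returns (loss value, subgradient with respect to w).
def hinge_loss(w, x, y):
    """Linear SVM loss max(0, 1 - y<w, x>)."""
    margin = y * dot(w, x)
    if margin < 1.0:
        return 1.0 - margin, [-y * xi for xi in x]
    return 0.0, [0.0] * len(w)

def logistic_loss(w, x, y):
    """Logistic loss log(1 + exp(-y<w, x>)), computed without overflow."""
    margin = y * dot(w, x)
    if margin > 0:
        e = math.exp(-margin)
        value, sigma = math.log1p(e), e / (1.0 + e)
    else:
        e = math.exp(margin)
        value, sigma = -margin + math.log1p(e), 1.0 / (1.0 + e)
    return value, [-y * sigma * xi for xi in x]

# Regularizer modules: each returns (value, subgradient with respect to w).
def l2_regularizer(w):
    return 0.5 * dot(w, w), list(w)

def l1_regularizer(w):
    signs = [float((v > 0) - (v < 0)) for v in w]
    return sum(abs(v) for v in w), signs

def objective(w, data, loss, regularizer, lam):
    """Regularized risk: average loss plus lam times the regularizer."""
    risk = sum(loss(w, x, y)[0] for x, y in data) / len(data)
    return risk + lam * regularizer(w)[0]

def minimize_regularized_risk(data, loss, regularizer, lam=0.01,
                              steps=200, eta=0.5):
    """Subgradient descent (illustrative stand-in, not the paper's solver).

    Returns the best iterate seen and its objective value.
    """
    n, dim = len(data), len(data[0][0])
    w = [0.0] * dim
    best_w, best_obj = list(w), objective(w, data, loss, regularizer, lam)
    for t in range(1, steps + 1):
        grad = [0.0] * dim
        for x, y in data:
            g = loss(w, x, y)[1]
            for i in range(dim):
                grad[i] += g[i] / n
        reg_grad = regularizer(w)[1]
        step = eta / math.sqrt(t)  # diminishing step size
        w = [wi - step * (gi + lam * ri)
             for wi, gi, ri in zip(w, grad, reg_grad)]
        obj = objective(w, data, loss, regularizer, lam)
        if obj < best_obj:
            best_w, best_obj = list(w), obj
    return best_w, best_obj
```

Swapping the loss or the regularizer changes the estimation problem without touching the optimizer: passing hinge_loss with l2_regularizer gives a linear SVM, while logistic_loss with l1_regularizer gives sparse logistic regression. This separation is what "modular" is taken to mean in the sketch.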


