|
ABSTRACT
Sparsity is a desirable property in high dimensional learning. The l1-norm regularization can lead to primal sparsity, while max-margin methods achieve dual sparsity. Combining these two methods, an l1-norm max-margin Markov network (l1-M3N) can achieve both types of sparsity. This paper analyzes its connections to the Laplace max-margin Markov network (LapM3N), which inherits the dual sparsity of max-margin models but is pseudo-primal sparse, and to a novel adaptive M3N (AdapM3N). We show that the l1-M3N is an extreme case of the LapM3N, and the l1-M3N is equivalent to an AdapM3N. Based on this equivalence we develop a robust EM-style algorithm for learning an l1-M3N. We demonstrate the advantages of the simultaneously (pseudo-) primal and dual sparse models over the ones which enjoy either primal or dual sparsity on both synthetic and real data sets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Altun, Y., Tsochantaridis, I., & Hofmann, T. (2003). Hidden Markov support vector machines. International Conference on Machine Learning, 3--10.
|
| |
2
|
Bartlett, P, Collins, M, Taskar, B & McAllester, D (2004). Exponentiated gradient algorithms for large margin structured classification. NIPS, 113--120.
|
| |
3
|
Bradley, D., & Bagnell, A. (2008). Differentiable sparse coding. Neur. Info. Proc. Sys., 113--120.
|
 |
4
|
John Duchi , Shai Shalev-Shwartz , Yoram Singer , Tushar Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, p.272-279, July 05-09, 2008, Helsinki, Finland
[doi> 10.1145/1390156.1390191]
|
| |
5
|
|
| |
6
|
Friedman, J., Hastie, T., Rosset, S., Tibshirani, R., & Zhu, J. (2004). Discussion of boosting papers. Annals of Statistics, 102--107.
|
| |
7
|
Grandvalet, Y. (1998). Least absolute shrinkage is equivalent to quadratic penalization. Inter. Conf. on Aartifical Neural Networks, 201--206.
|
| |
8
|
|
| |
9
|
Lee, S.-I., Ganapathi, V., & Koller, D. (2006). Efficient structure learning of Markov networks using l1-regularization. Neur. Info. Proc. Sys., 817--824.
|
| |
10
|
Qi, Y., Minka, T., Picard, R., & Ghahramani, Z. (2004). Predictive automatic relevance determination by expectation propagation. ICML, 671--678.
|
| |
11
|
Ratliff, N. D., Bagnell, J. A., & Zinkevich, M. A. (2007). (Online) Subgradient methods for structured prediction. AI Statistics,.
|
| |
12
|
Smola, A., Vishwanathan, S., & Le, Q. (2007). Buddle methods for machine learning. NIPS, 1377--1384.
|
| |
13
|
Taskar, B., Guestrin, C., & Koller, D. (2003). Max-margin Markov networks. Neur. Info. Proc. Sys..
|
| |
14
|
Taskar, B., Lacoste-Julien, S., & Jordan, M. I. (2006). Structured prediction via the extragradient method. Neur. Info. Proc. Sys., 1345--1352.
|
| |
15
|
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Soceity, Series B, 267--288.
|
| |
16
|
|
 |
17
|
Ioannis Tsochantaridis , Thomas Hofmann , Thorsten Joachims , Yasemin Altun, Support vector machine learning for interdependent and structured output spaces, Proceedings of the twenty-first international conference on Machine learning, p.104, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015341]
|
| |
18
|
Wainwright, M., Ravikumar, P., & Lafferty, J. (2006). High-dimensional graphical model selection using l1-regularized logistic regression. NIPS, 1465--1472.
|
| |
19
|
|
| |
20
|
Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. (2004). 1-norm support vector machines. Advances in Neural Information Processing Systems, 49--56.
|
| |
21
|
Zhu, J., & Xing, E. (2009). Maximum entropy discrimination Markov networks. ArXiv 0901.2730.
|
 |
22
|
|
|