ACM Home Page
Please provide us with feedback. Feedback
On primal and dual sparsity of Markov networks
Full text PdfPdf (927 KB)
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 1265-1272  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Jun Zhu  Carnegie Mellon University, Pittsburgh, PA and Tsinghua University, Beijing, China
Eric P. Xing  Carnegie Mellon University, Pittsburgh, PA
Sponsors
: MITACS
: NSF
Microsoft Research : Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 39,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1553374.1553536
What is a DOI?

ABSTRACT

Sparsity is a desirable property in high dimensional learning. The l1-norm regularization can lead to primal sparsity, while max-margin methods achieve dual sparsity. Combining these two methods, an l1-norm max-margin Markov network (l1-M3N) can achieve both types of sparsity. This paper analyzes its connections to the Laplace max-margin Markov network (LapM3N), which inherits the dual sparsity of max-margin models but is pseudo-primal sparse, and to a novel adaptive M3N (AdapM3N). We show that the l1-M3N is an extreme case of the LapM3N, and the l1-M3N is equivalent to an AdapM3N. Based on this equivalence we develop a robust EM-style algorithm for learning an l1-M3N. We demonstrate the advantages of the simultaneously (pseudo-) primal and dual sparse models over the ones which enjoy either primal or dual sparsity on both synthetic and real data sets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Altun, Y., Tsochantaridis, I., & Hofmann, T. (2003). Hidden Markov support vector machines. International Conference on Machine Learning, 3--10.
 
2
Bartlett, P, Collins, M, Taskar, B & McAllester, D (2004). Exponentiated gradient algorithms for large margin structured classification. NIPS, 113--120.
 
3
Bradley, D., & Bagnell, A. (2008). Differentiable sparse coding. Neur. Info. Proc. Sys., 113--120.
4
 
5
 
6
Friedman, J., Hastie, T., Rosset, S., Tibshirani, R., & Zhu, J. (2004). Discussion of boosting papers. Annals of Statistics, 102--107.
 
7
Grandvalet, Y. (1998). Least absolute shrinkage is equivalent to quadratic penalization. Inter. Conf. on Aartifical Neural Networks, 201--206.
 
8
 
9
Lee, S.-I., Ganapathi, V., & Koller, D. (2006). Efficient structure learning of Markov networks using l1-regularization. Neur. Info. Proc. Sys., 817--824.
 
10
Qi, Y., Minka, T., Picard, R., & Ghahramani, Z. (2004). Predictive automatic relevance determination by expectation propagation. ICML, 671--678.
 
11
Ratliff, N. D., Bagnell, J. A., & Zinkevich, M. A. (2007). (Online) Subgradient methods for structured prediction. AI Statistics,.
 
12
Smola, A., Vishwanathan, S., & Le, Q. (2007). Buddle methods for machine learning. NIPS, 1377--1384.
 
13
Taskar, B., Guestrin, C., & Koller, D. (2003). Max-margin Markov networks. Neur. Info. Proc. Sys..
 
14
Taskar, B., Lacoste-Julien, S., & Jordan, M. I. (2006). Structured prediction via the extragradient method. Neur. Info. Proc. Sys., 1345--1352.
 
15
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Soceity, Series B, 267--288.
 
16
17
 
18
Wainwright, M., Ravikumar, P., & Lafferty, J. (2006). High-dimensional graphical model selection using l1-regularized logistic regression. NIPS, 1465--1472.
 
19
 
20
Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. (2004). 1-norm support vector machines. Advances in Neural Information Processing Systems, 49--56.
 
21
Zhu, J., & Xing, E. (2009). Maximum entropy discrimination Markov networks. ArXiv 0901.2730.
22