ACM Home Page
Please provide us with feedback. Feedback
Primal sparse Max-margin Markov networks
Full text MovMov (23:22),  PdfPdf (493 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Paris, France
SESSION: Research track papers table of contents
Pages 1047-1056  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Jun Zhu  Tsinghua University, Beijing, China
Eric P. Xing  Carnegie Mellon University, Pittsburgh, PA, USA
Bo Zhang  Tsinghua University, Beijing, China
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 82,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1557019.1557132
What is a DOI?

ABSTRACT

Max-margin Markov networks (M3N) have shown great promise in structured prediction and relational learning. Due to the KKT conditions, the M3N enjoys dual sparsity. However, the existing M3N formulation does not enjoy primal sparsity, which is a desirable property for selecting significant features and reducing the risk of over-fitting. In this paper, we present an l1-norm regularized max-margin Markov network (l1-M3N), which enjoys dual and primal sparsity simultaneously. To learn an l1-M3N, we present three methods including projected sub-gradient, cutting-plane, and a novel EM-style algorithm, which is based on an equivalence between l1-M3N and an adaptive M3N. We perform extensive empirical studies on both synthetic and real data sets. Our experimental results show that: (1) l1-M3N can effectively select significant features; (2) l1-M3N can perform as well as the pseudo-primal sparse Laplace M3N in prediction accuracy, while consistently outperforms other competing methods that enjoy either primal or dual sparsity; and (3) the EM-algorithm is more robust than the other two in pre-diction accuracy and time efficiency.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden {Markov} support vector machines. In ICML, 2003.
2
 
3
P. Bartlett, M. Collins, B. Taskar, and D. McAllester. Exponentiated gradient algorithms for larg-margin structured classification. In NIPS, 2004.
 
4
K. Bennett and O. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw, (1):23--34, 1992.
 
5
D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
 
6
 
7
S. Boyd, L. Xiao, and A. Mutapcic. Subgradient methods. In Lecture Notes of EE392o}, Stanford University, Autumn Quarter, 2003--2004.
8
9
 
10
Y. Grandvalet. Least absolute shrinkage is equivalent to quadratic penalization. In ICANN, 1998.
 
11
Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in svms. In NIPS, 2002.
 
12
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Verlag, NY, 2001.
 
13
 
14
 
15
J.E. Kelley. The cutting-plane method for solving convex programs. J. Sco. Indust. Appl. Math., (4):703--712.
 
16
 
17
S.-I. Lee, V. Ganapathi, and D. Koller. Efficient structure learning of {Markov} networks using l_1-regularization. In NIPS, 2006.
 
18
 
19
N.D. Ratliff, J.A. Bagnell, and M.A. Zinkevich. (Online) Subgradient} methods for structured prediction. In AISTATS, 2007.
 
20
B. Taskar, C. Guestrin, and D. Koller. Max-margin {Markov} networks. In NIPS, 2003.
21
 
22
23
 
24
M.J. Wainwright, P. Ravikumar, and J. Lafferty. High-dimensional graphical model selection using l_1 regularized logistic regression. In NIPS, 2006.
 
25
 
26
J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani. 1-norm support vector machines. In NIPS, 2004.
27
28
 
29
H. Zou. An improved 1-norm svm for simultaneous classification and variable selection. AISTATS, 2007.

Collaborative Colleagues:
Jun Zhu: colleagues
Eric P. Xing: colleagues
Bo Zhang: colleagues