|
ABSTRACT
Max-margin Markov networks (M3N) have shown great promise in structured prediction and relational learning. Due to the KKT conditions, the M3N enjoys dual sparsity. However, the existing M3N formulation does not enjoy primal sparsity, which is a desirable property for selecting significant features and reducing the risk of over-fitting. In this paper, we present an l1-norm regularized max-margin Markov network (l1-M3N), which enjoys dual and primal sparsity simultaneously. To learn an l1-M3N, we present three methods including projected sub-gradient, cutting-plane, and a novel EM-style algorithm, which is based on an equivalence between l1-M3N and an adaptive M3N. We perform extensive empirical studies on both synthetic and real data sets. Our experimental results show that: (1) l1-M3N can effectively select significant features; (2) l1-M3N can perform as well as the pseudo-primal sparse Laplace M3N in prediction accuracy, while consistently outperforms other competing methods that enjoy either primal or dual sparsity; and (3) the EM-algorithm is more robust than the other two in pre-diction accuracy and time efficiency.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden {Markov} support vector machines. In ICML, 2003.
|
 |
2
|
|
| |
3
|
P. Bartlett, M. Collins, B. Taskar, and D. McAllester. Exponentiated gradient algorithms for larg-margin structured classification. In NIPS, 2004.
|
| |
4
|
K. Bennett and O. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw, (1):23--34, 1992.
|
| |
5
|
D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
|
| |
6
|
|
| |
7
|
S. Boyd, L. Xiao, and A. Mutapcic. Subgradient methods. In Lecture Notes of EE392o}, Stanford University, Autumn Quarter, 2003--2004.
|
 |
8
|
John Duchi , Shai Shalev-Shwartz , Yoram Singer , Tushar Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, p.272-279, July 05-09, 2008, Helsinki, Finland
[doi> 10.1145/1390156.1390191]
|
 |
9
|
Amir Globerson , Terry Y. Koo , Xavier Carreras , Michael Collins, Exponentiated gradient algorithms for log-linear structured prediction, Proceedings of the 24th international conference on Machine learning, p.305-312, June 20-24, 2007, Corvalis, Oregon
[doi> 10.1145/1273496.1273535]
|
| |
10
|
Y. Grandvalet. Least absolute shrinkage is equivalent to quadratic penalization. In ICANN, 1998.
|
| |
11
|
Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in svms. In NIPS, 2002.
|
| |
12
|
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Verlag, NY, 2001.
|
| |
13
|
|
| |
14
|
|
| |
15
|
J.E. Kelley. The cutting-plane method for solving convex programs. J. Sco. Indust. Appl. Math., (4):703--712.
|
| |
16
|
|
| |
17
|
S.-I. Lee, V. Ganapathi, and D. Koller. Efficient structure learning of {Markov} networks using l_1-regularization. In NIPS, 2006.
|
| |
18
|
|
| |
19
|
N.D. Ratliff, J.A. Bagnell, and M.A. Zinkevich. (Online) Subgradient} methods for structured prediction. In AISTATS, 2007.
|
| |
20
|
B. Taskar, C. Guestrin, and D. Koller. Max-margin {Markov} networks. In NIPS, 2003.
|
 |
21
|
Choon Hui Teo , Alex Smola , S. V.N. Vishwanathan , Quoc Viet Le, A scalable modular convex solver for regularized risk minimization, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
[doi> 10.1145/1281192.1281270]
|
| |
22
|
|
 |
23
|
Ioannis Tsochantaridis , Thomas Hofmann , Thorsten Joachims , Yasemin Altun, Support vector machine learning for interdependent and structured output spaces, Proceedings of the twenty-first international conference on Machine learning, p.104, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015341]
|
| |
24
|
M.J. Wainwright, P. Ravikumar, and J. Lafferty. High-dimensional graphical model selection using l_1 regularized logistic regression. In NIPS, 2006.
|
| |
25
|
|
| |
26
|
J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani. 1-norm support vector machines. In NIPS, 2004.
|
 |
27
|
|
 |
28
|
|
| |
29
|
H. Zou. An improved 1-norm svm for simultaneous classification and variable selection. AISTATS, 2007.
|
|