|
ABSTRACT
Recent approaches to learning structured predictors often require approximate inference for tractability; yet its effects on the learned model are unclear. Meanwhile, most learning algorithms act as if computational cost was constant within the model class. This paper sheds some light on the first issue by establishing risk bounds for max-margin learning with LP relaxed inference and addresses the second issue by proposing a new paradigm that attempts to penalize "time-consuming" hypotheses. Our analysis relies on a geometric characterization of the outer polyhedra associated with the LP relaxation. We then apply these techniques to the problem of dependency parsing, for which a concise LP formulation is provided that handles non-local output features. A significant improvement is shown over arc-factored models.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Buchholz, S., & Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. Conf. Comp. Nat. Lang. Learn., 189--210.
|
| |
3
|
Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2004). On the generalization ability of on-line learning algorithms. IEEE Trans. on Inf. Theory, 50, 2050--2057.
|
| |
4
|
Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge. Int. Conf. Mach. Learn. Workshop on Prior Knowledge for Text and Language Processing.
|
| |
5
|
Chu, Y. J., & Liu, T. H. (1965). On the shortest arborescence of a directed graph. Science Sinica, 14, 1396--1400.
|
| |
6
|
Clarke, J., & Lapata, M. (2008). Global inference for sentence compression: An integer linear programming approach. J. Art. Intel. Res., 31, 399--429.
|
| |
7
|
|
| |
8
|
|
 |
9
|
|
| |
10
|
Denis, P., & Baldridge, J. (2007). Joint determination of anaphoricity and coreference resolution using integer programming. North. Am. Assoc. Comp. Ling.
|
| |
11
|
Edmonds, J. (1967). Optimum branchings. J. Res. National Bureau of Standards, 71B, 233--240.
|
| |
12
|
|
 |
13
|
|
| |
14
|
Kulesza, A., & Pereira, F. (2007). Structured learning with approximate inference. Neur. Inf. Proc. Sys. 20, 785--792.
|
| |
15
|
|
| |
16
|
|
| |
17
|
Magnanti, T., & Wolsey, L. (1994). Optimal Trees (Tech. Rep. 290--94). MIT, Op. Res. Center.
|
| |
18
|
Martins, A. F. T., Smith, N. A., & Xing, E. P. (2009). Concise integer linear programming formulations for dependency parsing. Assoc. Comp. Ling. To appear.
|
| |
19
|
McDonald, R., Lerman, K., & Pereira, F. (2006). Multilingual dependency analysis with a two-stage discriminative parser. Conf. Comp. Nat. Lang. Learn., 216--220.
|
| |
20
|
McDonald, R., & Satta, G. (2007). On the complexity of non-projective data-driven dependency parsing. Int. Conf. Pars. Tech., 121--132.
|
| |
21
|
Ryan McDonald , Fernando Pereira , Kiril Ribarov , Jan Hajič, Non-projective dependency parsing using spanning tree algorithms, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, p.523-530, October 06-08, 2005, Vancouver, British Columbia, Canada
[doi> 10.3115/1220575.1220641]
|
| |
22
|
Ratliff, N., Bagnell, J., & Zinkevich, M. (2006). Subgradient methods for maximum margin structured learning. Int. Conf. Mach. Learn. Workshop on Learning in Structured Outputs Spaces.
|
| |
23
|
|
| |
24
|
Riedel, S., & Clarke, J. (2006). Incremental integer linear programming for non-projective dependency parsing. Emp. Meth. Nat. Lang. Proc., 129--137.
|
 |
25
|
|
| |
26
|
Smith, D. A., & Eisner, J. (2008). Dependency parsing by belief propagation. Emp. Meth. Nat. Lang. Proc., 145--156.
|
| |
27
|
Taskar, B., Guestrin, C., & Koller, D. (2004a). Max-margin markov networks. Neur. Inf. Proc. Sys. 16.
|
 |
28
|
Ben Taskar , Vassil Chatalbashev , Daphne Koller, Learning associative Markov networks, Proceedings of the twenty-first international conference on Machine learning, p.102, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015444]
|
| |
29
|
Taskar, B., Lacoste-Julien, S., & Jordan, M. (2006). Structured prediction via the extragradient method. Neur. Inf. Proc. Sys. 18, 1345--1352.
|
 |
30
|
Ioannis Tsochantaridis , Thomas Hofmann , Thorsten Joachims , Yasemin Altun, Support vector machine learning for interdependent and structured output spaces, Proceedings of the twenty-first international conference on Machine learning, p.104, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015341]
|
| |
31
|
|
|