Accurate max-margin training for structured output spaces
Source: ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages 888-895  
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Authors
Sunita Sarawagi, IIT Bombay, India
Rahul Gupta, IIT Bombay, India
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390156.1390268

ABSTRACT

Tsochantaridis et al. (2005) proposed two formulations for maximum margin training of structured output spaces: margin scaling and slack scaling. Margin scaling has been used extensively because it requires the same kind of MAP inference as normal structured prediction, whereas slack scaling is believed to be more accurate and better-behaved. We present an efficient variational approximation to the slack scaling method that solves its inference bottleneck while retaining its accuracy advantage over margin scaling.
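To make the difference between the two formulations concrete, the following is a minimal sketch, not code from the paper. It assumes a toy linear-chain scorer, Hamming loss, and brute-force enumeration of labelings; the names `chain_score`, `hamming`, and `most_violated` are hypothetical. Under margin scaling the slack for a labeling y is bounded below by L(y) - (s(y_true) - s(y)), while under slack scaling it is bounded below by L(y) * (1 - (s(y_true) - s(y))). The second form is not a sum of per-position terms, which is why ordinary MAP inference does not find its most violated labeling directly.

```python
from itertools import product

def chain_score(node, edge, y):
    """Score of labeling y: per-position scores plus transition scores."""
    s = sum(node[t][y[t]] for t in range(len(y)))
    s += sum(edge[y[t - 1]][y[t]] for t in range(1, len(y)))
    return s

def hamming(y_true, y):
    """Decomposable loss: number of positions where y differs from y_true."""
    return sum(a != b for a, b in zip(y_true, y))

def most_violated(node, edge, y_true, scaling):
    """Brute-force search for the labeling that demands the largest slack.

    Illustrative only: enumeration is exponential in the sequence length.
    """
    y_true = tuple(y_true)
    s_true = chain_score(node, edge, y_true)
    best, best_val = None, float("-inf")
    for y in product(range(len(node[0])), repeat=len(y_true)):
        if y == y_true:
            continue
        loss = hamming(y_true, y)
        margin = s_true - chain_score(node, edge, y)
        if scaling == "margin":
            val = loss - margin          # xi >= L(y) - margin(y)
        elif scaling == "slack":
            val = loss * (1.0 - margin)  # xi >= L(y) * (1 - margin(y))
        else:
            raise ValueError("scaling must be 'margin' or 'slack'")
        if val > best_val:
            best, best_val = y, val
    return best, best_val

# Toy problem (made-up numbers): 3 positions, 2 labels, no transition scores.
node = [[0.25, 0.0], [0.75, 0.0], [0.75, 0.0]]
edge = [[0.0, 0.0], [0.0, 0.0]]
print(most_violated(node, edge, (0, 0, 0), "margin"))  # ((1, 1, 1), 1.25)
print(most_violated(node, edge, (0, 0, 0), "slack"))   # ((1, 0, 0), 0.75)
```

On this toy input, margin scaling selects the labeling with the highest loss even though it is already separated by a margin of 1.75, while slack scaling selects the low-loss labeling that sits closest to the true one.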

We further argue that existing scaling approaches do not separate the true labeling comprehensively while generating violating constraints. We propose a new max-margin trainer PosLearn that generates violators to ensure separation at each position of a decomposable loss function. Empirical results on real datasets illustrate that PosLearn can reduce test error by up to 25% over margin scaling and 10% over slack scaling. Further, PosLearn violators can be generated more efficiently than slack violators; for many structured tasks the time required is just twice that of MAP inference.
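The abstract does not spell out how PosLearn violators are generated, so the following is only a sketch under stated assumptions, not the authors' procedure. It assumes a linear-chain model and reads "separation at each position" as: for every position t, find the best-scoring labeling whose label at t differs from the true one. For a chain, those scores can be read off max-marginals computed with one forward and one backward Viterbi pass, which would be consistent with the "twice MAP" cost mentioned above. The function names `max_marginals` and `best_wrong_scores` are hypothetical.

```python
def max_marginals(node, edge):
    """mm[t][a] = best score over all chain labelings with y[t] == a."""
    T, K = len(node), len(node[0])
    # fwd[t][a]: best score of a prefix ending at t with label a (includes node[t][a]).
    fwd = [[0.0] * K for _ in range(T)]
    # bwd[t][a]: best score of the suffix after t given label a at t (excludes node[t][a]).
    bwd = [[0.0] * K for _ in range(T)]
    fwd[0] = list(node[0])
    for t in range(1, T):  # forward Viterbi pass
        for b in range(K):
            fwd[t][b] = node[t][b] + max(fwd[t - 1][a] + edge[a][b] for a in range(K))
    for t in range(T - 2, -1, -1):  # backward Viterbi pass
        for a in range(K):
            bwd[t][a] = max(edge[a][b] + node[t + 1][b] + bwd[t + 1][b] for b in range(K))
    return [[fwd[t][a] + bwd[t][a] for a in range(K)] for t in range(T)]

def best_wrong_scores(node, edge, y_true):
    """For each position t, the best score among labelings with y[t] != y_true[t]."""
    mm = max_marginals(node, edge)
    K = len(node[0])
    return [max(mm[t][a] for a in range(K) if a != y_true[t])
            for t in range(len(y_true))]

# Toy problem (made-up numbers): 3 positions, 2 labels, no transition scores.
node = [[0.25, 0.0], [0.75, 0.0], [0.75, 0.0]]
edge = [[0.0, 0.0], [0.0, 0.0]]
y_true = (0, 0, 0)
s_true = 1.75  # score of y_true on this toy input
margins = [s_true - s for s in best_wrong_scores(node, edge, y_true)]
print(margins)  # [0.25, 0.75, 0.75]
```

Under this reading, a position would contribute a violating constraint whenever its margin falls below the required separation (for example, below 1 at unit margin), so every position in the toy input would yield one.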


REFERENCES


2. Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. NIPS.
8. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. Predicting Structured Data. MIT Press.
9. McCallum, A., Nigam, K., Reed, J., Rennie, J., & Seymore, K. (2000). Cora: Computer science research paper search engine. http://cora.whizbang.com/.
12. Peng, F., & McCallum, A. (2004). Accurate information extraction from research papers using conditional random fields. HLT-NAACL (pp. 329-336).
13. Ratliff, N., Bagnell, J., & Zinkevich, M. (2007). (Online) subgradient methods for structured prediction. AISTATS.
14. Sarawagi, S., & Cohen, W. W. (2004). Semi-Markov conditional random fields for information extraction. NIPS.
16. Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. (2004). Max-margin parsing. EMNLP.
