Learning a two-stage SVM/CRF sequence classifier
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
SESSION: IR/KM: machine learning
Pages 271-278
Year of Publication: 2008
ISBN:978-1-59593-991-3
ABSTRACT
Learning a sequence classifier means learning to predict a sequence of output tags based on a set of input data items. For example, recognizing that a handwritten word is "cat", based on three images of handwritten letters and on general knowledge of English letter combinations, is a sequence classification task. This paper describes a new two-stage approach to learning a sequence classifier that is (i) highly accurate, (ii) scalable, and (iii) easy to use in data mining applications. The two-stage approach combines support vector machines (SVMs) and conditional random fields (CRFs). It is (i) highly accurate because it benefits from the maximum-margin nature of SVMs and also from the ability of CRFs to model correlations between neighboring output tags. It is (ii) scalable because the input to each SVM is a small training set, and the input to the CRF has a small number of features, namely the SVM outputs. It is (iii) easy to use because it combines existing published software in a straightforward way. In detailed experiments on the task of recognizing handwritten words, we show that the two-stage approach is more accurate, or faster and more scalable, or both, than other leading methods for learning sequence classifiers, including max-margin Markov networks (M3Ns) and standard CRFs.
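To make the "cat" example concrete, here is a minimal Python sketch of how per-letter scores and neighbor-to-neighbor scores might be combined at prediction time. It is an illustration only, not the paper's implementation: the per-position scores are made-up numbers standing in for SVM outputs, the transition scores are made-up stand-ins for what a CRF could learn about adjacent tags, and the names `viterbi`, `svm_scores`, and `transition_scores` are hypothetical. Training of either stage is omitted.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for a linear chain.

    emissions[i][t]     : score of tag t at position i
                          (hypothetical stand-in for stage-one SVM outputs)
    transitions[(s, t)] : score for tag s followed by tag t
                          (pairs not listed score 0.0)
    """
    # best[t] = score of the best path ending in tag t at the current position
    best = {t: emissions[0][t] for t in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            # Pick the previous tag that maximizes path score into tag t.
            prev = max(tags, key=lambda s: best[s] + transitions.get((s, t), 0.0))
            new_best[t] = best[prev] + transitions.get((prev, t), 0.0) + scores[t]
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["a", "c", "t"]

# Made-up per-letter scores for three handwritten images.
# The middle image is ambiguous and slightly favors "c" over "a".
svm_scores = [
    {"a": -1.0, "c": 2.0, "t": -1.5},
    {"a": 0.4, "c": 0.6, "t": -1.0},
    {"a": -1.2, "c": -0.8, "t": 1.8},
]

# Made-up neighbor scores: "ca" and "at" are plausible, "cc" is not.
transition_scores = {("c", "a"): 1.0, ("a", "t"): 1.0, ("c", "c"): -1.0}

# Labeling each letter independently ignores the neighbors.
independent = [max(TAGS, key=lambda t: s[t]) for s in svm_scores]
# Decoding jointly lets the neighbor scores resolve the ambiguous letter.
joint = viterbi(svm_scores, transition_scores, TAGS)

print("independent:", "".join(independent))  # cct
print("joint:      ", "".join(joint))        # cat
```

With these illustrative numbers, labeling each letter on its own gives "cct", while joint decoding recovers "cat" because the neighbor scores outweigh the small per-letter preference for "c" in the middle position. The second stage sees only one score per tag per position, which is the sense in which its feature set stays small.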