| Constrained optimization for validation-guided conditional random field learning |
| Full text |
Mov
(14:42),
Pdf
(561 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Paris, France
SESSION: Research track papers
table of contents
Pages 189-198
Year of Publication: 2009
ISBN:978-1-60558-495-9
|
|
Authors
|
|
Minmin Chen
|
Washington University in St. Louis, Saint Louis, MO, USA
|
|
Yixin Chen
|
Washington University in St. Louis, Saint Louis, MO, USA
|
|
Michael R. Brent
|
Washington University in St. Louis, Saint Louis, MO, USA
|
|
Aaron E. Tenney
|
Washington University in St. Louis, Saint Louis, MO, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 38, Downloads (12 Months): 120, Citation Count: 0
|
|
|
ABSTRACT
Conditional random fields(CRFs) are a class of undirected graphical models which have been widely used for classifying and labeling sequence data. The training of CRFs is typically formulated as an unconstrained optimization problem that maximizes the conditional likelihood. However, maximum likelihood training is prone to overfitting. To address this issue, we propose a novel constrained nonlinear optimization formulation in which the prediction accuracy of cross-validation sets are included as constraints. Instead of requiring multiple passes of training, the constrained formulation allows the cross-validation be handled in one pass of constrained optimization. The new formulation is discontinuous, and classical Lagrangian based constraint handling methods are not applicable. A new constrained optimization algorithm based on the recently proposed extended saddle point theory is developed to learn the constrained CRF model. Experimental results on gene and stock-price prediction tasks show that the constrained formulation is able to significantly improve the generalization ability of CRF training.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
M. Avriel. Nonlinear Programming: Analysis and Methods. Prentice Hall, Englewood Cliffs, N.J., 1976.
|
| |
2
|
S. J. Benson, L. McInnes, J. More, and J. Sarich. TAO user manual (revision 1.8). Technical Report ANL/MCS-TM-242, Mathematics and Computer Science Division, Argonne National Laboratory, 2005.
|
| |
3
|
C. Burge. Identification of genes in human genomic DNA. PhD thesis, Stanford Univerisity, 1997.
|
| |
4
|
M. Chen, Y. Chen, and M. Brent. CRF-OPT: An efficient high-quality conditional random field solver. In Proc. AAAI08, 2008.
|
| |
5
|
S. Chen and R. Rosenfeld. A gaussian prior for smoothing maximum entropy models. Technical Report CMUCS-99-108, Carnegie Mellon University, 1999.
|
| |
6
|
A. Culotta, D. Kulp, and A. McCallum. Gene prediction with conditional random fields. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, Apr. 2005.
|
| |
7
|
G. GRIMMETT. A theorem about random fields. Bulletin of the London Mathematical Society, 5:81--84, 1973.
|
| |
8
|
S. S. Gross, C. B. Do, M. Sirota, and S. Batzoglou. CONTRAST: a disriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biology, 8:R269, 2007.
|
| |
9
|
R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. IJCAI, pages 1137--1145, 1995.
|
| |
10
|
|
| |
11
|
W. I. Newman. Extension to the maximum entropy method. IEEE Trans. on Information Theory, (1):89--93, 1977.
|
| |
12
|
A. Quattoni, M. Collins, and T. Darrell. Conditional random fields for object recognition. IEEE Int Conference on Computer Vision, 2:1150--1157, Jun. 2003.
|
| |
13
|
Brian Roark , Murat Saraclar , Michael Collins , Mark Johnson, Discriminative language modeling with conditional random fields and the perceptron algorithm, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p.47-es, July 21-26, 2004, Barcelona, Spain
[doi> 10.3115/1218955.1218962]
|
| |
14
|
R. Rosenfeld. A maximum entropy approach to adaptive statistical language modeling. Computer, Speech, and Language, pages 187--228, 1996.
|
| |
15
|
S. Sarawagi and W. Cohen. Semi-markov conditional random fields for information extraction. In Proc. NIPS, 2004.
|
| |
16
|
|
| |
17
|
|
| |
18
|
C. Sutton and A. McCallum. An introduction to conditional random fields for relational learning. In Introduction to Statistical Relational Learning. MIT Press, 2006.
|
| |
19
|
D. L. Vail, J. D. Lafferty, and M. M. Veloso. Feature selection in conditional random fields for activity recognition. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3379--3384, 2007.
|
| |
20
|
B. Wah and Y. Chen. Solving large-scale nonlinear programming problems by constraint partitioning. In Proc. Constraint Programming, pages 697--711, 2005.
|
| |
21
|
|
| |
22
|
B. Wah and M. Qian. Constrained formulations for neural network training and their applications to solve the two-spiral problem. In Proc. Fifth International Conference on Computer Science and Informatics, volume 1, pages 598--601, 2000.
|
|