|
ABSTRACT
Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible and powerful model for learning to assign labels to elements of sequences in such applications as part-of-speech tagging, text-to-speech mapping, protein and DNA sequence analysis, and information extraction from web pages. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features. This paper describes a new method for training CRFs by applying Friedman's (1999) gradient tree boosting method. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees. Regression trees are learned by stage-wise optimizations similar to Adaboost, but with the objective of maximizing the conditional likelihood P(Y|X) of the CRF model. By growing regression trees, interactions among features are introduced only as needed, so although the parameter space is potentially immense, the search algorithm does not explicitly consider the large space. As a result, gradient tree boosting scales linearly in the order of the Markov model and in the order of the feature interactions, rather than exponentially like previous algorithms based on iterative scaling and gradient descent.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society B, 36, 192--236.
|
| |
2
|
Besag, J. (1977). Efficiency of pseudolikelihood estimation for simple Gaussian fields. Biometrika, 64, 616--618.
|
| |
3
|
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth International Group.
|
| |
4
|
|
| |
5
|
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. ICML-96 (pp. 148--156). Morgan Kaufmann.
|
| |
6
|
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29.
|
| |
7
|
Geman, D. (1998). Random fields and inverse problems in imaging. In A. Ancona, D. Geman and N. Ikeda (Eds.), École d'Été de probabilités de saintflour xviii, Lecture Notes in Mathematics 1427, 117--196. Berlin: Springer-Verlag.
|
| |
8
|
Hammersley, J. M., & Clifford, P. (1971). Markov fields on finite graphs and lattices (Technical Report). Unpublished.
|
| |
9
|
|
| |
10
|
McCallum, A. (2003). Efficiently inducing features of conditional random fields. UAI-2003 (pp. 403--410). San Francisco, CA: Morgan Kaufmann.
|
| |
11
|
|
| |
12
|
Qian, N., & Sejnowski, T. J. (1988). Predicting the secondary structure of globular proteins using neural network models. J. Mol. Bio., 202, 865--884.
|
| |
13
|
|
| |
14
|
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77, 257--286.
|
| |
15
|
Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. Proceedings of the conference on empirical methods in natural language processing (pp. 133--142). Somerset, NJ: ACL.
|
| |
16
|
Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronouce English text. Complex Systems, 1, 145--168.
|
CITED BY 10
|
|
Xuan-Hieu Phan , Le-Minh Nguyen , Tu-Bao Ho , Susumu Horiguchi, Improving discriminative sequential learning with rare--but--important associations, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lin Liao , Tanzeem Choudhury , Dieter Fox , Henry Kautz, Training conditional random fields using virtual evidence boosting, Proceedings of the 20th international joint conference on Artifical intelligence, p.2530-2535, January 06-12, 2007, Hyderabad, India
|
|
|
|
|
|
|
|
|
|
|
|
|
|