ABSTRACT
This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be applied efficiently to exponentially large representations of parse trees, such as the "all subtrees" (DOP) representation described by Bod (1998), or a representation that tracks all sub-fragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.
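The core idea, running a perceptron over a feature space too large to enumerate, can be sketched with a dual-form (kernel) perceptron. The example below is only an illustration under stated assumptions: it swaps in a simple all-subsequences kernel over token sequences rather than the tree or tagged-sequence representations the abstract refers to, it uses plain binary classification, and every function name is hypothetical. The kernel counts matching subsequence pairs with an O(nm) dynamic program, even though the number of distinct subsequences grows exponentially with sequence length.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[str], int]  # (token sequence, label in {+1, -1})


def all_subsequences_kernel(s: Sequence[str], t: Sequence[str]) -> int:
    """Inner product of implicit feature vectors indexed by every
    (possibly non-contiguous) subsequence, including the empty one.
    Computed by dynamic programming instead of enumeration."""
    n, m = len(s), len(t)
    # k[i][j] = kernel value between the prefixes s[:i] and t[:j];
    # an empty prefix shares only the empty subsequence, hence 1.
    k = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k[i][j] = k[i - 1][j] + k[i][j - 1] - k[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Every match counted for the shorter prefixes can be
                # extended by this matching pair of tokens.
                k[i][j] += k[i - 1][j - 1]
    return k[n][m]


def score(examples: List[Example], alphas: List[int], x: Sequence[str]) -> int:
    """Dual-form score: a weighted sum of kernel values against the
    training examples on which mistakes were made."""
    return sum(
        a * y_j * all_subsequences_kernel(x_j, x)
        for a, (x_j, y_j) in zip(alphas, examples)
        if a
    )


def train_dual_perceptron(examples: List[Example], epochs: int = 10) -> List[int]:
    """Assumed illustrative setup: binary perceptron in dual form.
    alphas[i] counts the mistakes made on training example i."""
    alphas = [0] * len(examples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(examples):
            if y * score(examples, alphas, x) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alphas


def predict(examples: List[Example], alphas: List[int], x: Sequence[str]) -> int:
    return 1 if score(examples, alphas, x) > 0 else -1
```

The weight vector is never built explicitly: the model is stored as per-example mistake counts, and all computation goes through kernel evaluations, which is what keeps training tractable when the explicit feature vector would be exponentially long.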
REFERENCES
Aizerman, M., Braverman, E., & Rozonoer, L. (1964). Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. In Automation and Remote Control, 25:821--837.
Bod, R. (1998). Beyond Grammar: An Experience-Based Theory of Language. CSLI Publications/Cambridge University Press.
Borthwick, A., Sterling, J., Agichtein, E., and Grishman, R. (1998). Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. Proc. of the Sixth Workshop on Very Large Corpora.
Collins, M., and Duffy, N. (2001). Convolution Kernels for Natural Language. In Proceedings of Neural Information Processing Systems (NIPS 14).
Goodman, J. (1996). Efficient algorithms for parsing the DOP model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 143--152.
Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical report, University of California, Santa Cruz.
Johnson, M., Geman, S., Canon, S., Chi, Z., & Riezler, S. (1999). Estimators for stochastic "Unification-Based" grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 535--541, College Park, Maryland. doi:10.3115/1034678.1034758
Lodhi, H., Cristianini, N., Shawe-Taylor, J., & Watkins, C. (2001). Text Classification using String Kernels. In Advances in Neural Information Processing Systems 13, MIT Press.
Ratnaparkhi, A. (1996). A maximum entropy part-of-speech tagger. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
CITED BY 34
Eugene Ie , Jason Weston , William Stafford Noble , Christina Leslie, Multi-class protein fold recognition using adaptive codes, Proceedings of the 22nd international conference on Machine learning, p.329-336, August 07-11, 2005, Bonn, Germany