A maximum entropy approach to natural language processing
Source: Computational Linguistics
Volume 22, Issue 1 (March 1996)
Pages: 39-71
Year of Publication: 1996
ISSN: 0891-2017
Authors
Adam L. Berger  Columbia University
Vincent J. Della Pietra  Renaissance Technologies
Stephen A. Della Pietra  Renaissance Technologies
Publisher
MIT Press  Cambridge, MA, USA
ABSTRACT

The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.


