ABSTRACT
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the wide-scale application of this concept to real-world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.