The maximum entropy approach and probabilistic IR models
Source: ACM Transactions on Information Systems (TOIS)
Volume 18, Issue 3 (July 2000)
Pages: 246 - 287  
Year of Publication: 2000
ISSN: 1046-8188
Authors
Warren R. Greiff  Univ. of Massachusetts, Amherst
Jay M. Ponte  Univ. of Massachusetts, Amherst
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 94,   Citation Count: 9
DOI: http://doi.acm.org/10.1145/352595.352597

ABSTRACT

This paper takes a fresh look at modeling approaches to information retrieval that have been the basis of much of the probabilistically motivated IR research over the last 20 years. We shall adopt a subjectivist Bayesian view of probabilities and argue that classical work on probabilistic retrieval is best understood from this perspective. The main focus of the paper will be the ranking formulas corresponding to the Binary Independence Model (BIM), presented originally by Robertson and Sparck Jones [1977], and the Combination Match Model (CMM), developed shortly thereafter by Croft and Harper [1979]. We will show how these same ranking formulas can result from a probabilistic methodology commonly known as Maximum Entropy (MAXENT).
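
For readers unfamiliar with the two ranking formulas named in the abstract, the following is a minimal sketch of their commonly cited textbook forms, not the paper's MAXENT derivation. The function names, the dictionary-based inputs, and the constant `c` are illustrative assumptions and do not come from the paper.

```python
import math

def bim_term_weight(p, q):
    """Log-odds term weight usually associated with the BIM.

    p: assumed probability that the term occurs in a relevant document
    q: assumed probability that the term occurs in a non-relevant document
    """
    return math.log(p * (1 - q) / (q * (1 - p)))

def bim_score(query_terms, doc_terms, p, q):
    """Sum the weights of the query terms that occur in the document."""
    return sum(bim_term_weight(p[t], q[t])
               for t in query_terms if t in doc_terms)

def cmm_score(query_terms, doc_terms, doc_freq, n_docs, c=1.0):
    """Combination-match style score when no relevance information exists.

    Assumes p is the same for every term, so log(p / (1 - p)) collapses to
    a constant c, and estimates q for term t as doc_freq[t] / n_docs.
    """
    return sum(c + math.log((n_docs - doc_freq[t]) / doc_freq[t])
               for t in query_terms if t in doc_terms)
```

In this sketch a term that is equally likely in relevant and non-relevant documents contributes a weight of zero, and the combination-match score reduces to a count of matching terms plus an inverse-document-frequency-like sum.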


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
BEEFERMAN, D., BERGER, A., AND LAFFERTY, J. 1997. Text segmentation using exponential models. In Proceedings of Empirical Methods in Natural Language Processing.
 
2
BRETTHORST, G. L. 1988. Excerpts from Bayesian spectrum analysis and parameter estimation. In Maximum Entropy and Bayesian Methods in Science and Engineering, G. J. Erickson and C. R. Smith, Eds. Kluwer Academic Publishers, Norwell, MA, 75-146.
 
3
CHIANG, A. C. 1967. Fundamental Methods of Mathematical Economics. McGraw-Hill, New York.
 
4
COOPER, W. S. 1983. Exploiting the maximum entropy principle to increase retrieval effectiveness. Journal of the American Society for Information Science 34, 1, 31-39.
5
 
6
COOPER, W. S. AND HUIZINGA, P. 1982. The maximum entropy principle and its application to the design of probabilistic retrieval systems. Information Technology, Research & Development 1, 99-112.
 
7
CROFT, W. B. AND HARPER, D. J. 1979. Using probabilistic models of document retrieval without relevance information. Journal of Documentation 35, 4 (Dec.), 285-295.
 
8
DARROCH, J. AND RATCLIFF, D. 1972. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics 43, 1470-1480.
 
9
DAWID, A. P. 1989. Probability forecasting. In Encyclopedia of Statistical Sciences, S. Kotz and N. L. Johnson, Eds. Vol. 7. Wiley, New York, 210-218.
 
10
DEGROOT, M. AND FEINBERG, S. 1982. The comparison and evaluation of forecasters. The Statistician 32, 12-22.
 
11
 
12
ERICKSON, G. J. AND SMITH, C. R. 1988. Maximum Entropy and Bayesian Methods in Science and Engineering. Kluwer Academic Publishers, Norwell, MA.
 
13
FINE, T. L. 1973. Theories of Probability: An Examination of Foundations. Academic Press, New York.
 
14
GOLAN, A., JUDGE, G. G., AND MILLER, D. 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley and Sons, New York.
 
15
GOOD, I. J. 1950. Probability and the Weighing of Evidence. Charles Griffin, London.
 
16
GOOD, I. J. 1960. Weight of evidence, corroboration, explanatory power, information and the utility of experiments. Journal of the Royal Statistical Society: Series B 22, 319-331.
 
17
GULL, S. F. AND DANIELL, G. J. 1978. Image reconstruction from incomplete and noisy data. Nature 272, 686-690.
 
18
HACKING, I. 1965. Logic of Statistical Inference. Cambridge University Press, Cambridge.
 
19
HARPER, D. J. AND VAN RIJSBERGEN, C. J. 1978. An evaluation of feedback in document retrieval using co-occurrence data. Journal of Documentation 34, 3 (Sept.), 189-216.
 
20
JAYNES, E. T. 1957a. Information theory and statistical mechanics: Part I. Physical Review 106, 620-630.
 
21
JAYNES, E. T. 1957b. Information theory and statistical mechanics: Part II. Physical Review 108, 171.
 
22
JAYNES, E. T. 1963. Information theory and statistical mechanics. In Statistical Physics: Brandeis Summer Institute Lectures in Theoretical Physics, G. E. Uhlenbeck, Ed. Brandeis Summer Institute Lectures in Theoretical Physics, vol. 3. W. A. Benjamin, New York, 182-218.
 
23
JAYNES, E. T. 1979. Where do we stand on maximum entropy. In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, Eds. MIT Press, Cambridge, Massachusetts, 15-118.
 
24
JAYNES, E. T. 1994. Probability theory: The logic of science. Available via ftp://bayes.wustl.edu/pub/Jaynes/book.probability.theory/.
 
25
KANTOR, P. B. 1984. Maximum entropy and the optimal design of automated information retrieval systems. Information Technology, Research and Development 3, 2 (Apr.), 88-94.
26
 
27
 
28
LEE, J. J. AND KANTOR, P. B. 1991. A study of probabilistic information retrieval in the case of inconsistent expert judgments. Journal of the American Society for Information Science 42, 3, 166-172.
 
29
MARSHALL, K. T. AND OLIVER, R. M. 1995. Decision Making and Forecasting: with Emphasis on Model Building and Policy Analysis. McGraw-Hill, New York.
 
30
ROBERTSON, S. E. 1977. The probability ranking principle in IR. Journal of Documentation 33, 294-304.
 
31
ROBERTSON, S. E. AND SPARCK JONES, K. 1977. Relevance weighting of search terms. Journal of the American Society for Information Science 27, 129-146.
 
32
SALTON, G., WONG, A., AND YU, C. T. 1976. Automatic indexing using term discrimination and term precision measurements. Information Processing and Management 12, 43-51.
 
33
SHANNON, C. E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 379-423 & 623-656.
 
34
SMEATON, A. F. AND VAN RIJSBERGEN, C. J. 1983. The retrieval effects of query expansion on a feedback document retrieval system. The Computer Journal 25, 3, 239-246.
 
35
SPARCK JONES, K. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 11-21.
 
36
TRIBUS, M. 1969. Rational Descriptions, Decisions, and Designs. Pergamon-Hall, New York.
 
37
TRIBUS, M. 1979. Thirty years of information theory. In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, Eds. MIT Press, Cambridge, Massachusetts, 1-14.
 
38
VAN RIJSBERGEN, C. J. 1977. A theoretical basis for the use of co-occurrence data in information retrieval. Journal of Documentation 33, 106-119.
 
39


REVIEW

Reviewer: Caroline Merriam Eastman

This paper explores existing probabilistic models of information retrieval and their relationship to alternative probabilistic models based upon the principle of maximum entropy. It shows that the formulas for document matching and retrieval ...
