ABSTRACT
This paper takes a fresh look at modeling approaches to information retrieval that have been the basis of much of the probabilistically motivated IR research over the last 20 years. We shall adopt a subjectivist Bayesian view of probabilities and argue that classical work on probabilistic retrieval is best understood from this perspective. The main focus of the paper will be the ranking formulas corresponding to the Binary Independence Model (BIM), presented originally by Robertson and Sparck Jones [1977], and the Combination Match Model (CMM), developed shortly thereafter by Croft and Harper [1979]. We will show how these same ranking formulas can result from a probabilistic methodology commonly known as Maximum Entropy (MAXENT).
REFERENCES
1. BEEFERMAN, D., BERGER, A., AND LAFFERTY, J. 1997. Text segmentation using exponential models. In Proceedings of Empirical Methods in Natural Language Processing.
2. BRETTHORST, G. L. 1988. Excerpts from Bayesian spectrum analysis and parameter estimation. In Maximum Entropy and Bayesian Methods in Science and Engineering, G. J. Erickson and C. R. Smith, Eds. Kluwer Academic Publishers, Norwell, MA, 75-146.
3. CHIANG, A. C. 1967. Fundamental Methods of Mathematical Economics. McGraw-Hill, New York.
4. COOPER, W. S. 1983. Exploiting the maximum entropy principle to increase retrieval effectiveness. Journal of the American Society for Information Science 34, 1, 31-39.
6. COOPER, W. S. AND HUIZINGA, P. 1982. The maximum entropy principle and its application to the design of probabilistic retrieval systems. Information Technology, Research and Development 1, 99-112.
7. CROFT, W. B. AND HARPER, D. J. 1979. Using probabilistic models of document retrieval without relevance information. Journal of Documentation 35, 4 (Dec.), 285-295.
8. DARROCH, J. AND RATCLIFF, D. 1972. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics 43, 1470-1480.
9. DAWID, A. P. 1989. Probability forecasting. In Encyclopedia of Statistical Sciences, S. Kotz and N. L. Johnson, Eds. Vol. 7. Wiley, New York, 210-218.
10. DEGROOT, M. AND FEINBERG, S. 1982. The comparison and evaluation of forecasters. The Statistician 32, 12-22.
12. ERICKSON, G. J. AND SMITH, C. R. 1988. Maximum Entropy and Bayesian Methods in Science and Engineering. Kluwer Academic Publishers, Norwell, MA.
13. FINE, T. L. 1973. Theories of Probability: An Examination of Foundations. Academic Press, New York.
14. GOLAN, A., JUDGE, G. G., AND MILLER, D. 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley and Sons, New York.
15. GOOD, I. J. 1950. Probability and the Weighing of Evidence. Charles Griffin, London.
16. GOOD, I. J. 1960. Weight of evidence, corroboration, explanatory power, information and the utility of experiments. Journal of the Royal Statistical Society: Series B 22, 319-331.
17. GULL, S. F. AND DANIELL, G. J. 1978. Image reconstruction from incomplete and noisy data. Nature 272, 686-690.
18. HACKING, I. 1965. Logic of Statistical Inference. Cambridge University Press, Cambridge.
19. HARPER, D. J. AND VAN RIJSBERGEN, C. J. 1978. An evaluation of feedback in document retrieval using co-occurrence data. Journal of Documentation 34, 3 (Sept.), 189-216.
20. JAYNES, E. T. 1957a. Information theory and statistical mechanics: Part I. Physical Review 106, 620-630.
21. JAYNES, E. T. 1957b. Information theory and statistical mechanics: Part II. Physical Review 108, 171.
22. JAYNES, E. T. 1963. Information theory and statistical mechanics. In Statistical Physics: Brandeis Summer Institute Lectures in Theoretical Physics, G. E. Uhlenbeck, Ed. Brandeis Summer Institute Lectures in Theoretical Physics, vol. 3. W. A. Benjamin, New York, 182-218.
23. JAYNES, E. T. 1979. Where do we stand on maximum entropy. In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, Eds. MIT Press, Cambridge, Massachusetts, 15-118.
24. JAYNES, E. T. 1994. Probability theory: The logic of science. Available via ftp://bayes.wustl.edu/pub/Jaynes/book.probability.theory/.
25. KANTOR, P. B. 1984. Maximum entropy and the optimal design of automated information retrieval systems. Information Technology, Research and Development 3, 2 (Apr.), 88-94.
28. LEE, J. J. AND KANTOR, P. B. 1991. A study of probabilistic information retrieval in the case of inconsistent expert judgments. Journal of the American Society for Information Science 42, 3, 166-172.
29. MARSHALL, K. T. AND OLIVER, R. M. 1995. Decision Making and Forecasting: with Emphasis on Model Building and Policy Analysis. McGraw-Hill, New York.
30. ROBERTSON, S. E. 1977. The probability ranking principle in IR. Journal of Documentation 33, 294-304.
31. ROBERTSON, S. E. AND SPARCK JONES, K. 1977. Relevance weighting of search terms. Journal of the American Society for Information Science 27, 129-146.
32. SALTON, G., WONG, A., AND YU, C. T. 1976. Automatic indexing using term discrimination and term precision measurements. Information Processing and Management 12, 43-51.
33. SHANNON, C. E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 379-423 & 623-656.
34. SMEATON, A. F. AND VAN RIJSBERGEN, C. J. 1983. The retrieval effects of query expansion on a feedback document retrieval system. The Computer Journal 25, 3, 239-246.
35. SPARCK JONES, K. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 11-21.
36. TRIBUS, M. 1969. Rational Descriptions, Decisions, and Designs. Pergamon-Hall, New York.
37. TRIBUS, M. 1979. Thirty years of information theory. In The Maximum Entropy Formalism, R. D. Levine and M. Tribus, Eds. MIT Press, Cambridge, Massachusetts, 1-14.
38. VAN RIJSBERGEN, C. J. 1977. A theoretical basis for the use of co-occurrence data in information retrieval. Journal of Documentation 33, 106-119.
CITED BY 9
Robert W.P. Luk, H. V. Leong, Tharam S. Dillon, Alvin T.S. Chan, W. Bruce Croft, James Allan, A survey in indexing and searching XML documents, Journal of the American Society for Information Science and Technology, v.53 n.6, p.415-437, May 2002
Al Mamunur Rashid, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A. Konstan, John Riedl, Getting to know you: learning new user preferences in recommender systems, Proceedings of the 7th international conference on Intelligent user interfaces, January 13-16, 2002, San Francisco, California, USA
V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. Haas, U. Srivastava, Consistently estimating the selectivity of conjuncts of predicates, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
Eirinaios Michelakis, Rajasekar Krishnamurthy, Peter J. Haas, Shivakumar Vaithyanathan, Uncertainty management in rule-based information extraction systems, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
V. Markl, P. J. Haas, M. Kutsch, N. Megiddo, U. Srivastava, T. M. Tran, Consistent selectivity estimation via maximum entropy, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.55-76, January 2007
REVIEW
Reviewer: Caroline Merriam Eastman
This paper explores existing probabilistic models of
information retrieval and their relationship to alternative
probabilistic models based upon the principle of maximum entropy.
It shows that the formulas for document matching and retrieval ...