ABSTRACT
The goal of probabilistic retrieval system design is to rank the elements of the search universe in descending order of their estimated probability of usefulness to the user. Previously explored methods for computing such a ranking have relied on statistical independence assumptions and on multiple regression analysis of a learning sample. In this paper these techniques are recombined in a new way to obtain more accurate probability estimates without undue additional computational complexity. The novel element of the proposed design is that the regression analysis is carried out in two or more levels, or stages. This approach allows composite or grouped retrieval clues to be analyzed in an orderly manner: first within groups, and then between groups. It automatically compensates for systematic biases introduced by the simplifying statistical assumptions, and it yields search algorithms of reasonable computational efficiency.
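The staged idea can be illustrated with a minimal sketch. It assumes logistic regression at both levels (the abstract says only "multiple regression"), and it treats each matching query term as one clue group described by two hypothetical features, tf and idf, on synthetic data. Stage 1 fits a within-group model giving log O(R | A_i) for each clue. These estimates are combined under an independence assumption as log O(R) + sum_i [log O(R | A_i) - log O(R)]. Stage 2 then regresses relevance on that combined score and the clue count, so the fitted coefficients can absorb the bias the independence assumption introduces. The helper names (fit_logistic, train_staged, rank_documents) and the plain gradient-ascent fitter are illustrative stand-ins, not the authors' formulation.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_odds(w, x):
    """Linear predictor: w[0] is the intercept, w[1:] pair with the features in x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit logistic-regression weights [intercept, w1, ...] by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(log_odds(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def make_sample(n_pairs, rng):
    """Hypothetical learning sample: each query-document pair is a list of
    matching-term clue vectors (tf, idf) plus a 0/1 relevance judgement."""
    sample = []
    for _ in range(n_pairs):
        terms = [(rng.uniform(0, 3), rng.uniform(0, 2)) for _ in range(rng.randint(1, 4))]
        true_lo = -3.0 + sum(0.8 * tf + 0.6 * idf for tf, idf in terms)
        sample.append((terms, 1 if rng.random() < sigmoid(true_lo) else 0))
    return sample

def train_staged(sample):
    # Stage 1 (within groups): one record per matching term, labelled with its pair's relevance.
    X1 = [t for terms, _ in sample for t in terms]
    y1 = [rel for terms, rel in sample for _ in terms]
    w1 = fit_logistic(X1, y1, lr=0.5)
    # Prior log-odds of relevance in the learning sample.
    p = sum(rel for _, rel in sample) / len(sample)
    prior = math.log(p / (1 - p))

    # Independence-based combination: sum of per-clue shifts away from the prior log-odds.
    def z_score(terms):
        return sum(log_odds(w1, t) - prior for t in terms)

    # Stage 2 (between groups): regress relevance on the combined score and the clue count,
    # letting the fitted coefficients correct the bias of the independence assumption.
    X2 = [(z_score(terms), len(terms)) for terms, _ in sample]
    y2 = [rel for _, rel in sample]
    w2 = fit_logistic(X2, y2, lr=0.05, epochs=2000)
    return w1, w2, z_score

def rank_documents(candidates, w2, z_score):
    """Return (doc_id, estimated P(relevant)) pairs in descending order of probability."""
    scored = [(doc, sigmoid(log_odds(w2, (z_score(terms), len(terms)))))
              for doc, terms in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

rng = random.Random(0)
w1, w2, z_score = train_staged(make_sample(300, rng))
candidates = {
    "d1": [(2.5, 1.8), (1.0, 0.5)],
    "d2": [(0.2, 0.3)],
    "d3": [(1.5, 1.0), (2.0, 1.5), (0.5, 0.2)],
}
ranking = rank_documents(candidates, w2, z_score)
```

Keeping the first stage at the level of individual clues makes each fit small. The second stage adds only a handful of coefficients, which is consistent with the claim that the extra accuracy costs little additional computation.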