Probabilistic models for document retrieval: a comparison of performance on experimental and synthetic data bases
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
Palazzo dei Congressi, Pisa, Italy
Pages: 258 - 264  
Year of Publication: 1986
ISBN:0-89791-187-3
Authors
Robert Losee  U. of North Carolina, Chapel Hill, NC
Abraham Bookstein  U. of Chicago, Chicago, IL
Clement T. Yu  U. of Illinois, Chicago, IL
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 23,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/253168.253222

ABSTRACT

Probabilistic document retrieval systems consistent with the two Poisson independence model outperform those based on the binary independence model if the terms are distributed as described by the model's assumptions. The Two Poisson Effectiveness Hypothesis suggests that retrieval models based upon the two Poisson model will outperform binary independence models when used on a “real-world” database, where independence and two Poisson term occurrence distributions fail to hold, because the added information obtained from incorporating term frequency will more than compensate for the non-Poisson distributions of terms. Searches of the MED1033 database suggest that if terms are not independent and frequencies of term occurrence are not distributed in a two Poisson manner, the binary independence sequential retrieval model outperforms the two Poisson independence retrieval model.
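
As a rough illustration of the contrast the abstract draws, the sketch below sets a binary independence term weight (which uses only term presence or absence) beside a two Poisson log-likelihood ratio (which uses within-document term frequency). It uses the textbook forms of both models, not necessarily the formulation used in the paper. All function names, parameter names, and numeric values are hypothetical.

```python
import math


def bim_weight(p, q):
    """Binary independence weight for a term that is present in a document.

    p: assumed P(term present | relevant)
    q: assumed P(term present | non-relevant)
    """
    return math.log((p * (1 - q)) / (q * (1 - p)))


def poisson_pmf(k, lam):
    """Probability of k occurrences under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def two_poisson_pmf(k, pi, lam_elite, lam_other):
    """Mixture of two Poissons for within-document term frequency k.

    pi is the assumed share of documents drawn from the high-rate
    ("elite") component; the remainder come from the low-rate component.
    """
    return pi * poisson_pmf(k, lam_elite) + (1 - pi) * poisson_pmf(k, lam_other)


def two_poisson_weight(k, rel, nonrel):
    """Log-likelihood ratio of frequency k, relevant vs. non-relevant.

    rel and nonrel are (pi, lam_elite, lam_other) tuples.
    """
    return math.log(two_poisson_pmf(k, *rel) / two_poisson_pmf(k, *nonrel))


# Hypothetical parameters: relevant documents are mostly "elite" for the term.
rel_params = (0.8, 4.0, 0.5)
nonrel_params = (0.2, 4.0, 0.5)

if __name__ == "__main__":
    print("BIM weight (term present):", bim_weight(0.6, 0.2))
    for k in (0, 1, 5):
        print("two Poisson weight, k =", k, two_poisson_weight(k, rel_params, nonrel_params))
```

Under these assumed parameters the two Poisson weight grows with the term frequency k, whereas the binary weight is the same for any document containing the term. That is the "added information" the hypothesis refers to. Whether it helps on real data depends on how well the distributional assumptions hold.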


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Bookstein, A. and Swanson, D. "A Decision Theoretic Foundation for Indexing." Journal of the American Society for Information Science. XXVI (January 1975): 45-50.
 
2
Bookstein, A. "Information Retrieval: A Sequential Learning Process." Journal of the American Society for Information Science. XXXIV (September 1983): 331-342.
 
3
 
4
Croft, W. and Harper, D. "Using Probabilistic Models of Document Retrieval without Relevance Information." Journal of Documentation. XXXV (December 1979): 285-295.
 
5
Fox, E. Characterization of Two New Experimental Collections in Computer and Information Science Containing Textual and Bibliographic Concepts. Technical Report 83-561, Cornell University Department of Computer Science. Ithaca, New York: September, 1983.
 
6
Harter, S. "A Probabilistic Approach to Keyword Indexing." Ph.D. dissertation, University of Chicago, 1974.
 
7
Losee, R. "The Performance of Probabilistic Models of Document Retrieval Systems." Ph.D. dissertation, University of Chicago, 1986.
8
 
9
Ullman, J. Principles of Database Systems, second edition. (Rockville, Maryland: Computer Science Press, 1982).
 
10
 
11
Voorhees, E. Computer Science Department, Cornell University, Ithaca, New York. Letter of 18 June, 1984 and personal communication of 19 June, 1985.
