Empirical development of an exponential probabilistic model for text retrieval: using textual analysis to build a better model
Source Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Retrieval models
Pages: 18 - 25  
Year of Publication: 2003
ISBN:1-58113-646-3
Authors
Jaime Teevan  MIT AI Lab, Cambridge, MA
David R. Karger  MIT LCS, Cambridge, MA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 53,   Citation Count: 8
DOI: http://doi.acm.org/10.1145/860435.860441

ABSTRACT

Much work in information retrieval focuses on using a model of documents and queries to derive retrieval algorithms. Model-based development is a useful alternative to heuristic development because in a model the assumptions are explicit and can be examined and refined independently of the particular retrieval algorithm. We explore the explicit assumptions underlying the naïve framework by performing computational analysis of actual corpora and queries to devise a generative document model that closely matches text. Our thesis is that a model so developed will be more accurate than existing models, and thus more useful in retrieval, as well as other applications. We test this by learning from a corpus the best document model. We find the learned model better predicts the existence of text data and has improved performance on certain IR tasks.

