Empirical development of an exponential probabilistic model for text retrieval: using textual analysis to build a better model
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Retrieval models
Pages: 18 - 25
Year of Publication: 2003
ISBN:1-58113-646-3
ABSTRACT
Much work in information retrieval focuses on using a model of documents and queries to derive retrieval algorithms. Model-based development is a useful alternative to heuristic development because in a model the assumptions are explicit and can be examined and refined independently of the particular retrieval algorithm. We explore the explicit assumptions underlying the naïve framework by performing computational analysis of actual corpora and queries to devise a generative document model that closely matches text. Our thesis is that a model so developed will be more accurate than existing models, and thus more useful in retrieval, as well as other applications. We test this by learning the best document model from a corpus. We find that the learned model better predicts the existence of text data and has improved performance on certain IR tasks.