Relevance models to help estimate document and query parameters
Source: ACM Transactions on Information Systems (TOIS)
Volume 22, Issue 3 (July 2004)
Pages: 357-380
Year of Publication: 2004
ISSN: 1046-8188
Author
David Bodoff  Hong Kong University of Science and Technology, Hong Kong
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 58,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1010614.1010615

ABSTRACT

A central idea of Language Models is that documents (and perhaps queries) are random variables, generated by data-generating functions that are characterized by document (query) parameters. The key new idea of this paper is to model that a relevance judgment is also generated stochastically, and that its data generating function is also governed by those same document and query parameters. The result of this addition is that any available relevance judgments are easily incorporated as additional evidence about the true document and query model parameters. An additional aspect of this approach is that it also resolves the long-standing problem of document-oriented versus query-oriented probabilities. The general approach can be used with a wide variety of hypothesized distributions for documents, queries, and relevance. We test the approach on Reuters Corpus Volume 1, using one set of possible distributions. Experimental results show that the approach does succeed in incorporating relevance data to improve estimates of both document and query parameters, but on this data and for the specific distributions we hypothesized, performance was no better than two separate one-sided models. We conclude that the model's theoretical contribution is its integration of relevance models, document models, and query models, and that the potential for additional performance improvement over one-sided methods requires refinements.
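
One way to read the central idea of the abstract is as a single joint likelihood in which documents, queries, and relevance judgments all depend on the same parameters. The sketch below is only an illustration under assumed notation: the symbols $\theta_d$, $\theta_q$, $r_{dq}$, and the set $J$ of judged document-query pairs are introduced here and are not taken from the paper, and the abstract does not state which distributions the paper actually uses.

```latex
% Hypothetical notation (not from the paper):
%   \theta_d : parameters of the model that generates document d
%   \theta_q : parameters of the model that generates query q
%   r_{dq}   : relevance judgment for the pair (d, q)
%   J        : set of (d, q) pairs for which a judgment is available
\[
\log L(\Theta)
  = \sum_{d} \log P(d \mid \theta_d)
  + \sum_{q} \log P(q \mid \theta_q)
  + \sum_{(d,q) \in J} \log P(r_{dq} \mid \theta_d, \theta_q)
\]
```

Under this reading, each available judgment contributes one extra term that involves both $\theta_d$ and $\theta_q$, so maximizing the joint likelihood lets relevance data inform the document-side and the query-side estimates together. With no judgments ($J = \emptyset$), the expression reduces to separate document and query likelihoods.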





REVIEW

"Donald Harris Kraft : Reviewer"

This interesting paper makes for a good read. Bodoff extends the paradigm of probabilistic retrieval modeling, which has in the past focused on query or document models, namely, on language models or models of words. The author adds relevance, w...

