ABSTRACT
A central idea of language models is that documents (and perhaps queries) are random variables, generated by data-generating functions that are characterized by document (query) parameters. The key new idea of this paper is to model a relevance judgment as also generated stochastically, by a data-generating function governed by those same document and query parameters. As a result, any available relevance judgments are easily incorporated as additional evidence about the true document and query model parameters. The approach also resolves the long-standing problem of document-oriented versus query-oriented probabilities. It can be used with a wide variety of hypothesized distributions for documents, queries, and relevance. We test the approach on Reuters Corpus Volume 1, using one set of possible distributions. Experimental results show that the approach does succeed in incorporating relevance data to improve estimates of both document and query parameters, but on this data and for the specific distributions we hypothesized, performance was no better than two separate one-sided models. We conclude that the model's theoretical contribution is its integration of relevance models, document models, and query models, and that realizing additional performance improvement over one-sided methods requires refinements.
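To make the idea of a shared parameterization concrete, the following is a minimal sketch of a joint log-likelihood in which the document text, the query text, and one relevance judgment all depend on the same two parameter vectors. The specific distributions are assumptions chosen for illustration only (a softmax-parameterized multinomial for text and a logistic Bernoulli for relevance); the abstract does not state which distributions the paper uses, and all function names here are hypothetical.

```python
import math


def log_softmax(theta):
    """Turn unconstrained parameters into log term probabilities."""
    m = max(theta)
    log_z = m + math.log(sum(math.exp(t - m) for t in theta))
    return [t - log_z for t in theta]


def text_log_lik(term_counts, theta):
    """Multinomial log-likelihood of term counts (up to an additive constant).

    Assumed text model: term probabilities are softmax(theta).
    """
    log_p = log_softmax(theta)
    return sum(c * lp for c, lp in zip(term_counts, log_p))


def relevance_log_lik(relevant, theta_d, theta_q):
    """Bernoulli log-likelihood of a single relevance judgment.

    Assumed relevance model: P(relevant) = sigmoid(theta_d . theta_q).
    """
    score = sum(d * q for d, q in zip(theta_d, theta_q))
    # log sigmoid(score) if relevant, else log sigmoid(-score)
    z = score if relevant else -score
    return -math.log1p(math.exp(-z))


def joint_log_lik(doc_counts, query_counts, relevant, theta_d, theta_q):
    """Sum the three terms; all of them share theta_d and theta_q."""
    return (text_log_lik(doc_counts, theta_d)
            + text_log_lik(query_counts, theta_q)
            + relevance_log_lik(relevant, theta_d, theta_q))
```

Because the relevance term depends on both parameter vectors, maximizing this sum over `theta_d` and `theta_q` lets a judgment act as evidence about the document and the query at once, which is the kind of two-sided estimation the abstract describes. A one-sided variant, under the same assumptions, would hold one of the two vectors fixed.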
REVIEW
"Donald Harris Kraft : Reviewer"
This interesting paper makes for a good read. Bodoff extends the paradigm of probabilistic retrieval modeling, which has focused, in the past, on query or document models, namely, on language models or models of words. The author adds relevance, w