Document language models, query models, and risk minimization for information retrieval
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 111-119
Year of Publication: 2001
ISBN: 1-58113-331-6
Authors
John Lafferty  Carnegie Mellon Univ., Pittsburgh, PA
Chengxiang Zhai  Carnegie Mellon Univ., Pittsburgh, PA
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 277,   Citation Count: 128
DOI: http://doi.acm.org/10.1145/383952.383970

ABSTRACT

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model for each document is estimated, as well as a language model for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method using Markov chains defined on a set of documents to estimate the query models. The Markov chain method has connections to algorithms from link analysis and social networks. The new approach is evaluated on TREC collections and compared to the basic language modeling approach and vector space models together with query expansion using Rocchio. Significant improvements are obtained over standard query expansion methods for strong baseline TF-IDF systems, with the greatest improvements attained for short queries on Web data.
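
As a rough illustration of the idea in the abstract (estimate a language model for each document and for the query, then rank by expected loss), the sketch below scores documents by the KL divergence between a query model and a smoothed document model. The choice of KL divergence as the loss, the Jelinek-Mercer smoothing, the weight `lam`, and all function names are assumptions made for this example; the abstract does not specify them. The sketch also uses a plain maximum-likelihood query model and does not attempt the Markov chain estimation method the abstract introduces.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_doc_model(doc_tokens, collection_model, lam=0.5):
    """Interpolate the document model with the collection model.

    Jelinek-Mercer smoothing is an assumption of this sketch; it keeps
    every collection word at non-zero probability in each document model.
    """
    doc_ml = unigram_model(doc_tokens)
    return {
        w: (1 - lam) * doc_ml.get(w, 0.0) + lam * p
        for w, p in collection_model.items()
    }


def kl_divergence(query_model, doc_model):
    """KL(query || document), ignoring query words outside the collection."""
    return sum(
        q * math.log(q / doc_model[w])
        for w, q in query_model.items()
        if q > 0 and w in doc_model
    )


def rank(query_tokens, docs, lam=0.5):
    """Return (doc_id, divergence) pairs, lowest divergence first."""
    collection_model = unigram_model(
        [tok for tokens in docs.values() for tok in tokens]
    )
    query_model = unigram_model(query_tokens)
    scored = [
        (doc_id, kl_divergence(
            query_model, smoothed_doc_model(tokens, collection_model, lam)))
        for doc_id, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Toy collection, invented for illustration only.
docs = {
    "d1": ["language", "model", "retrieval"],
    "d2": ["link", "analysis", "social", "networks"],
}
ranking = rank(["language", "model"], docs)
```

In this toy setup a lower divergence means the document model is closer to the query model, so `ranking` lists the better match first. A richer query model (for example, one that spreads probability onto related words) would plug in where `unigram_model(query_tokens)` is called.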


