ABSTRACT
We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model is estimated for each document and for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy, and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method that uses Markov chains defined on a set of documents to estimate the query models. The Markov chain method has connections to algorithms from link analysis and social networks. The new approach is evaluated on TREC collections and compared with the basic language modeling approach and with vector space models that use Rocchio query expansion. Significant improvements are obtained over standard query expansion methods for strong baseline TF-IDF systems, with the greatest improvements attained for short queries on Web data.
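As a rough illustration of ranking with a query model and document models, the sketch below scores each document by the KL divergence between the two distributions, treating that divergence as the risk to be minimized. The abstract does not name a specific loss function or smoothing method, so the KL-divergence loss, the Dirichlet-prior smoothing, the parameter `mu`, the probability floor, and all function names here are assumptions made for illustration only. The query model is a plain maximum-likelihood estimate rather than the Markov chain estimate the paper proposes.

```python
import math
from collections import Counter


def document_model(doc_tokens, collection_counts, collection_len, mu):
    """Return a function word -> p(word | document).

    Assumption: Dirichlet-prior smoothing against the collection model.
    """
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    def prob(word):
        p_coll = collection_counts.get(word, 0) / collection_len
        return (counts.get(word, 0) + mu * p_coll) / (doc_len + mu)

    return prob


def query_model(query_tokens):
    """Maximum-likelihood query model: relative frequency of each query term."""
    counts = Counter(query_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def kl_risk(q_model, d_prob, floor=1e-12):
    """KL(query model || document model); lower means lower assumed risk.

    The floor avoids log(0) for query terms absent from the whole collection.
    """
    return sum(
        p * math.log(p / max(d_prob(word), floor))
        for word, p in q_model.items()
        if p > 0
    )


def rank(query_tokens, docs, mu=2000.0):
    """Return document ids ordered from lowest to highest risk."""
    collection_counts = Counter(t for tokens in docs.values() for t in tokens)
    collection_len = sum(collection_counts.values())
    q_model = query_model(query_tokens)
    scored = []
    for doc_id, tokens in docs.items():
        d_prob = document_model(tokens, collection_counts, collection_len, mu)
        scored.append((kl_risk(q_model, d_prob), doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


if __name__ == "__main__":
    toy_docs = {
        "d1": ["markov", "chain", "query", "model"],
        "d2": ["vector", "space", "model"],
        "d3": ["cooking", "recipes"],
    }
    print(rank(["markov", "chain"], toy_docs, mu=10.0))
```

A richer query model, for example one that spreads probability mass to related terms, could be passed to `kl_risk` in place of the maximum-likelihood estimate without changing the ranking code.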