ABSTRACT
Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However, the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial and error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and to regularize the estimation of the language model parameters so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experimental results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback, with comparable or better retrieval accuracy.
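The abstract gives no formulas, so the following is only an illustrative sketch of the general idea, not the paper's actual algorithm. It assumes a two-component mixture (a feedback topic model plus a fixed background model), treats the original query as pseudo-counts in a MAP-style EM update, and shrinks the weight of those pseudo-counts at every iteration so that feedback evidence is added gradually. The function name `feedback_model` and the parameters `lam`, `mu0`, `decay`, and `iters` are hypothetical and exist only for illustration; the paper itself claims to need no tuned parameters.

```python
from collections import Counter


def feedback_model(query_terms, feedback_docs, background,
                   lam=0.5, mu0=1000.0, decay=0.9, iters=30):
    """Estimate a feedback language model with a query prior (illustrative only).

    query_terms:   list of query words
    feedback_docs: list of documents, each a list of words
    background:    dict word -> background (collection) probability
    lam, mu0, decay, iters: hypothetical settings, not taken from the paper
    """
    counts = Counter(w for doc in feedback_docs for w in doc)
    q_counts = Counter(query_terms)
    q_total = sum(q_counts.values())
    q_model = {w: c / q_total for w, c in q_counts.items()}

    vocab = set(counts) | set(q_model)
    # Uniform start so every feedback word keeps non-zero probability.
    theta = {w: 1.0 / len(vocab) for w in vocab}

    mu = mu0  # strength of the query prior, expressed as pseudo-counts
    for _ in range(iters):
        expected = {}
        for w in vocab:
            # E-step: probability that an occurrence of w came from the topic model.
            topic = (1.0 - lam) * theta[w]
            t = topic / (topic + lam * background.get(w, 1e-9))
            # M-step numerator: expected topic counts plus query pseudo-counts.
            expected[w] = counts.get(w, 0) * t + mu * q_model.get(w, 0.0)
        total = sum(expected.values())
        theta = {w: v / total for w, v in expected.items()}
        mu *= decay  # relax the prior so feedback information enters gradually
    return theta
```

With a large initial prior weight, the first iteration stays close to the original query model; as the prior is relaxed, words that are frequent in the feedback documents but rare in the background gain probability. The fixed decay schedule and iteration count are placeholders for whatever stopping criterion the full paper defines.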