Query dependent pseudo-relevance feedback based on Wikipedia
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Expansion and feedback
Pages 59-66
Year of Publication: 2009
ISBN:978-1-60558-483-6
Authors
Yang Xu  Institute of Computing, Chinese Academy of Sciences, Beijing, China
Gareth J.F. Jones  Dublin City University, Dublin, Ireland
Bin Wang  Institute of Computing, Chinese Academy of Sciences, Beijing, China
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571954

ABSTRACT

Pseudo-relevance feedback (PRF) via query expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
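
The generic PRF loop described in the abstract (assume the top-ranked documents are relevant, pick expansion terms from them, fold those terms into the original query) can be sketched roughly as follows. This is a minimal illustration only, not any of the paper's three Wikipedia-based term selection methods: the function name `expand_query`, its parameters, the raw term-frequency scoring, and the interpolation weight are all assumptions made for the example, and the query is assumed to be a non-empty list of distinct lowercase terms.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=2, n_terms=2, weight=0.5):
    """Return a {term: weight} query model built from the top-k documents.

    Hypothetical helper for illustration; term selection here is plain
    term frequency, which is NOT one of the paper's proposed methods.
    """
    # Core PRF assumption: treat the top-k retrieved documents as relevant.
    feedback = Counter()
    for doc in ranked_docs[:k]:
        feedback.update(doc.lower().split())

    # Drop terms already in the query, then keep the most frequent candidates.
    for term in query_terms:
        feedback.pop(term, None)
    top = feedback.most_common(n_terms)
    total = sum(count for _, count in top) or 1

    # Interpolate the original query model with the feedback model.
    model = {t: (1 - weight) / len(query_terms) for t in query_terms}
    for term, count in top:
        model[term] = weight * count / total
    return model


docs = ["jaguar car engine car", "jaguar car speed", "jaguar cat jungle"]
expanded = expand_query(["jaguar"], docs, k=2, n_terms=1)
```

In this toy run the third document is ignored because only the top two are used as feedback, so the single expansion term comes from the car-related documents. If those top documents were actually non-relevant to the user's intent, the expansion would drift, which is the noise problem the abstract points out.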


