Improving the estimation of relevance models using large external corpora
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Relevance feedback
Pages: 154-161
Year of Publication: 2006
ISBN: 1-59593-369-7
Authors
Fernando Diaz, University of Massachusetts, Amherst, MA
Donald Metzler, University of Massachusetts, Amherst, MA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 93,   Citation Count: 15
DOI: http://doi.acm.org/10.1145/1148170.1148200

ABSTRACT

Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high-quality corpus that is comparable to the evaluation corpus can be as effective as, if not more effective than, using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.


