ABSTRACT
Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe that using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high-quality corpus that is comparable to the evaluation corpus can be as effective as, if not more effective than, using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.