A framework for determining necessary query set sizes to evaluate web search effectiveness
Source
International World Wide Web Conference
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
POSTER SESSION: Posters
Pages: 1176-1177
Year of Publication: 2005
ISBN: 1-59593-051-5
ABSTRACT
We describe a framework of bootstrapped hypothesis testing for estimating the confidence in one web search engine outperforming another over any randomly sampled query set of a given size. To validate this framework, we have constructed and made available a precision-oriented test collection consisting of manual binary relevance judgments for each of the top ten results of ten web search engines across 896 queries and the single best result for each of those queries. Results from this bootstrapping approach over typical query set sizes indicate that examining repeated statistical tests is imperative, as a single test is quite likely to find significant differences that do not necessarily generalize. We also find that the number of queries needed for a repeatable evaluation in a dynamic environment such as the web is much higher than the query set sizes studied previously.
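The idea of repeating a significance test over many resampled query sets can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical test or the resampling details, so this sketch assumes a one-sided exact sign test on paired per-query scores and sampling with replacement. The function names (`sign_test_p`, `bootstrap_confidence`), the parameters, and the synthetic scores are all hypothetical.

```python
import math
import random


def sign_test_p(diffs):
    """One-sided exact sign test p-value for 'A beats B' (ties are dropped).

    Assumption: the sign test is a stand-in; the source does not say
    which significance test was used.
    """
    wins = sum(1 for d in diffs if d > 0)
    losses = sum(1 for d in diffs if d < 0)
    n = wins + losses
    if n == 0:
        return 1.0  # only ties: no evidence either way
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def bootstrap_confidence(scores_a, scores_b, set_size, trials=1000,
                         alpha=0.05, seed=0):
    """Fraction of resampled query sets on which A significantly beats B."""
    rng = random.Random(seed)
    # Paired per-query differences; positive means A did better on that query.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    significant = 0
    for _ in range(trials):
        sample = rng.choices(diffs, k=set_size)  # draw queries with replacement
        if sign_test_p(sample) < alpha:
            significant += 1
    return significant / trials


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the paper's collection): precision at 10
    # for two hypothetical engines over 896 queries.
    rng = random.Random(42)
    engine_b = [rng.randint(0, 10) / 10 for _ in range(896)]
    engine_a = [min(1.0, b + rng.choice([-0.1, 0.0, 0.1, 0.1])) for b in engine_b]
    for size in (25, 50, 100, 200):
        conf = bootstrap_confidence(engine_a, engine_b, set_size=size)
        print(f"query set size {size}: A significantly better in {conf:.1%} of samples")
```

Under these assumptions, the returned fraction plays the role of the confidence that one engine outperforms the other on a randomly drawn query set of the given size. Comparing the fraction across set sizes suggests how many queries a repeatable comparison might need.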
CITED BY 4
Eric C. Jensen, Steven M. Beitzel, David Grossman, Ophir Frieder, Abdur Chowdhury, Predicting query difficulty on the web by learning visual clues, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, Abdur Chowdhury, Greg Pass, Surrogate scoring for improved metasearch precision, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil