ACM Home Page
Please provide us with feedback. Feedback
Evaluating sampling methods for uncooperative collections
Full text PdfPdf (376 KB)
Source
Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Amsterdam, The Netherlands
SESSION: Collection representation in distributed IR table of contents
Pages: 503 - 510  
Year of Publication: 2007
ISBN:978-1-59593-597-7
Authors
Paul Thomas  Australian National University, Canberra, Australia
David Hawking  CSIRO ICT Centre, Canberra, Australia
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 77,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1277741.1277828
What is a DOI?

ABSTRACT

Many server selection methods suitable for distributed information retrieval applications rely, in the absence of cooperation, on the availability of unbiased samples of documents from the constituent collections. We describe a number of sampling methods which depend only on the normal query-response mechanism of the applicable search facilities. We evaluate these methods on a number of collections typical of a personal metasearch application. Results demonstrate that biases exist for all methods, particularly toward longer documents, and that in some cases these biases can be reduced but not eliminated by choice of parameters.We also introduce a new sampling technique, "multiple queries", which produces samples of similar quality to the best current techniques but with significantly reduced cost.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
4
5
 
6
N. Craswell, D. Hawking, and P. Thistlewaite. Merging results from isolated search engines. In Proc. Australasian Database Conference, 1999.
7
 
8
9
 
10
Open directory project. http://dmoz.org/.
11
 
12
P. Rusmevichientong, D. M. Pennock, S. Lawrence, and C. L. Giles. Methods for sampling pages uniformly from the world wide web. In Proc. AAAI Fall Symposium on Using Uncertainty Within Computation, 2001.
13
14
15


Collaborative Colleagues:
Paul Thomas: colleagues
David Hawking: colleagues