Robust result merging using sample-based score estimates
Source
ACM Transactions on Information Systems (TOIS)
Volume 27, Issue 3 (May 2009)
Article No. 14
Year of Publication: 2009
ISSN:1046-8188
Authors
Milad Shokouhi  RMIT University
Justin Zobel  RMIT University
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1508850.1508852

ABSTRACT

In federated information retrieval, a query is routed to multiple collections and a single answer list is constructed by combining the results. Such metasearch provides a mechanism for locating documents on the hidden Web and, by use of sampling, can proceed even when the collections are uncooperative. However, the similarity scores for documents returned from different collections are not comparable, and, in uncooperative environments, document scores are unlikely to be reported. We introduce a new merging method for uncooperative environments, in which similarity scores for the sampled documents held for each collection are used to estimate global scores for the documents returned per query. This method requires no assumptions about properties such as the retrieval models used. Using experiments on a wide range of collections, we show that in many cases our merging methods are significantly more effective than previous techniques.
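
The abstract does not say how the sampled documents' scores are turned into estimates for the returned documents. As a minimal sketch of the general idea, assume the broker has scored each collection's sampled documents with a single central ranking function and has an estimated rank for each of them within that collection's result list. One simple option is then to fit a curve of score against log rank per collection and read off an estimated global score for every returned rank. The log-rank regression, the function names, and the toy numbers below are illustrative assumptions, not the authors' published procedure.

```python
import math


def fit_log_rank_curve(points):
    """Least-squares fit of score = a + b * ln(rank) to (rank, score) pairs."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [score for _, score in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # flat curve if all ranks coincide
    a = mean_y - b * mean_x
    return a, b


def merge_by_estimated_score(returned, sample_points):
    """Merge score-less ranked lists using per-collection fitted curves.

    returned: {collection: [doc_id, ...]} in the order each collection reported.
    sample_points: {collection: [(estimated_rank, broker_score), ...]} for the
        sampled documents held by the broker (hypothetical input format).
    Returns a single list of (doc_id, estimated_score), best first.
    """
    merged = []
    for collection, doc_ids in returned.items():
        a, b = fit_log_rank_curve(sample_points[collection])
        for rank, doc_id in enumerate(doc_ids, start=1):
            merged.append((doc_id, a + b * math.log(rank)))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


# Toy data: two collections that return ranked lists without scores.
returned = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
sample_points = {
    "A": [(1, 0.9), (10, 0.5), (100, 0.1)],
    "B": [(1, 0.8), (10, 0.6), (100, 0.4)],
}
merged = merge_by_estimated_score(returned, sample_points)
for doc_id, score in merged:
    print(doc_id, round(score, 3))
```

Because the estimates come only from documents the broker already holds, this kind of scheme needs neither reported scores nor knowledge of each collection's retrieval model, which is the property the abstract emphasises.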



