ABSTRACT
In federated information retrieval, a query is routed to multiple collections and a single answer list is constructed by combining the results. Such metasearch provides a mechanism for locating documents on the hidden Web and, by use of sampling, can proceed even when the collections are uncooperative. However, the similarity scores for documents returned from different collections are not comparable, and, in uncooperative environments, document scores are unlikely to be reported. We introduce a new merging method for uncooperative environments, in which similarity scores for the sampled documents held for each collection are used to estimate global scores for the documents returned per query. This method requires no assumptions about properties such as the retrieval models used. Using experiments on a wide range of collections, we show that in many cases our merging methods are significantly more effective than previous techniques.
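As an illustration only, the idea of estimating global scores from sampled documents can be sketched as follows. This is not the paper's method: the abstract does not specify how the estimates are computed, so the sketch assumes a simple linear fit of score against rank, and all names (`fit_line`, `estimate_scores`, `merge`, `sample_scores`) are hypothetical. It assumes that some documents returned by each collection also appear in the broker's sample, where they can be scored against the query.

```python
from typing import Dict, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All points share the same rank: fall back to a flat line.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimate_scores(ranking: List[str],
                    sample_scores: Dict[str, float]) -> Dict[str, float]:
    """Estimate a global score for every document in one collection's ranking.

    ranking: document ids in the order the collection returned them.
    sample_scores: scores of sampled documents, computed by the broker
    (assumption: one shared scoring function over all samples).
    """
    # Returned documents that are also in the sample give (rank, score) pairs.
    points = [(float(rank), sample_scores[doc])
              for rank, doc in enumerate(ranking, start=1)
              if doc in sample_scores]
    if len(points) < 2:
        raise ValueError("need at least two sampled documents in the ranking")
    a, b = fit_line(points)
    # Keep known scores; estimate the rest from the fitted line.
    return {doc: sample_scores.get(doc, a + b * rank)
            for rank, doc in enumerate(ranking, start=1)}


def merge(rankings: Dict[str, List[str]],
          sample_scores: Dict[str, float]) -> List[str]:
    """Merge per-collection rankings into one list by estimated score."""
    scored: List[Tuple[float, str]] = []
    for ranking in rankings.values():
        estimates = estimate_scores(ranking, sample_scores)
        scored.extend((score, doc) for doc, score in estimates.items())
    # Highest estimated score first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in scored]
```

Note that the sketch uses only the ranks reported by each collection, never their scores, which matches the uncooperative setting described above. A linear fit is the simplest choice; other curve shapes could be substituted in `fit_line` without changing the rest.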