ABSTRACT
The information a user needs is frequently stored in the databases of multiple search engines. Invoking each search engine separately and picking out the useful documents from every set of returned results is inconvenient and inefficient for an ordinary user. A metasearch engine can be constructed to provide unified access to multiple search engines: when it receives a query from a user, it invokes the underlying search engines to retrieve useful information for that user. As a search tool, a metasearch engine offers further benefits, such as increasing search coverage of the Web and improving the scalability of search. In this article, we survey techniques that have been proposed to tackle several underlying challenges in building a good metasearch engine. Among the main challenges, the database selection problem is to identify the search engines that are likely to return useful documents for a given query. The document selection problem is to determine which documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines into a single result list. We also point out some problems that need further research.
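To make the three subproblems concrete, here is a minimal Python sketch of a metasearch pipeline. It is a hypothetical toy, not a technique from the survey or from any particular system: the term-overlap test used for database selection, the fixed per-engine cutoff `k` used for document selection, and the divide-by-top-score normalization used for result merging are simplifying assumptions, and all names (`select_databases`, `retrieve`, `merge`, `engine_a`, `engine_b`, the example URLs) are invented for illustration.

```python
# Hypothetical toy metasearch pipeline; not the method of any surveyed system.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Result:
    url: str
    score: float  # engine-local score; scales differ across engines


# Hypothetical engine interface: (query, max_results) -> ranked results.
SearchFn = Callable[[str, int], List[Result]]


def select_databases(query: str, summaries: Dict[str, Set[str]],
                     threshold: float) -> List[str]:
    """Database selection (toy rule): keep an engine if the fraction of
    query terms found in its term summary reaches `threshold`."""
    terms = set(query.lower().split())
    if not terms:
        return []
    return [name for name, vocab in summaries.items()
            if len(terms & vocab) / len(terms) >= threshold]


def retrieve(query: str, engines: Dict[str, SearchFn], chosen: List[str],
             k: int) -> Dict[str, List[Result]]:
    """Document selection (toy rule): take the top k results from
    every selected engine."""
    return {name: engines[name](query, k) for name in chosen}


def merge(results: Dict[str, List[Result]]) -> List[str]:
    """Result merging (toy rule): divide each engine's scores by its best
    score so the scales become comparable, keep the highest value for a
    document returned by several engines, then sort."""
    best: Dict[str, float] = {}
    for docs in results.values():
        top = max((d.score for d in docs), default=0.0)
        if top <= 0:
            continue  # nothing usable from this engine
        for d in docs:
            norm = d.score / top
            best[d.url] = max(best.get(d.url, 0.0), norm)
    # Highest normalized score first; ties broken by URL for determinism.
    return sorted(best, key=lambda url: (-best[url], url))


# Two fake engines whose raw scores use very different scales.
def engine_a(query: str, k: int) -> List[Result]:
    docs = [Result("a.example/1", 12.0), Result("shared.example/x", 6.0),
            Result("a.example/2", 3.0)]
    return docs[:k]


def engine_b(query: str, k: int) -> List[Result]:
    docs = [Result("shared.example/x", 90.0), Result("b.example/1", 45.0)]
    return docs[:k]


# Engine "C" has a summary but is filtered out before it would be queried.
summaries = {"A": {"metasearch", "merging"}, "B": {"metasearch", "ranking"},
             "C": {"cooking"}}
engines = {"A": engine_a, "B": engine_b}

query = "metasearch merging"
chosen = select_databases(query, summaries, threshold=0.5)
ranking = merge(retrieve(query, engines, chosen, k=2))
print(chosen)   # ['A', 'B']
print(ranking)  # ['a.example/1', 'shared.example/x', 'b.example/1']
```

Normalization matters in the merge step because the two fake engines score on different scales (12.0 versus 90.0 for their top hits); comparing raw scores would rank every result from engine B above every result from engine A. A practical metasearch engine would replace each toy rule with a better-founded estimate.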
CITED BY 50

Zonghuan Wu, Weiyi Meng, Clement Yu, Zhuogang Li, Towards a highly-scalable and effective metasearch engine, Proceedings of the 10th international conference on World Wide Web, p.386-395, May 01-05, 2001, Hong Kong, Hong Kong

Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Raghavan, Clement Yu, Fully automatic wrapper generation for search engines, Proceedings of the 14th international conference on World Wide Web, May 10-14, 2005, Chiba, Japan

Milad Shokouhi, Justin Zobel, Yaniv Bernstein, Distributed text retrieval from overlapping collections, Proceedings of the eighteenth conference on Australasian database, p.141-150, January 30-February 02, 2007, Ballarat, Victoria, Australia

Jon Kleinberg, Social networks, incentives, and search, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, p.210-211, August 06-11, 2006, Seattle, Washington, USA

Sebastian Michel, Matthias Bender, Nikos Ntarmos, Peter Triantafillou, Gerhard Weikum, Christian Zimmer, Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA

Ronak Desai, Qi Yang, Zonghuan Wu, Weiyi Meng, Clement Yu, Identifying redundant search engines in a very large scale metasearch engine context, Proceedings of the eighth ACM international workshop on Web information and data management, November 10, 2006, Arlington, Virginia, USA

Klaus Berberich, Srikanta Bedathur, Gerhard Weikum, Michalis Vazirgiannis, Comparing apples and oranges: normalized pagerank for evolving graphs, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada

Yiyao Lu, Zonghuan Wu, Hongkun Zhao, Weiyi Meng, King-Lup Liu, Vijay Raghavan, Clement Yu, MySearchView: a customized metasearch engine generator, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 11-14, 2007, Beijing, China

King-Lup Liu, Weiyi Meng, Jing Qiu, Clement Yu, Vijay Raghavan, Zonghuan Wu, Yiyao Lu, Hai He, Hongkun Zhao, AllInOneNews: development and evaluation of a large-scale news metasearch engine, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 11-14, 2007, Beijing, China

Pascal Felber, Toan Luu, Martin Rajman, Etienne Riviere, Managing collaborative feedback information for distributed retrieval, Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, October 30, 2008, Napa Valley, California, USA

Ning Liu, Jun Yan, Weiguo Fan, Qiang Yang, Zheng Chen, Identifying vertical search intention of query through social tagging propagation, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain

Ricardo Baeza-Yates, Aristides Gionis, Flavio Junqueira, Vassilis Plachouras, Luca Telloli, On the feasibility of multi-site web search engines, Proceeding of the 18th ACM conference on Information and knowledge management, November 02-06, 2009, Hong Kong, China