Building efficient and effective metasearch engines
Source: ACM Computing Surveys (CSUR)
Volume 34, Issue 1 (March 2002)
Pages: 48-89
Year of Publication: 2002
ISSN: 0360-0300
Authors
Weiyi Meng, State University of New York at Binghamton, Binghamton, NY
Clement Yu, University of Illinois at Chicago, Chicago, IL
King-Lup Liu, DePaul University, Chicago, IL
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 26, Downloads (12 Months): 333, Citation Count: 50
DOI: http://doi.acm.org/10.1145/505282.505284

ABSTRACT

Frequently, the information a user needs is stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents in the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool, such as increasing search coverage of the Web and improving the scalability of search. In this article, we survey techniques that have been proposed to tackle several underlying challenges in building a good metasearch engine. Among the main challenges, the database selection problem is to identify the search engines that are likely to return useful documents for a given query. The document selection problem is to determine which documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines into a single result list. We also point out some problems that need further research.
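The three subproblems named in the abstract (database selection, document selection, result merging) can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration, not a technique taken from the article: the hypothetical `Engine` type, the term-to-document-frequency "representative" used to estimate an engine's usefulness, the fixed per-engine quota used for document selection, and min-max score normalization followed by per-document score summation (the CombSUM fusion rule) used for result merging.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical component engine. "representative" is an assumed database summary
# (term -> document frequency); "search" returns (doc_id, local_score) pairs, best first.
@dataclass
class Engine:
    name: str
    representative: dict[str, int]
    search: Callable[[str, int], list[tuple[str, float]]]


def select_databases(engines, query, max_engines):
    """Database selection: keep engines whose summary contains a query term,
    ranked by a crude usefulness estimate (sum of query-term document frequencies)."""
    terms = query.lower().split()
    scored = [(sum(e.representative.get(t, 0) for t in terms), e) for e in engines]
    useful = [(s, e) for s, e in scored if s > 0]
    useful.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in useful[:max_engines]]


def merge_results(result_lists):
    """Result merging: min-max normalize each engine's local scores to [0, 1],
    then sum the normalized scores of each document (CombSUM)."""
    combined = {}
    for results in result_lists:
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for doc_id, s in results:
            # If all scores from this engine are equal, treat each as a full match.
            norm = 1.0 if hi == lo else (s - lo) / (hi - lo)
            combined[doc_id] = combined.get(doc_id, 0.0) + norm
    # Highest combined score first; ties broken by doc_id for a deterministic order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


def metasearch(engines, query, max_engines=2, docs_per_engine=3):
    selected = select_databases(engines, query, max_engines)
    # Document selection: the simplest policy, a fixed quota from every selected engine.
    result_lists = [e.search(query, docs_per_engine) for e in selected]
    return merge_results(result_lists)


def make_search(ranked):
    # Toy engine that ignores the query and returns its top-k precomputed results.
    return lambda query, k: ranked[:k]


engines = [
    Engine("A", {"metasearch": 5, "engine": 9},
           make_search([("d1", 0.75), ("d2", 0.5), ("d3", 0.25)])),
    Engine("B", {"metasearch": 2},
           make_search([("d2", 10.0), ("d4", 6.0), ("d5", 2.0)])),
    Engine("C", {"cooking": 7}, make_search([("d9", 3.0)])),
]

print(metasearch(engines, "metasearch engine"))
# -> [('d2', 1.5), ('d1', 1.0), ('d4', 0.5), ('d3', 0.0), ('d5', 0.0)]
```

In this toy run, engine C is never contacted because its summary shares no term with the query, and d2 rises to the top because both selected engines return it. The normalization step exists because local scores from different engines (here 0 to 1 versus 2 to 10) are not directly comparable; min-max scaling is only one crude way to handle that, and it is part of what makes result merging a research problem rather than a solved one.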


