ABSTRACT
Linkages among documents have a significant impact on the importance of documents, as it can be argued that important documents are pointed to by many documents or by other important documents. Metasearch engines help ordinary users retrieve information from multiple local sources (text databases). There is a search engine associated with each database. In a large-scale metasearch engine, the contents of each local database are represented by a representative. Each user query is evaluated against the set of representatives of all databases in order to determine the appropriate databases (search engines) to search (invoke). In previous work, the linkage information between documents has not been utilized in determining the appropriate databases to search. In this paper, such information is employed to determine the degree of relevance of a document with respect to a given query. Specifically, the importance (rank) of each document as determined by the linkages is integrated into each database representative to facilitate the selection of databases for each given query. We establish a necessary and sufficient condition for ranking databases optimally while incorporating the linkage information. A method is provided to estimate the quantities stated in the necessary and sufficient condition. The estimation method runs in time linear in the number of query terms. Experimental results are provided to demonstrate the high retrieval effectiveness of the method.
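To make the idea of combining query similarity with link-based importance more concrete, here is a minimal sketch. The abstract does not give the combination formula or the ranking condition, so everything specific below is an assumption for illustration only: the weighted sum in `degree_of_relevance`, the weight `w`, the criterion of ordering databases by their best document, and the toy data. A real metasearch engine would also estimate these quantities from the database representatives rather than inspect every document.

```python
from typing import Dict, List, Tuple

# Hypothetical input: each document is (query similarity, link-based rank),
# both assumed to be normalized to [0, 1].
Doc = Tuple[float, float]


def degree_of_relevance(similarity: float, link_rank: float, w: float = 0.8) -> float:
    """Assumed combination: weighted sum of similarity and link-based importance."""
    return w * similarity + (1.0 - w) * link_rank


def rank_databases(databases: Dict[str, List[Doc]], w: float = 0.8) -> List[str]:
    """Order databases by the combined score of their best document.

    This ordering criterion is an assumption made for the sketch, not a
    statement of the condition established in the paper.
    """
    best = {
        name: max((degree_of_relevance(s, r, w) for s, r in docs), default=0.0)
        for name, docs in databases.items()
    }
    return sorted(best, key=lambda name: best[name], reverse=True)


# Toy data (invented for illustration).
databases = {
    "db_a": [(0.9, 0.1), (0.2, 0.3)],
    "db_b": [(0.6, 0.9), (0.1, 0.1)],
    "db_c": [(0.3, 0.2)],
}

order_mostly_similarity = rank_databases(databases, w=0.8)
order_balanced = rank_databases(databases, w=0.5)
```

With this toy data, giving more weight to link-based rank moves `db_b` ahead of `db_a`, which shows how linkage information can change which databases are selected.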
CITED BY 11
King-Lup Liu , Adrain Santoso , Clement Yu , Weiyi Meng, Discovering the representative of a search engine, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
Zonghuan Wu , Weiyi Meng , Clement Yu , Zhuogang Li, Towards a highly-scalable and effective metasearch engine, Proceedings of the 10th international conference on World Wide Web, p.386-395, May 01-05, 2001, Hong Kong, Hong Kong
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Search process
Additional Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Retrieval models
General Terms: Design, Documentation, Experimentation, Management, Measurement, Performance, Theory, Verification
Keywords: distributed collection, information retrieval, linkages among documents, metasearch