ABSTRACT
In this paper we describe how we combined SDLIP and STARTS, two complementary protocols for searching over distributed document collections. The resulting protocol, which we call SDARTS, is simple yet expressive enough to enable building sophisticated metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch-specific elements from STARTS. We also report on our experience building three SDARTS-compliant wrappers: for locally available plain-text document collections, for locally available XML document collections, and for external web-accessible collections. These wrappers were developed to be easily customizable for new collections. Our work was developed as part of Columbia University's Digital Libraries Initiative--Phase 2 (DLI2) project, which involves the departments of Computer Science, Medical Informatics, and Electrical Engineering, the Columbia University libraries, and a large number of industrial partners. The main goal of the project is to provide personalized access to a distributed patient-care digital library.
CITED BY 5
Kathleen R. McKeown , Shih-Fu Chang , James Cimino , Steven Feiner , Carol Friedman , Luis Gravano , Vasileios Hatzivassiloglou , Steven Johnson , Desmond A. Jordan , Judith L. Klavans , André Kushniruk , Vimla Patel , Simone Teufel, PERSIVAL, a system for personalized search and summarization over multimedia healthcare information, Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries, p.331-340, January 2001, Roanoke, Virginia, United States