GlOSS: text-source discovery over the Internet
Source: ACM Transactions on Database Systems (TODS)
Volume 24, Issue 2 (June 1999)
Pages: 229-264
Year of Publication: 1999
ISSN: 0362-5915
Authors
Luis Gravano  Columbia Univ., New York, NY
Héctor García-Molina  Stanford Univ., Stanford, CA
Anthony Tomasic  INRIA Rocquencourt, Le Chesnay, France
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 65,   Citation Count: 81
DOI: http://doi.acm.org/10.1145/320248.320252

ABSTRACT

The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
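
The two-phase approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class names, the summary format (a document count plus per-term document frequencies), and the estimator are assumptions made for illustration. The estimator treats query terms as independent within each source, which is one plausible way to score a conjunctive Boolean query in the spirit of bGlOSS.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceSummary:
    """Hypothetical summary exported by a text source (phase one)."""
    name: str
    num_docs: int
    # Assumed format: term -> number of documents in the source containing it.
    doc_freq: dict[str, int] = field(default_factory=dict)


class DiscoveryService:
    """Hypothetical centralized service that ranks sources for a query."""

    def __init__(self) -> None:
        self._summaries: dict[str, SourceSummary] = {}

    def register(self, summary: SourceSummary) -> None:
        # Phase one: a source exports its summary to the service.
        self._summaries[summary.name] = summary

    def estimate_matches(self, summary: SourceSummary, terms: list[str]) -> float:
        # Assumption: terms occur independently, so the expected number of
        # documents containing all terms is num_docs * product(df / num_docs).
        if summary.num_docs == 0:
            return 0.0
        estimate = float(summary.num_docs)
        for term in terms:
            estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
        return estimate

    def rank(self, terms: list[str]) -> list[tuple[str, float]]:
        # Phase two: return sources ordered by estimated matches, best first,
        # dropping sources whose estimate is zero.
        scored = [
            (s.name, self.estimate_matches(s, terms))
            for s in self._summaries.values()
        ]
        promising = [(name, est) for name, est in scored if est > 0]
        return sorted(promising, key=lambda pair: (-pair[1], pair[0]))


# Usage with made-up sources and counts.
service = DiscoveryService()
service.register(SourceSummary("A", 100, {"database": 50, "retrieval": 20}))
service.register(SourceSummary("B", 1000, {"database": 100, "retrieval": 10}))
service.register(SourceSummary("C", 10, {"database": 5}))
ranking = service.rank(["database", "retrieval"])
```

The service never sees the documents themselves, only the exported summaries, so the ranking is an estimate that a user would follow up by querying the top-ranked sources directly.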


REVIEW

Reviewer: Edward Y. Lee

The idea of a Glossary of Servers Server (GlOSS) has been discussed in several other publications. This paper expands the idea into three versions of the implementation as applied to the discovery of text-source documents available over the In...