Structured databases on the web: observations and implications
Source: ACM SIGMOD Record
Volume 33, Issue 3 (September 2004)
COLUMN: Surveys
Pages: 61-70
Year of Publication: 2004
ISSN:0163-5808
Authors
Kevin Chen-Chuan Chang  University of Illinois at Urbana-Champaign
Bin He  University of Illinois at Urbana-Champaign
Chengkai Li  University of Illinois at Urbana-Champaign
Mitesh Patel  University of Illinois at Urbana-Champaign
Zhen Zhang  University of Illinois at Urbana-Champaign
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 161,   Citation Count: 21
DOI: http://doi.acm.org/10.1145/1031570.1031584

ABSTRACT

The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our "macro" study surveys the deep Web at large, in April 2004, adopting the random IP-sampling approach, with one million samples. (How large is the deep Web? How is it covered by current directory services?) On the other hand, our "micro" study surveys source-specific characteristics over 441 sources in eight representative domains, in December 2002. (How "hidden" are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications (of our own) which, while necessarily subjective, might help shape research directions and solutions.


