Wrapper generation for semi-structured Internet sources
Source ACM SIGMOD Record archive
Volume 26, Issue 4 (December 1997)
Pages: 8 - 15  
Year of Publication: 1997
ISSN:0163-5808
Authors
Naveen Ashish  Information Sciences Institute and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA
Craig A. Knoblock  Information Sciences Institute and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6, Downloads (12 Months): 67, Citation Count: 34
DOI: http://doi.acm.org/10.1145/271074.271078

ABSTRACT

With the current explosion of information on the World Wide Web (WWW), a wealth of information on many different subjects has become available on-line. Numerous sources contain information that can be classified as semi-structured. At present, however, the only way to access this information is by browsing individual pages; we cannot query web documents in a database-like fashion based on their underlying structure. We can, however, provide database-like querying for semi-structured WWW sources by building wrappers around these sources. We present an approach for semi-automatically generating such wrappers. The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. We demonstrate the ease with which we are able to build wrappers for a number of Internet sources in different domains using our implemented wrapper generation toolkit.
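The key idea stated in the abstract, using formatting cues to hypothesize a page's structure, can be illustrated with a small sketch. This is not the authors' toolkit or algorithm; it is a minimal illustration that assumes heading and bold tags mark section names and that the text following each one is that section's value. The names `HEADING_TAGS`, `StructureGuesser`, and `guess_structure`, as well as the sample page, are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as formatting cues for section names.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "b", "strong"}


class StructureGuesser(HTMLParser):
    """Groups page text under the most recent heading-like token."""

    def __init__(self):
        super().__init__()
        self.sections = {}        # hypothesized section name -> text
        self._in_heading = False  # True while inside a heading-like tag
        self._current = None      # name of the section being filled

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            # Text inside a heading-like tag starts a new section.
            self._current = text.rstrip(":")
            self.sections[self._current] = ""
        elif self._current is not None:
            # Any other text is appended to the current section.
            previous = self.sections[self._current]
            self.sections[self._current] = (previous + " " + text).strip()


def guess_structure(html):
    """Return a dict mapping hypothesized section names to their text."""
    parser = StructureGuesser()
    parser.feed(html)
    parser.close()
    return parser.sections


# Made-up sample page, used only to exercise the sketch.
PAGE = "<h2>Geography</h2><p>Mostly flat plains.</p><b>Capital:</b> Exampleville"
record = guess_structure(PAGE)
print(record["Capital"])
```

Once a page has been reduced to a record like this, a query such as "return the Capital field" becomes a dictionary lookup, which is the database-like access the abstract describes at a much larger scale.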

