Wrapper generation for semi-structured Internet sources
Source
ACM SIGMOD Record, Volume 26, Issue 4 (December 1997)
Pages: 8-15
Year of Publication: 1997
ISSN: 0163-5808
Authors
Naveen Ashish, Information Sciences Institute and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA
Craig A. Knoblock, Information Sciences Institute and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA
Bibliometrics
Downloads (6 Weeks): 6, Downloads (12 Months): 67, Citation Count: 34
ABSTRACT
With the current explosion of information on the World Wide Web (WWW), a wealth of information on many different subjects has become available on-line. Numerous sources contain information that can be classified as semi-structured. At present, however, the only way to access the information is by browsing individual pages. We cannot query web documents in a database-like fashion based on their underlying structure. However, we can provide database-like querying for semi-structured WWW sources by building wrappers around these sources. We present an approach for semi-automatically generating such wrappers. The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. We demonstrate the ease with which we are able to build wrappers for a number of Internet sources in different domains using our implemented wrapper generation toolkit.
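The key idea in the abstract, using formatting cues to hypothesize a page's structure, can be illustrated with a minimal Python sketch. This is not the authors' toolkit: the choice of heading tags, the class name `SectionGuesser`, the function `wrap`, and the sample page are all assumptions made for illustration. The sketch treats heading and bold tags as section markers and collects the text that follows each one, so the page can then be queried by section name.

```python
from html.parser import HTMLParser

# Tags treated as heading cues; this set is an assumption for illustration.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "b", "strong"}


class SectionGuesser(HTMLParser):
    """Hypothesize a flat section structure from formatting tags."""

    def __init__(self):
        super().__init__()
        self.sections = {}      # heading text -> list of body text fragments
        self._in_heading = False
        self._heading_parts = []
        self._current = None    # heading currently collecting body text

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            heading = " ".join(self._heading_parts).strip().rstrip(":")
            if heading:
                self._current = heading
                self.sections.setdefault(heading, [])

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self._heading_parts.append(text)
        elif self._current is not None:
            self.sections[self._current].append(text)


def wrap(html):
    """Return a dict mapping each guessed section heading to its text."""
    parser = SectionGuesser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.sections.items()}


# Hypothetical sample page, not taken from any real source.
page = """
<html><body>
<h2>Geography</h2><p>Location: Western Europe</p>
<b>Capital:</b> Paris<br>
<h2>Economy</h2><p>Currency: euro</p>
</body></html>
"""

record = wrap(page)
print(record["Capital"])
```

A real wrapper generator would also need to handle nested sections and let a user correct wrong guesses, which is presumably where the "semi-automatic" part of the approach comes in; this sketch only shows the flat, fully automatic first guess.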
CITED BY 34
Hasan Davulcu, Guizhen Yang, Michael Kifer, I. V. Ramakrishnan, Computational aspects of resilient data extraction from semistructured sources (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.136-144, May 15-18, 2000, Dallas, Texas, United States
Chuang-Hue Moh, Ee-Peng Lim, Wee-Keong Ng, Re-engineering structures from Web documents, Proceedings of the fifth ACM conference on Digital libraries, p.67-76, June 02-07, 2000, San Antonio, Texas, United States
Reo-Jo Yamashita, Tetsuro Ito, Hsiu-Hsen Yao, ESSQL: an enhanced semi-structured query language for composite document retrievals, Proceedings of the 16th annual international conference on Computer documentation, p.120-126, September 24-26, 1998, Quebec, Quebec, Canada
Stephen W. Liddle, Douglas M. Campbell, Chad Crawford, Automatically extracting structure and data from business reports, Proceedings of the eighth international conference on Information and knowledge management, p.86-93, November 02-06, 1999, Kansas City, Missouri, United States
David W. Embley, Douglas M. Campbell, Randy D. Smith, Stephen W. Liddle, Ontology-based extraction and structuring of information from data-rich unstructured documents, Proceedings of the seventh international conference on Information and knowledge management, p.52-59, November 02-07, 1998, Bethesda, Maryland, United States
David Mattox, Len Seligman, Ken Smith, Rapper: a wrapper generator with linguistic knowledge, Proceedings of the 2nd international workshop on Web information and data management, p.6-11, November 02-06, 1999, Kansas City, Missouri, United States
Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, Toshikazu Fukushima, Mining product reputations on the Web, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Extracting semi-structured data through examples, Proceedings of the eighth international conference on Information and knowledge management, p.94-101, November 02-06, 1999, Kansas City, Missouri, United States
Bettina Fazzinga, Sergio Flesca, Andrea Tagarelli, Salvatore Garruzzo, Elio Masciari, A wrapper generation system for PDF documents, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil