| Rapper: a wrapper generator with linguistic knowledge |
| Source
|
Workshop On Web Information And Data Management
Proceedings of the 2nd international workshop on Web information and data management
Kansas City, Missouri, United States
Pages: 6 - 11
Year of Publication: 1999
ISBN: 1-58113-221-2
|
Authors
David Mattox, The MITRE Corporation, 1820 Dolley Madison, McLean, VA
Len Seligman, The MITRE Corporation, 1820 Dolley Madison, McLean, VA
Ken Smith, The MITRE Corporation, 1820 Dolley Madison, McLean, VA
ABSTRACT
Database management systems are becoming available for semistructured data; however, these tools cannot be used on many real-world data sources (e.g., most web sites) in their native form. Often, wrappers are needed to extract information and organize it into a graph structure that makes explicit the concepts users want to query and update. This paper presents a new approach to wrapper generation that exploits linguistic knowledge. The approach produces a more fine-grained parse of sources containing natural language text than previous efforts. The resulting graph-structured databases can answer queries that could not be formulated against databases produced by previously generated wrappers. In addition, our approach may be more robust in the face of slight variations in word choice and order. We discuss a prototype implementation, lessons learned to date, evaluation issues, and future research directions.