Rapper: a wrapper generator with linguistic knowledge
Source: Workshop On Web Information And Data Management
Proceedings of the 2nd international workshop on Web information and data management
Kansas City, Missouri, United States
Pages: 6 - 11  
Year of Publication: 1999
ISBN: 1-58113-221-2
Authors
David Mattox  The MITRE Corporation, 1820 Dolley Madison, McLean, VA
Len Seligman  The MITRE Corporation, 1820 Dolley Madison, McLean, VA
Ken Smith  The MITRE Corporation, 1820 Dolley Madison, McLean, VA
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/319759.319766

ABSTRACT

Database management systems are becoming available for semistructured data; however, these tools cannot be used on many real-world data sources (e.g., most web sites) in their native form. Often, wrappers are needed to extract information and organize it into a graph structure that makes explicit the concepts users want to query and update. This paper presents a new approach to wrapper generation that exploits linguistic knowledge. The approach produces a more fine-grained parse of sources containing natural language text than previous efforts. The resulting graph-structured databases can answer queries that could not be formulated against databases produced by previously generated wrappers. In addition, our approach may be more robust in the face of slight variations in word choice and order. We discuss a prototype implementation, lessons learned to date, evaluation issues, and future research directions.

