Extracting semi-structured data through examples
Source: Conference on Information and Knowledge Management
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States
Pages: 94 - 101  
Year of Publication: 1999
ISBN:1-58113-146-1
Authors
Berthier Ribeiro-Neto  Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte MG, Brazil
Alberto H. F. Laender  Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte MG, Brazil
Altigran S. da Silva  Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte MG, Brazil
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 51,   Citation Count: 11
DOI: http://doi.acm.org/10.1145/319950.319962

ABSTRACT

In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform the extraction of new objects, we introduce a bottom-up extraction strategy and, through experimentation, demonstrate that it works quite effectively with distinct Web sources, even if only a few examples are provided by the user.
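
The general idea of extracting by example can be illustrated with a toy sketch. This is not the bottom-up strategy the paper introduces, which the abstract does not detail; it swaps in a much simpler delimiter-based technique, assuming that new objects appear between the same surrounding characters as the user's examples. The function names (`learn_delimiters`, `extract`), the `width` parameter, and the sample pages are all hypothetical.

```python
import re


def learn_delimiters(sample_text, example_values, width=4):
    """Record the (left, right) character context around each example value.

    Hypothetical helper: `width` is an assumed context size, not a value
    taken from the paper.
    """
    patterns = set()
    for value in example_values:
        start = sample_text.find(value)
        if start == -1:
            continue  # the example does not occur in the sample page
        end = start + len(value)
        left = sample_text[max(0, start - width):start]
        right = sample_text[end:end + width]
        if left and right:  # skip examples with no usable context
            patterns.add((left, right))
    return patterns


def extract(new_text, patterns):
    """Return every substring of new_text enclosed by a learned context pair."""
    found = []
    for left, right in sorted(patterns):
        regex = re.escape(left) + r"(.+?)" + re.escape(right)
        found.extend(m.group(1) for m in re.finditer(regex, new_text))
    return found


# One example object supplied by the user, taken from a sample page.
sample = "<li><b>Kind of Blue</b> by <i>Miles Davis</i></li>"
patterns = learn_delimiters(sample, ["Kind of Blue"])

# A new page with the same layout but different objects.
new_page = (
    "<li><b>Blue Train</b> by <i>John Coltrane</i></li>"
    "<li><b>Giant Steps</b> by <i>John Coltrane</i></li>"
)
titles = extract(new_page, patterns)
print(titles)
```

A single example is enough here only because both pages share one rigid layout; sources with more varied markup would need more examples or a more robust strategy.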


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Amazon.com Web Site. http://www.amazon.com.
 
2
CDnow Web Site. http://www.cdnow.com.
 
3
DB&LP Site. http://www.informatik.uni-trier.de/~ley/db/.
 
4
Travelocity Site. http://www.travelocity.com/.
5
6
7
 
8
9
 
10
 
11
CHAWATHE, S., GARCIA-MOLINA, H., HAMMER, J., IRELAND, K., PAPAKONSTANTINOU, Y., ULLMAN, J., AND WIDOM, J. The TSIMMIS Project: Integration of Heterogeneous Information Sources. In Proceedings of IPSJ Conference (Tokyo, Japan, 1994), pp. 7-18.
12
 
13
14
 
15
16
17
18
 
19
 
20
SILVA, E. S. Extraction of Semi-Structured Data Based on Examples. Master's thesis, Department of Computer Science, Federal University of Minas Gerais, 1999. In Portuguese.
 
21
SODERLAND, S. Learning to Extract Text-based Information from the World Wide Web. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining - KDD-97 (Newport Beach, California, 1997), pp. 251-254.
 
22
ZLOOF, M. M. Query-by-Example: A Data Base Language. IBM Systems Journal 16, 4 (1977), 324-343.

CITED BY  11
