| Extracting semi-structured data through examples |
| Source
|
Conference on Information and Knowledge Management
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States
Pages: 94 - 101
Year of Publication: 1999
ISBN: 1-58113-146-1
|
|
Authors
|
|
Berthier Ribeiro-Neto
|
Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte MG, Brazil
|
|
Alberto H. F. Laender
|
Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte MG, Brazil
|
|
Altigran S. da Silva
|
Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte MG, Brazil
|
|
ABSTRACT
In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform the extraction of new objects, we introduce a bottom-up extraction strategy and, through experimentation, demonstrate that it works quite effectively with distinct Web sources, even if only a few examples are provided by the user.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Amazon.com Web Site. http://www.amazon.com.
|
| |
2
|
CDnow Web Site. http://www.cdnow.com.
|
| |
3
|
DB&LP Site. http://www.informatik.uni-trier.de/~ley/db/.
|
| |
4
|
Travelocity Site. http://www.travelocity.com/.
|
 |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
CHAWATHE, S., GARCIA-MOLINA, H., HAMMER, J., IRELAND, K., PAPAKONSTANTINOU, Y., ULLMAN, J., AND WIDOM, J. The TSIMMIS Project: Integration of Heterogeneous Information Sources. In Proceedings of IPSJ Conference (Tokyo, Japan, 1994), pp. 7-18.
|
 |
12
|
|
| |
13
|
David W. Embley, Douglas M. Campbell, Y. S. Jiang, Stephen W. Liddle, Yiu-Kai Ng, Dallan Quass, Randy D. Smith, A Conceptual-Modeling Approach to Extracting Data from the Web, Proceedings of the 17th International Conference on Conceptual Modeling, p.78-91, November 16-19, 1998
|
 |
14
|
David W. Embley, Douglas M. Campbell, Randy D. Smith, Stephen W. Liddle, Ontology-based extraction and structuring of information from data-rich unstructured documents, Proceedings of the seventh international conference on Information and knowledge management, p.52-59, November 02-07, 1998, Bethesda, Maryland, United States
[doi> 10.1145/288627.288641]
|
| |
15
|
|
 |
16
|
Joachim Hammer, Héctor García-Molina, Svetlozar Nestorov, Ramana Yerneni, Marcus Breunig, Vasilis Vassalos, Template-based wrappers in the TSIMMIS system, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.532-535, May 11-15, 1997, Tucson, Arizona, United States
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
| |
20
|
SILVA, E. S. Extraction of Semi-Structured Data Based on Examples. Master's thesis, Department of Computer Science, Federal University of Minas Gerais, 1999. In Portuguese.
|
| |
21
|
SODERLAND, S. Learning to Extract Text-based Information from the World Wide Web. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining - KDD-97 (Newport Beach, California, 1997), pp. 251-254.
|
| |
22
|
ZLOOF, M. M. Query-by-Example: A Data Base Language. IBM Systems Journal 16, 4 (1977), 324-343.
|
CITED BY 11
|
|
Hasan Davulcu, Guizhen Yang, Michael Kifer, I. V. Ramakrishnan, Computational aspects of resilient data extraction from semistructured sources (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.136-144, May 15-18, 2000, Dallas, Texas, United States
|
Paulo B. Golgher, Altigran S. da Silva, Alberto H. F. Laender, Berthier Ribeiro-Neto, Bootstrapping for example-based data extraction, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
|