Bootstrapping for example-based data extraction
Source
Conference on Information and Knowledge Management
Proceedings of the tenth international conference on Information and knowledge management
Atlanta, Georgia, USA
Session: String Match and Text Extraction
Pages: 371-378
Year of Publication: 2001
ISBN: 1-58113-436-3
Bibliometrics
Downloads (6 Weeks): 3, Downloads (12 Months): 30, Citation Count: 7
ABSTRACT
Effortless generation of wrappers for Web data sources is crucial if proper access to the huge amount of semi-structured data on the Web is to be granted. In particular, strategies for generating wrappers from user-given examples are currently one of the most promising research directions in Web data extraction. In this paper we show how to use a pre-existing data repository to generate examples automatically, allowing fully automated example-based data extraction. To demonstrate the feasibility of our approach, we report results from a number of experiments and discuss how our ideas can be used to improve extraction rates and to make wrappers generated from examples more resilient and adaptive.
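The abstract states the core idea only at a high level: records already stored in a repository are matched against a new page so that examples are produced automatically instead of being supplied by a user. The sketch below illustrates that idea under deliberately simple assumptions (exact, case-insensitive matching of stored field values; the longest match wins when spans overlap). The function `bootstrap_examples`, the `Example` type, and the sample data are hypothetical and are not the method described in the paper; the labeled spans would then be handed to whatever example-based wrapper generator is in use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A labeled span in a page: field name plus character offsets (hypothetical type)."""
    field: str
    start: int
    end: int
    text: str


def bootstrap_examples(repository, page_text):
    """Hypothetical helper: label occurrences of known repository values in page_text.

    repository: list of dicts mapping field name -> value already stored
    (e.g., records previously extracted into a local database).
    Returns non-overlapping examples ordered by position in the page.
    """
    candidates = []
    for record in repository:
        for field, value in record.items():
            value = value.strip()
            if not value:
                continue
            # Literal, case-insensitive match; re.escape stops punctuation
            # in stored values from being interpreted as regex syntax.
            for m in re.finditer(re.escape(value), page_text, re.IGNORECASE):
                candidates.append(Example(field, m.start(), m.end(), m.group(0)))

    # Assumed conflict rule: prefer longer spans, then earlier ones, and drop
    # any candidate overlapping an already accepted example (this also drops
    # duplicate spans produced by repeated values).
    candidates.sort(key=lambda e: (-(e.end - e.start), e.start))
    accepted = []
    for cand in candidates:
        if all(cand.end <= a.start or cand.start >= a.end for a in accepted):
            accepted.append(cand)
    return sorted(accepted, key=lambda e: e.start)


if __name__ == "__main__":
    repo = [
        {"title": "Data on the Web", "author": "Abiteboul"},
        {"title": "Mining the Web", "author": "Chakrabarti"},
    ]
    page = "<li>Mining the Web, by Chakrabarti</li><li>A New Book, by Smith</li>"
    for ex in bootstrap_examples(repo, page):
        print(ex.field, ex.text)  # prints "title Mining the Web", then "author Chakrabarti"
```

Only values already in the repository are labeled here, so "A New Book" and "Smith" stay unlabeled; the point of bootstrapping is that a wrapper learned from the labeled spans could then extract such unseen records.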
REFERENCES
[3] D. W. Embley, D. M. Campbell, Y. S. Jiang, S. W. Liddle, D. W. Lonsdale, Y.-K. Ng, R. D. Smith, Conceptual-model-based data extraction from multiple-record Web pages, Data & Knowledge Engineering, v.31 n.3, p.227-251, Nov. 1999. doi: 10.1016/S0169-023X(99)00027-0
[4] P. B. Golgher, Bootstrapping for Example-based Data Extraction, Master's thesis, Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2001.
[6] C. A. Knoblock, K. Lerman, S. Minton, I. Muslea, Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach, IEEE Data Engineering Bulletin, v.23 n.4, p.33-41, 2000.
[8] Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Extracting semi-structured data through examples, Proceedings of the eighth international conference on Information and knowledge management, p.94-101, November 02-06, 1999, Kansas City, Missouri, United States. doi: 10.1145/319950.319962
CITED BY 7
Pável P. Calado, Marcos A. Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto, Monique V. Vieira, Juliano P. Lage, The Web-DL environment for building digital libraries from the Web, Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries, May 27-31, 2003, Houston, Texas
Alberto H. F. Laender, Altigran S. da Silva, Paolo B. Golgher, Berthier Ribeiro-Neto, Irna M. R. Evangelista-Filha, Karine V. Magalhães, The Debye Environment for Web Data Management, IEEE Internet Computing, v.6 n.4, p.60-69, July 2002
Juliano Palmieri Lage, Altigran S. da Silva, Paulo B. Golgher, Alberto H. F. Laender, Collecting hidden web pages for data extraction, Proceedings of the 4th international workshop on Web information and data management, November 8, 2002, McLean, Virginia, USA