Automatic repairing of web wrappers
Source: Proceedings of the 3rd international workshop on Web information and data management (Workshop On Web Information And Data Management)
Atlanta, Georgia, USA
Session: Web Information Management
Pages: 24 - 30
Year of Publication: 2001
ISBN: 1-58113-444-4
ABSTRACT
We study the problem of automatically repairing wrappers for Web information providers. The majority of Web wrappers use "hooks" or "landmarks" to find and extract relevant information from Web pages, and such wrappers often become inoperable when the page structure changes. The solution we propose in this paper extends conventional forward wrappers with two alternatives: classifiers built using content features of the extracted information, and wrappers that process pages backward. We report some preliminary results on recovering information extraction and repairing wrappers for a set of real Web provider changes.
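To make the idea concrete, here is a minimal sketch, not the authors' implementation, of how a landmark-based forward wrapper might be backed up by a backward wrapper and a content-based check. Everything in it is an assumption made for illustration: the page snippets, the landmark strings, the function names, and the use of a price regular expression as a stand-in for a learned content classifier.

```python
import re
from typing import Optional, Tuple

# Hypothetical content model: a regex standing in for a learned classifier
# over content features of the extracted value (here, a price).
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")


def forward_extract(page: str, left: str, right: str) -> Optional[str]:
    """Scan from the start: take the text between the first `left`
    landmark and the next `right` landmark."""
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    return None if end == -1 else page[start:end]


def backward_extract(page: str, left: str, right: str) -> Optional[str]:
    """Scan from the end: find the last `right` landmark, then the
    nearest `left` landmark before it."""
    end = page.rfind(right)
    if end == -1:
        return None
    start = page.rfind(left, 0, end)
    return None if start == -1 else page[start + len(left):end]


def looks_like_price(value: Optional[str]) -> bool:
    """Accept a candidate only if its content matches the price model."""
    return value is not None and PRICE.fullmatch(value.strip()) is not None


def extract_with_recovery(page: str) -> Optional[Tuple[str, str]]:
    """Try the forward wrapper, then the backward wrapper, then fall back
    to locating the value by content alone. Returns (method, value)."""
    candidates = (
        ("forward", forward_extract(page, "<b>Price:</b> ", "<br>")),
        ("backward", backward_extract(page, " ", "<br><i>")),
    )
    for method, value in candidates:
        if looks_like_price(value):
            return method, value.strip()
    match = PRICE.search(page)
    return ("content", match.group(0)) if match else None


# Illustrative pages: the original layout and two hypothetical redesigns.
original = "<html><body><b>Price:</b> $19.99<br><i>In stock</i></body></html>"
changed_a = "<html><body><p>Our price: $21.50<br><i>In stock</i></p></body></html>"
changed_b = "<html><body><div>Price<em>$18.00</em></div></body></html>"

for page in (original, changed_a, changed_b):
    print(extract_with_recovery(page))
```

In this sketch the forward wrapper handles the original layout, the backward wrapper recovers the value when only the leading landmark changes, and the content check takes over when both sets of landmarks are gone. A recovered value could then serve as a fresh training example for re-learning the landmarks, which is one plausible reading of "repairing" here.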
CITED BY 6
Robert McCann, Bedoor AlShebli, Quoc Le, Hoa Nguyen, Long Vu, AnHai Doan. Mapping maintenance for data integration systems. Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway.