Automatic repairing of web wrappers
Source: Workshop On Web Information And Data Management
Proceedings of the 3rd international workshop on Web information and data management
Atlanta, Georgia, USA
Session: Web Information Management
Pages: 24-30
Year of Publication: 2001
ISBN:1-58113-444-4
Author
Boris Chidlovskii  Xerox Research Centre Europe, Grenoble Laboratory, Meylan, France
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 27,   Citation Count: 6
DOI: http://doi.acm.org/10.1145/502932.502938

ABSTRACT

We study the problem of automatically repairing wrappers for Web information providers. The majority of Web wrappers use "hooks" or "landmarks" to find and extract relevant information from Web pages, and such wrappers often become inoperable when the page structure changes. The solution we propose in this paper extends conventional forward wrappers with two alternatives: classifiers built from content features of the extracted information, and wrappers that process pages backward. We report preliminary results on information extraction recovery and wrapper repair for a set of real Web provider changes.
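
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the sample pages, and the hand-written price "classifier" are all hypothetical, and the backward wrappers mentioned above are not shown. It assumes a forward wrapper that extracts the text between two landmarks, a fallback that recognizes the value by its content features when the landmarks disappear, and a repair step that reads new landmarks off the changed page.

```python
import re

def forward_extract(page, left, right):
    """Landmark-based forward wrapper: text between `left` and `right`, or None."""
    match = re.search(re.escape(left) + r"(.*?)" + re.escape(right), page, re.S)
    return match.group(1).strip() if match else None

def looks_like_price(text):
    """Hypothetical content-feature check standing in for a learned classifier."""
    return "$" in text and any(c.isdigit() for c in text) and len(text) <= 12

def recover(page, classifier):
    """Fallback: scan text fragments between tags, return the first accepted one."""
    for fragment in re.split(r"<[^>]+>", page):
        fragment = fragment.strip()
        if fragment and classifier(fragment):
            return fragment
    return None

def repair_landmarks(page, value):
    """Propose new landmarks: the tags immediately around the recovered value."""
    idx = page.find(value)
    if idx < 0:
        return None
    left_start = page.rfind("<", 0, idx)
    right_end = page.find(">", idx + len(value))
    if left_start < 0 or right_end < 0:
        return None
    return page[left_start:idx], page[idx + len(value):right_end + 1]

def extract_price(page, left='<b class="price">', right="</b>"):
    """Try the forward wrapper first; fall back to content-based recovery."""
    value = forward_extract(page, left, right)
    if value is not None and looks_like_price(value):
        return value
    return recover(page, looks_like_price)

# Hypothetical pages: the provider replaces the <b> landmark with a <span>.
old_page = '<html><body><h1>Widget</h1><b class="price">$19.99</b></body></html>'
new_page = '<html><body><h1>Widget</h1><span id="cost">$19.99</span></body></html>'
```

On `old_page` the forward wrapper succeeds directly. On `new_page` the original landmarks no longer match, so `extract_price` falls back to `recover`, and `repair_landmarks(new_page, value)` then suggests the surrounding tags as replacement landmarks for a repaired forward wrapper.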

