| Collecting hidden Web pages for data extraction |
| Source
|
Workshop On Web Information And Data Management
Proceedings of the 4th international workshop on Web information and data management
McLean, Virginia, USA
SESSION: Web services and performance evaluation
Pages: 69 - 75
Year of Publication: 2002
ISBN: 1-58113-593-9
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 37, Citation Count: 2
|
|
|
ABSTRACT
As the Web grows, more and more data has become available under dynamic forms of publication, such as a legacy database accessed through an HTML form (the so-called Hidden Web). In such situations, integrating this data relies increasingly on the fast generation of page-fetching agents. As a result, there is an increasing need for tools that help the user generate such agents. In this paper, we describe an approach to automatically generating agents that collect hidden Web pages. The approach uses a pre-existing data repository to identify the contents of these pages and takes advantage of regularities that can be found among Web sites. To demonstrate the effectiveness of our approach, we discuss the results of a number of experiments carried out with sites from different domains. We also discuss how such regularities among sites can be formalized.
CITED BY 2
|
|
|
|
|
Valter Crescenzi , Giansalvatore Mecca , Paolo Merialdo , Paolo Missier, An automatic data grabber for large web sites, Proceedings of the Thirtieth international conference on Very large data bases, p.1321-1324, August 31-September 03, 2004, Toronto, Canada
|
|