Collecting hidden Web pages for data extraction
Source: Workshop on Web Information and Data Management
Proceedings of the 4th international workshop on Web information and data management
McLean, Virginia, USA
SESSION: Web services and performance evaluation
Pages: 69-75
Year of Publication: 2002
ISBN:1-58113-593-9
Authors
Juliano Palmieri Lage  Federal University of Minas Gerais, Belo Horizonte MG Brazil
Altigran S. da Silva  Federal University of Amazonas, Manaus AM Brazil
Paulo B. Golgher  Akwan Information Technologies, Belo Horizonte MG Brazil
Alberto H. F. Laender  Federal University of Minas Gerais, Belo Horizonte MG Brazil
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/584931.584946

ABSTRACT

As the Web grows, more and more data has become available under dynamic forms of publication, such as legacy databases accessed through HTML forms (the so-called Hidden Web). In such situations, integrating this data increasingly relies on the fast generation of page-fetching agents. As a result, there is a growing need for tools that help users generate such agents. In this paper, we describe an approach to automatically generating agents that collect hidden Web pages; it uses a pre-existing data repository to identify the contents of these pages and takes advantage of some regularities found among Web sites. To demonstrate the effectiveness of our approach, we discuss the results of a number of experiments carried out with sites from different domains. We also discuss how such regularities among sites can be formalized.
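The abstract only outlines the approach, so the following Python sketch is a rough illustration of the general idea, not the authors' algorithm. It assumes a site exposes a single GET search form, uses values from an existing data repository as queries, and treats an answer page as useful if it mentions at least one known value. The field-name list `COMMON_QUERY_NAMES` is a hypothetical stand-in for the "regularities among Web sites" the paper formalizes, and all function names are hypothetical. No pages are fetched; the sketch only builds query URLs and classifies page text.

```python
"""Hypothetical sketch: probe a search form with values from an existing
data repository and keep answer pages that mention those values."""
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical "regularity": search boxes often use names like these.
COMMON_QUERY_NAMES = ("q", "query", "search", "keyword", "keywords", "title")


class FormFieldParser(HTMLParser):
    """Collects the action, method, and text-input names of the first form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.text_fields = []
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = True
            self._seen_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form:
            # An <input> without a type attribute defaults to a text field.
            if (attrs.get("type") or "text").lower() == "text" and attrs.get("name"):
                self.text_fields.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def pick_query_field(text_fields):
    """Prefer a field whose name looks like a search box; else the first one."""
    for name in text_fields:
        if name.lower() in COMMON_QUERY_NAMES:
            return name
    return text_fields[0] if text_fields else None


def build_query_urls(page_url, form_html, repository_values):
    """Return one GET URL per repository value (GET forms only in this sketch)."""
    parser = FormFieldParser()
    parser.feed(form_html)
    field = pick_query_field(parser.text_fields)
    if field is None or parser.method != "get":
        return []
    action = urljoin(page_url, parser.action)
    return [action + "?" + urlencode({field: value}) for value in repository_values]


def is_result_page(html, repository_values, min_hits=1):
    """Assumed heuristic: an answer page mentions at least `min_hits` known values."""
    text = html.lower()
    hits = sum(1 for value in repository_values if value.lower() in text)
    return hits >= min_hits
```

For example, given the form `<form action="/search" method="get"><input type="text" name="title"></form>` on `http://books.example.com/index.html` and the repository values `["Dune", "War and Peace"]`, `build_query_urls` would produce `http://books.example.com/search?title=Dune` and `http://books.example.com/search?title=War+and+Peace`. A real agent generator would also have to handle POST forms, multiple fields, and result-page navigation, which this sketch deliberately leaves out.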



