Siphon++: a hidden-webcrawler for keyword-based interfaces
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
Poster session 1: Information retrieval
Pages 1361-1362
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Karane Vieira  UFAM, Manaus, Brazil
Luciano Barbosa  University of Utah, Salt Lake City, USA
Juliana Freire  University of Utah, Salt Lake City, USA
Altigran Silva  UFAM, Manaus, Brazil
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 73,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1458082.1458279

ABSTRACT

The hidden Web consists of data that is generally accessible only through form interfaces and is therefore out of reach for traditional search engines. To leverage the high-quality information in this largely unexplored portion of the Web, we propose a new strategy for automatically retrieving data hidden behind keyword-based form interfaces. Unlike previous approaches to this problem, our strategy adapts query generation and selection by detecting features of the index. We describe an extensive experimental evaluation which shows that our strategy derives appropriate queries that obtain high coverage while avoiding the retrieval of redundant data, and that it obtains higher coverage and is more efficient than approaches that use a fixed strategy for query generation.
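
To make the idea of crawling through a keyword-based interface concrete, the following is a minimal sketch of a generic greedy query-selection loop. It is not the strategy proposed in the paper: the abstract does not describe how index features are detected, so that adaptive step is not reproduced here. The toy collection DOCS, the simulated interface search, the result cap limit, and the stopping parameter patience are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy collection standing in for a hidden-Web database.
DOCS = {
    1: "hidden web crawler form interface",
    2: "keyword query form interface search",
    3: "search engine index crawler",
    4: "database query coverage index",
    5: "coverage evaluation keyword search",
}


def search(keyword, limit=3):
    """Simulated keyword interface: ids of matching documents, capped at `limit`."""
    hits = [doc_id for doc_id, text in sorted(DOCS.items())
            if keyword in text.split()]
    return hits[:limit]


def crawl(seed, max_queries=10, patience=2):
    """Greedy crawl: repeatedly query with the most frequent unused term.

    Stops after `max_queries` queries, when no unused terms remain, or after
    `patience` consecutive queries that return no new documents.
    """
    retrieved = set()              # ids of documents collected so far
    issued = []                    # queries already sent, in order
    term_counts = Counter([seed])  # term frequencies seen in retrieved documents
    stale = 0                      # consecutive queries that added nothing

    while len(issued) < max_queries and stale < patience:
        candidates = [(count, term) for term, count in term_counts.items()
                      if term not in issued]
        if not candidates:
            break
        # Highest count first; ties broken alphabetically for determinism.
        _, term = min(candidates, key=lambda c: (-c[0], c[1]))
        issued.append(term)

        new_ids = set(search(term)) - retrieved
        stale = 0 if new_ids else stale + 1
        retrieved |= new_ids
        # Only newly retrieved documents contribute new candidate terms.
        for doc_id in new_ids:
            term_counts.update(DOCS[doc_id].split())

    return retrieved, issued


if __name__ == "__main__":
    found, queries = crawl("crawler")
    print(sorted(found), queries)
```

In this sketch the query-selection rule is fixed (always the most frequent unused term), which corresponds to the kind of fixed strategy the abstract contrasts against. An adaptive crawler would change that rule based on observed properties of the target index.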


