HDSampler: revealing data behind web form interfaces
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD international conference on Management of data
Providence, Rhode Island, USA
Demonstration session: group D
Pages 1131-1134
Year of Publication: 2009
ISBN: 978-1-60558-551-2
Authors
Anirban Maiti  University of Texas at Arlington, Arlington, TX, USA
Arjun Dasgupta  University of Texas at Arlington, Arlington, TX, USA
Nan Zhang  George Washington University, Washington, DC, USA
Gautam Das  University of Texas at Arlington, Arlington, TX, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559845.1560001

ABSTRACT

A large number of online databases are hidden behind web forms. Users of these systems can form queries through web forms to retrieve a small sample of the database. Sampling such hidden databases is widely desired for understanding the nature and quality of data stored in them. We have developed HDSampler, which to the best of our knowledge is the first practical system for sampling structured hidden web databases. It enables efficient sampling of the databases and accurate answering of aggregate queries, to provide analysts with valuable information for data analytics, as well as help power a multitude of third-party applications such as web mashups and meta-search engines. For the purpose of this demo, we present an instance of HDSampler on Google Base, a content-rich hidden web database maintained by Google. By using HDSampler, the demo reveals a snapshot of the marginal distribution of various attributes of Google Base in a matter of minutes.
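
The abstract does not say how HDSampler draws its samples, so the following is only a rough illustration of the general idea, not the authors' implementation. It sketches one random drill-down approach with acceptance-rejection over a simulated form interface. Everything here is an assumption made for the example: the names `make_form`, `sample_one`, and `estimate_marginal`, the restriction to Boolean attributes, the top-k limit `k`, the toy table `ROWS`, and the premise that all tuples are distinct.

```python
import random
from collections import Counter


def make_form(rows, k):
    """Simulate a web form that returns at most k matching tuples (hypothetical)."""
    def query(predicates):
        # predicates maps an attribute index to the value it must equal.
        matches = [r for r in rows
                   if all(r[i] == v for i, v in predicates.items())]
        overflow = len(matches) > k  # more matches exist than the form shows
        return matches[:k], overflow
    return query


def sample_one(query, num_attrs, k, rng):
    """Run one random drill-down; return a tuple, or None if the walk fails."""
    predicates = {}
    for depth in range(num_attrs + 1):
        results, overflow = query(predicates)
        if not results:
            return None  # dead end: no tuple matches this random path
        if not overflow:
            # A tuple here was reached with probability 2**-depth / len(results).
            # Accepting with the probability below cancels that skew, so every
            # tuple ends up with the same chance 1 / (k * 2**num_attrs).
            accept = (len(results) * 2 ** depth) / (k * 2 ** num_attrs)
            return rng.choice(results) if rng.random() < accept else None
        if depth == num_attrs:
            return None  # still overflowing with every attribute fixed
        # Narrow the query by fixing the next attribute to a random value.
        predicates[depth] = rng.randint(0, 1)
    return None


def estimate_marginal(query, num_attrs, k, attr, n_samples, seed=0):
    """Estimate the distribution of one attribute from accepted samples."""
    rng = random.Random(seed)
    counts = Counter()
    accepted = 0
    while accepted < n_samples:
        row = sample_one(query, num_attrs, k, rng)
        if row is not None:
            counts[row[attr]] += 1
            accepted += 1
    return {value: c / n_samples for value, c in counts.items()}


# Toy "hidden" table: 10 distinct tuples over 4 Boolean attributes.
ROWS = [
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0),
    (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1), (1, 0, 1, 0),
]

if __name__ == "__main__":
    form = make_form(ROWS, k=2)
    print(estimate_marginal(form, num_attrs=4, k=2, attr=0, n_samples=2000))
```

The sampler only ever calls `query`, so it sees the table the same way an outside user of the form would. The rejection step trades efficiency for uniformity: many walks are discarded, but the accepted tuples are drawn with equal probability, which is what makes the estimated marginal distribution meaningful.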


