PADS: an end-to-end system for processing ad hoc data
Source: International Conference on Management of Data
Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data
Chicago, IL, USA
Demonstration Session: Group A
Pages: 727 - 729  
Year of Publication: 2006
ISBN: 1-59593-434-0
Authors
Mark Daly  Princeton University
Yitzhak Mandelbaum  Princeton University
David Walker  Princeton University
Mary Fernández  AT&T Labs Research
Kathleen Fisher  AT&T Labs Research
Robert Gruber  Google
Xuan Zheng  University of Michigan, Ann Arbor, MI
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1142473.1142568

ABSTRACT

Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to reuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.
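
The separation the abstract argues for, a declarative description of the format kept apart from the code that parses and analyzes it, can be illustrated with a minimal sketch. This is plain Python, not PADS syntax and not the PADS compiler: the record layout, the field names, the `|` separator, and the `compile_description` helper are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical declarative description of one record in an ad hoc,
# pipe-delimited format: an ordered list of (field name, converter).
# This is NOT PADS syntax; it only mimics the idea of describing the data.
Description = List[Tuple[str, Callable[[str], Any]]]

ORDER_RECORD: Description = [
    ("order_id", int),
    ("state", str),
    ("timestamp", int),
]


def compile_description(
    desc: Description, sep: str = "|"
) -> Callable[[str], Dict[str, Any]]:
    """Turn a description into a parsing function (a stand-in for a generated library)."""

    def parse(line: str) -> Dict[str, Any]:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != len(desc):
            raise ValueError(f"expected {len(desc)} fields, got {len(parts)}")
        # Apply each field's converter to the matching raw substring.
        return {name: conv(raw) for (name, conv), raw in zip(desc, parts)}

    return parse


# Parsing is derived from the description; analysis is written separately.
parse_order = compile_description(ORDER_RECORD)
records = [parse_order(line) for line in ["1|NY|100", "2|NJ|105"]]
states = [r["state"] for r in records]
```

Because the format lives in `ORDER_RECORD` rather than inside the parsing loop, a second tool (for example a profiler or a converter to XML) could reuse the same description without re-reading the parsing code.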

