PADS: an end-to-end system for processing ad hoc data
Source
International Conference on Management of Data
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Chicago, IL, USA
DEMONSTRATION SESSION: Group A
Pages: 727 - 729
Year of Publication: 2006
ISBN: 1-59593-434-0
ABSTRACT
Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to reuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.
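The separation the abstract argues for, a declarative description of the format kept apart from the code that parses it, can be illustrated with a minimal Python sketch. This is not PADS syntax and not the PADS compiler: the `Field` class, the `DESCRIPTION` list, the `make_parser` helper, and the pipe-delimited `host|status|bytes` record layout are all hypothetical names and formats chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """One field of a record: a name, a pattern, and a converter (hypothetical)."""
    name: str
    pattern: str                      # regular expression the raw text must match
    convert: Callable[[str], object]  # turns the raw text into a typed value


# Hypothetical description of an ad hoc log record: "host|status|bytes".
# A dash in the last field stands for "no value recorded".
DESCRIPTION: List[Field] = [
    Field("host", r"[\w.]+", str),
    Field("status", r"\d{3}", int),
    Field("bytes", r"\d+|-", lambda s: None if s == "-" else int(s)),
]


def make_parser(description: List[Field], sep: str = "|"):
    """Derive a parser from a description; the parser returns (record, errors)."""
    def parse(line: str) -> Tuple[Dict[str, object], List[str]]:
        record: Dict[str, object] = {}
        errors: List[str] = []
        parts = line.rstrip("\n").split(sep)
        if len(parts) != len(description):
            errors.append(f"expected {len(description)} fields, got {len(parts)}")
            return record, errors
        for field, raw in zip(description, parts):
            if re.fullmatch(field.pattern, raw):
                record[field.name] = field.convert(raw)
            else:
                # Keep going after a bad field so one error does not hide the rest.
                record[field.name] = None
                errors.append(f"bad {field.name}: {raw!r}")
        return record, errors
    return parse


parse_record = make_parser(DESCRIPTION)
```

Calling `parse_record("10.0.0.1|200|-")` yields a typed record plus an empty error list, while a malformed line yields a partial record and a list of per-field errors. Because the format lives in `DESCRIPTION` rather than in the parsing logic, other tools (profilers, format converters, query front-ends) could in principle be driven by the same description, which is the kind of reuse the abstract describes.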