Stream warehousing with DataDepot
Source: Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, Rhode Island, USA
Session: Industrial session 1: data warehousing
Pages: 847-854
Year of Publication: 2009
ISBN: 978-1-60558-551-2
Authors
Lukasz Golab, AT&T Laboratories - Research, Florham Park, NJ, USA
Theodore Johnson, AT&T Laboratories - Research, Florham Park, NJ, USA
J. Spencer Seidel, AT&T Laboratories - Research, Florham Park, NJ, USA
Vladislav Shkapenyuk, AT&T Laboratories - Research, Florham Park, NJ, USA
ABSTRACT
We describe DataDepot, a tool for generating warehouses from streaming data feeds such as network-traffic traces, router alerts, financial tickers, and transaction logs. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot resembles Data Stream Management Systems (DSMSs) in its emphasis on temporal data, best-effort consistency, and real-time response. As a data warehouse, however, DataDepot is designed to store tens to hundreds of terabytes of historical data, to support time windows measured in years or decades, and to allow both real-time queries on recent data and deep analyses of historical data. In this paper we discuss the DataDepot architecture, with an emphasis on several of its novel and critical features. DataDepot is currently used in five very large warehousing projects within AT&T; one of these warehouses ingests 500 MB per minute (and is growing). We use these installations to illustrate the use and behavior of streaming warehouses and the design choices made in developing DataDepot. We conclude with a discussion of DataDepot applications and the efficacy of some optimizations.
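The abstract says a streaming warehouse maintains materialized views over temporal data but does not describe the mechanics. As a rough illustration only, the Python sketch below assumes a hypothetical design in which base data are partitioned by time and a derived view is recomputed only for the partitions that received new records. The class `StreamWarehouse`, the hourly `PARTITION_SECONDS`, and the per-key sum view are illustrative assumptions, not DataDepot's actual interface or algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical partition width: one temporal partition per hour of data.
PARTITION_SECONDS = 3600


def partition_of(ts: int) -> int:
    """Map a record timestamp (seconds since epoch) to its temporal partition id."""
    return ts // PARTITION_SECONDS


@dataclass
class StreamWarehouse:
    """Hypothetical toy warehouse: a time-partitioned base table plus one derived view."""

    # Base table: partition id -> list of (timestamp, key, value) records.
    base: dict = field(default_factory=lambda: defaultdict(list))
    # Materialized view: partition id -> {key: sum of values in that partition}.
    view: dict = field(default_factory=dict)
    # Partitions that received new records since the last refresh.
    dirty: set = field(default_factory=set)

    def ingest(self, records):
        """Append streaming records to their partitions and mark those partitions dirty."""
        for ts, key, value in records:
            p = partition_of(ts)
            self.base[p].append((ts, key, value))
            self.dirty.add(p)

    def refresh_view(self):
        """Recompute only the view partitions whose base partitions changed."""
        refreshed = sorted(self.dirty)
        for p in refreshed:
            totals = defaultdict(int)
            for _, key, value in self.base[p]:
                totals[key] += value
            self.view[p] = dict(totals)
        self.dirty.clear()
        return refreshed

    def query(self, first: int, last: int) -> dict:
        """Sum the view over partitions first..last (inclusive)."""
        result = defaultdict(int)
        for p in range(first, last + 1):
            for key, total in self.view.get(p, {}).items():
                result[key] += total
        return dict(result)


if __name__ == "__main__":
    wh = StreamWarehouse()
    wh.ingest([(0, "router-a", 5), (10, "router-b", 3), (3700, "router-a", 2)])
    print(wh.refresh_view())  # [0, 1]
    wh.ingest([(20, "router-a", 1)])  # late record for hour 0
    print(wh.refresh_view())  # [0]  (only hour 0 is recomputed)
    print(wh.query(0, 1))  # {'router-a': 8, 'router-b': 3}
```

Because the view changes only when `refresh_view` runs, queries issued between refreshes may return slightly stale answers. This loosely mirrors the best-effort consistency the abstract attributes to streaming warehouses, while the partition-level recomputation keeps refresh cost proportional to the newly arrived data rather than to the full history.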
CITED BY
Mohammad Hossein Bateni , Lukasz Golab , Mohammad Taghi Hajiaghayi , Howard Karloff, Scheduling to minimize staleness and stretch in real-time data warehouses, Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, August 11-13, 2009, Calgary, AB, Canada