Stream warehousing with DataDepot
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD international conference on Management of data
Providence, Rhode Island, USA
SESSION: Industrial session 1: data warehousing
Pages 847-854  
Year of Publication: 2009
ISBN:978-1-60558-551-2
Authors
Lukasz Golab  AT&T Laboratories - Research, Florham Park, NJ, USA
Theodore Johnson  AT&T Laboratories - Research, Florham Park, NJ, USA
J. Spencer Seidel  AT&T Laboratories - Research, Florham Park, NJ, USA
Vladislav Shkapenyuk  AT&T Laboratories - Research, Florham Park, NJ, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559845.1559934

ABSTRACT

We describe DataDepot, a tool for generating warehouses from streaming data feeds, such as network-traffic traces, router alerts, financial tickers, transaction logs, and so on. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot is similar to Data Stream Management Systems (DSMSs) with its emphasis on temporal data, best-effort consistency, and real-time response. However, as a data warehouse, DataDepot is designed to store tens to hundreds of terabytes of historical data, allow time windows measured in years or decades, and allow both real-time queries on recent data and deep analyses on historical data. In this paper we discuss the DataDepot architecture, with an emphasis on several of its novel and critical features. DataDepot is currently being used for five very large warehousing projects within AT&T; one of these warehouses ingests 500 Mbytes per minute (and is growing). We use these installations to illustrate streaming warehouse use and behavior, and design choices made in developing DataDepot. We conclude with a discussion of DataDepot applications and the efficacy of some optimizations.



