Efficient resumption of interrupted warehouse loads
Source
International Conference on Management of Data
Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Dallas, Texas, United States
Pages: 46 - 57
Year of Publication: 2000
ISBN: 1-58113-217-4
ABSTRACT
Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, a possible approach is to “redo” the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. We show that DR can lead to a ten-fold reduction in resumption time by performing experiments using commercial software.
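The difference between redoing and resuming a load can be illustrated with a small sketch. It is not the DR algorithm from the paper. It only shows, under stated assumptions, how a high-level property of a transformation can let a resumed load skip work. The assumptions are that the source returns records in the same order when re-read, that the transformation is deterministic and emits exactly one row per input record, and that rows stored before the failure survive in the warehouse. The names `transform`, `load`, and `fail_after` are hypothetical.

```python
from itertools import islice


def transform(record):
    # Stand-in for an expensive user-defined transformation (hypothetical).
    # Assumed deterministic and one-to-one: one output row per input record.
    return {"key": record["id"], "name": record["name"].strip().title()}


def load(source, warehouse, fail_after=None):
    """Load or resume a load, returning how many records were transformed.

    Because the source order is assumed stable and the transformation is
    assumed one-to-one, the number of rows already in the warehouse tells
    us how many input records can be skipped without being transformed.
    """
    already_loaded = len(warehouse)
    transformed = 0
    for record in islice(source, already_loaded, None):
        if fail_after is not None and transformed == fail_after:
            raise RuntimeError("simulated load failure")
        warehouse.append(transform(record))
        transformed += 1
    return transformed


source = [{"id": i, "name": f" customer {i} "} for i in range(10)]

# First attempt fails after six records have been stored.
warehouse = []
try:
    load(source, warehouse, fail_after=6)
except RuntimeError:
    pass

# Resuming transforms only the remaining records.
resumed_work = load(source, warehouse)

# Redoing the whole load into an empty warehouse transforms every record.
redo_work = load(source, [])
```

In this toy run the resumed load transforms only the records that were not yet stored, while a redo transforms all of them again. A transformation without these properties (for example, one that merges duplicates across records) would not permit this simple skip.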
CITED BY 11
Panos Vassiliadis, Alkis Simitsis, Spiros Skiadopoulos, Conceptual modeling for ETL processes, Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP, p.14-21, November 8, 2002, McLean, Virginia, USA