Efficient resumption of interrupted warehouse loads
Source: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data
Dallas, Texas, United States
Pages: 46-57
Year of Publication: 2000
ISBN: 1-58113-217-4
Authors
Wilburt Juan Labio, Gigabeat, Inc., Palo Alto, CA
Janet L. Wiener, Compaq SRC, Palo Alto, CA
Hector Garcia-Molina, Stanford University
Vlad Gorelik, Sagent Technologies
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2, Downloads (12 Months): 42, Citation Count: 11
DOI: http://doi.acm.org/10.1145/342009.335379

ABSTRACT

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., finding duplicates, resolving data inconsistencies, and adding unique keys). If the load fails, one possible approach is to “redo” the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation or rely on the specifics of the transformations. We develop a resumption algorithm called DR that imposes no overhead during normal operation and relies only on high-level properties of the transformations. Experiments using commercial software show that DR can reduce resumption time ten-fold.
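The abstract contrasts redoing a failed load with resuming it. The Python sketch below illustrates that contrast under strong simplifying assumptions; it is not the DR algorithm, whose details are in the full paper. It assumes a hypothetical order-preserving transformation over input sorted on a unique key, so the last key committed to the warehouse marks exactly how much input has already been handled. All names (`transform`, `load`, `redo`, `resume`, `Row`) are illustrative, not taken from the paper or from any product.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record: (unique sort key, payload). Not from the paper.
Row = Tuple[int, str]


def transform(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical order-preserving transformation: drops rows with an
    empty payload and upper-cases the rest. Output keys keep input order."""
    for key, payload in rows:
        if payload:
            yield key, payload.upper()


def load(source: Iterable[Row], warehouse: List[Row], fail_after: int = -1) -> None:
    """Append transformed rows to the warehouse list. Each append stands in
    for a committed row. If fail_after >= 0, raise after that many commits
    to simulate an interrupted load."""
    for committed, row in enumerate(transform(source)):
        if committed == fail_after:
            raise RuntimeError("simulated failure")
        warehouse.append(row)


def redo(source: List[Row], warehouse: List[Row]) -> None:
    """Discard the partial result and rerun the whole load."""
    warehouse.clear()
    load(source, warehouse)


def resume(source: List[Row], warehouse: List[Row]) -> None:
    """Continue an interrupted load instead of redoing it.

    Assumes (1) source is sorted on a unique key and (2) transform preserves
    that order. Under those assumptions every input row with key <= the last
    committed key was already handled (loaded or filtered out), so only the
    rows after it need processing."""
    if not warehouse:
        load(source, warehouse)
        return
    last_key = warehouse[-1][0]
    load((row for row in source if row[0] > last_key), warehouse)


if __name__ == "__main__":
    src = [(1, "a"), (2, ""), (3, "c"), (4, "d"), (5, "e")]
    wh: List[Row] = []
    try:
        load(src, wh, fail_after=2)
    except RuntimeError:
        pass
    print("after failure:", wh)
    resume(src, wh)
    print("after resume: ", wh)
```

In the demo, the first run commits `(1, 'A')` and `(3, 'C')` before the simulated failure; `resume` then processes only keys 4 and 5, producing the same final contents that `redo` would. A real system would seek into the sorted input rather than filter it row by row; the generator filter here only keeps the sketch short.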
