Towards generating ETL processes for incremental loading
Source
ACM International Conference Proceeding Series; Vol. 299
Proceedings of the 2008 international symposium on Database engineering & applications
Coimbra, Portugal
SESSION: Data management
Pages 101-110
Year of Publication: 2008
ISBN: 978-1-60558-188-0
Authors
Thomas Jörg, University of Kaiserslautern, Kaiserslautern, Germany
Stefan Deßloch, University of Kaiserslautern, Kaiserslautern, Germany
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1451940.1451956

ABSTRACT

Extract, Transform, and Load (ETL) processes physically integrate data from multiple, heterogeneous sources in a central repository referred to as a data warehouse. Physically integrated data becomes stale when source data changes, hence periodic refreshes are required. For efficiency reasons, data warehouses are typically refreshed incrementally, i.e., changes are captured at the sources and propagated to the data warehouse on a regular basis. Dedicated ETL processes, referred to as incremental load processes, are employed to extract changes from the sources, propagate them, and refresh the data warehouse incrementally. During change propagation, the changes required in the data warehouse are inferred from the changes captured at the sources. Creating incremental load processes is a complex task reserved for trained ETL programmers. In this paper, we review existing Change Data Capture (CDC) techniques and discuss the limitations of different approaches. We further review existing techniques for refreshing data warehouses. We then present an approach for generating incremental load processes from abstract schema mappings.
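The incremental load cycle outlined above (capture changes at a source, infer the corresponding warehouse changes, apply them) can be illustrated with a minimal Python sketch. It is not the authors' method and does not show their generation of load processes from schema mappings. It assumes that source and warehouse tables fit in memory as dicts keyed by primary key, that changes are captured by snapshot differential (comparing two source snapshots, only one of several CDC techniques), and that the transformation is a simple select-project, for which warehouse changes can be inferred row by row. All names (`snapshot_diff`, `propagate`, `apply_changes`, `keep`, `project`) are hypothetical.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional

Row = Dict[str, Any]
Table = Dict[Any, Row]  # primary key -> row (hypothetical in-memory table)


class Change(NamedTuple):
    op: str             # "insert", "update", or "delete"
    key: Any            # primary key of the affected row
    old: Optional[Row]  # before-image (None for inserts)
    new: Optional[Row]  # after-image (None for deletes)


def snapshot_diff(before: Table, after: Table) -> List[Change]:
    """Snapshot-differential CDC: derive changes by comparing two source snapshots."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(Change("insert", key, None, row))
        elif before[key] != row:
            changes.append(Change("update", key, before[key], row))
    for key, row in before.items():
        if key not in after:
            changes.append(Change("delete", key, row, None))
    return changes


def propagate(changes: List[Change],
              keep: Callable[[Row], bool],
              project: Callable[[Row], Row]) -> List[Change]:
    """Infer warehouse changes for a select-project transformation."""
    out: List[Change] = []
    for c in changes:
        was_in = c.old is not None and keep(c.old)  # row was in the warehouse
        is_in = c.new is not None and keep(c.new)   # row belongs there now
        if was_in and is_in:
            out.append(Change("update", c.key, project(c.old), project(c.new)))
        elif is_in:
            out.append(Change("insert", c.key, None, project(c.new)))
        elif was_in:
            out.append(Change("delete", c.key, project(c.old), None))
        # Rows outside the filter before and after need no warehouse change.
    return out


def apply_changes(warehouse: Table, changes: List[Change]) -> None:
    """Refresh the warehouse table in place instead of reloading it."""
    for c in changes:
        if c.op == "delete":
            warehouse.pop(c.key, None)
        else:  # insert or update: write the after-image
            warehouse[c.key] = c.new


# Usage: a warehouse table holding the names of customers located in Berlin.
def keep(r: Row) -> bool:
    return r["city"] == "Berlin"


def project(r: Row) -> Row:
    return {"name": r["name"]}


before = {1: {"name": "Ann", "city": "Berlin"}, 2: {"name": "Bob", "city": "Paris"}}
after = {1: {"name": "Ann", "city": "Paris"}, 3: {"name": "Cid", "city": "Berlin"}}

warehouse = {k: project(r) for k, r in before.items() if keep(r)}  # initial full load
apply_changes(warehouse, propagate(snapshot_diff(before, after), keep, project))
# warehouse is now {3: {'name': 'Cid'}}: Ann's source update became a warehouse delete.
```

The example hints at why change propagation is more than copying source changes: an update at the source may become a delete or an insert in the warehouse when a row crosses the filter boundary, and the incremental result should always equal a full recomputation from the new snapshot. Joins and aggregations need more elaborate inference rules than this row-by-row case.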


