ABSTRACT
Extract, Transform, and Load (ETL) processes physically integrate data from multiple heterogeneous sources in a central repository referred to as a data warehouse. Physically integrated data becomes stale when the source data changes; hence, periodic refreshes are required. For efficiency reasons, data warehouses are typically refreshed incrementally, i.e., changes are captured at the sources and propagated to the data warehouse on a regular basis. Dedicated ETL processes, referred to as incremental load processes, are employed to extract changes from the sources, propagate the changes, and refresh the data warehouse incrementally. During change propagation, the changes required in the data warehouse are inferred from the changes captured at the sources. The creation of incremental load processes is a complex task reserved for trained ETL programmers. In this paper, we review existing Change Data Capture (CDC) techniques and discuss the limitations of the different approaches. We further review existing techniques for refreshing data warehouses. We then present an approach for generating incremental load processes from abstract schema mappings.
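To make the capture, propagate, and refresh steps concrete, the following is a minimal sketch of an incremental load. It assumes snapshot comparison as the CDC technique, which is only one of several possible techniques and is not necessarily the one the paper favors. The names `capture_changes` and `refresh`, the dictionary-based tables, and the sample transformation are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: a table is a dict from primary key to row.
Table = Dict[int, tuple]


def capture_changes(old: Table, new: Table) -> Tuple[Table, Table, Table]:
    """Snapshot-differential CDC: compare two source snapshots by key."""
    inserts = {k: v for k, v in new.items() if k not in old}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    deletes = {k: v for k, v in old.items() if k not in new}
    return inserts, updates, deletes


def refresh(
    warehouse: Dict[int, str],
    inserts: Table,
    updates: Table,
    deletes: Table,
    transform: Callable[[tuple], str],
) -> Dict[int, str]:
    """Propagate captured changes through the transformation and apply them."""
    # Inserted and updated source rows are transformed and upserted.
    for key, row in {**inserts, **updates}.items():
        warehouse[key] = transform(row)
    # Deleted source rows are removed from the warehouse.
    for key in deletes:
        warehouse.pop(key, None)
    return warehouse


# Illustrative data only.
old_snapshot = {1: ("alice", 10), 2: ("bob", 20)}
new_snapshot = {1: ("alice", 15), 3: ("carol", 30)}


def to_warehouse_row(row: tuple) -> str:
    # Hypothetical transformation: upper-case the name and join the fields.
    return f"{row[0].upper()}:{row[1]}"


warehouse = {k: to_warehouse_row(v) for k, v in old_snapshot.items()}
ins, upd, dele = capture_changes(old_snapshot, new_snapshot)
warehouse = refresh(warehouse, ins, upd, dele, to_warehouse_row)
```

Only the three changed rows are touched here; a full reload would instead recompute every warehouse row. The sketch works this simply because the transformation maps each source row to exactly one warehouse row. Joins and aggregations need more involved change propagation.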