ABSTRACT
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In this paper, we focus on the problem of the definition of ETL activities and provide formal foundations for their conceptual representation. The proposed conceptual model is (a) customized for the tracing of inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project; (b) enriched with a 'palette' of frequently used ETL activities, such as the assignment of surrogate keys and the check for null values; and (c) constructed in a customizable and extensible manner, so that designers can enrich it with their own recurring patterns for ETL activities.