ACM Home Page
Please provide us with feedback. Feedback
Optimizing data warehouse loading procedures for enabling useful-time data warehousing
Full text PdfPdf (1.05 MB)
Source
ACM International Conference Proceeding Series archive
Proceedings of the 2009 International Database Engineering & Applications Symposium table of contents
Cetraro - Calabria, Italy
SESSION: Short papers table of contents
Pages 292-299  
Year of Publication: 2009
ISBN:978-1-60558-402-7
Authors
Ricardo Jorge Santos  University of Coimbra, Coimbra, Portugal
Jorge Bernardino  Polytechnic Institute of Coimbra, Coimbra, Portugal
Sponsors
: BytePress
Concordia University : Concordia University
: ACM
: Universita della Calabria, Rende(CS), Italy
: ICAR-CNR, Rende (CS), Italy
: ACM International Conference Proceeding Series
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 108,   Downloads (12 Months): 116,   Citation Count: 0
Additional Information:

abstract   references   index terms  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1620432.1620464
What is a DOI?

ABSTRACT

The purpose of a data warehouse is to aid decision making. As the real-time enterprise evolves, synchronism between transactional data and data warehouses is redefined. To cope with real-time requirements, the data warehouses must be able to enable continuous data integration, in order to deal with the most recent business data. Traditional data warehouses are unable to support any dynamics in structure and content while they are available for OLAP. Their data is periodically updated because they are unprepared for continuous data integration. For real-time enterprises with needs in decision support while the transactions are occurring, (near) real-time data warehousing seem very promising. In this paper we present a survey on testing today's most used loading techniques and analyze which are the best data loading methods, presenting a methodology for efficiently supporting continuous data integration for data warehouses. To accomplish this, we use techniques such as table structure replication with minimum content and query predicate restrictions for selecting data, to enable loading data in the data warehouse continuously, with minimum impact in query execution time. We demonstrate the efficiency of the method using benchmark TPC-H and executing query workloads while simultaneously performing continuous data integration.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. J. Abadi, D. Carney, et al.: "Aurora: A New Model and Architecture for Data Stream Management", The VLDB Journal, 12(2), pp. 120--139, 2003.
 
2
S. Babu, and J. Widom: "Continuous Queries Over Data Streams", SIGMOD Record 30(3), pp. 109--120.
 
3
T. Binder: Gong User Manual, Tecco Software AG, 2003.
 
4
M. Bouzeghoub, F. Fabret, and M. Matulovic: "Modeling Data Warehouse Refreshment Process as a Workflow Application", Int. Workshop on Design and Management of DW (DMDW), 1999.
 
5
R. M. Bruckner, B. List, and J. Schiefer: "Striving Towards Near Real-Time Data Integration for Data Warehouses", Int. Conf. Data Warehousing and Knowledge Discovery (DAWAK), 2002.
 
6
R. M. Bruckner, and A. M. Tjoa: "Capturing Delays and Valid Times in Data Warehouses -- Towards Timely Consistent Analyses". Journal of Intelligent Inf. Systems (JIIS), 19:2, pp. 169--190, 2002.
 
7
S. Chaudhuri, and U. Dayal: "An Overview of Data Warehousing and OLAP Technology", SIGMOD Record, Volume 26, Number 1, pp. 65--74, 1997.
 
8
W. H. Inmon, R. H. Terdeman, J. Norris-Montanari, and D. Meers: Data Warehousing for E-Business, J. Wiley & Sons, 2001.
 
9
C. Italiano, and J. E. Ferreira: "Synchronization Options for Data Warehouse Designs", IEEE Computer Magazine, 2006.
 
10
Karakasidis, P. Vassiliadis, and E. Pitoura: "ETL Queues for Active Data Warehousing", IQIS'05, 2005.
 
11
R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite: The Data Warehouse Lifecycle Toolkit -- Expert Methods for Designing, Developing and Deploying Data Warehouses, Wiley Computer Pub, 1998.
 
12
R. Kimball, and J. Caserta: The Data Warehouse ETL Toolkit, Wiley Computer Publishing, 2004.
 
13
E. Kuhn: "The Zero-Delay Data Warehouse: Mobilizing Heterogeneous Databases", Int. Conf. on Very Large Data Bases (VLDB), 2003.
 
14
W. Labio, J. Yang, Y. Cui, H. Garcia-Molina, and J. Widom: "Performance Issues in Incremental Warehouse Maintenance", Int. Conf. on Very Large Data Bases (VLDB), 2000.
 
15
D. Lomet, and J. Gehrke: Special Issue on Data Stream Processing, IEEE Data Eng. Bulletin, 26(1), 2003.
 
16
Oracle Corporation, 2005. www.oracle.com
 
17
T. B. Pedersen: "How is BI Used in Industry?", Int. Conf. on Data Warehousing and Knowledge Discovery (DAWAK), 2004.
 
18
J. F. Roddick, and M. Schrefl: "Towards an Accommodation of Delay in Temporal Active Databases", 11th Australasian Database Conference (ADC), 2000.
 
19
Simitsis, P. Vassiliadis and T. Sellis: "Optimizing ETL Processes in Data Warehouses", Int. Conf. on Data Engineering (ICDE), 2005.
 
20
U. Srivastava, and J. Widom: "Flexible Time Management in Data Stream Systems", Int. Conf. on Principles of Database Systems (PODS), 2004.
 
21
D. Theodoratus, and M. Bouzeghoub: "Data Currency Quality Factors in Data Warehouse Design", Int. Workshop on Design and Management of Data Warehouses (DMDW), 1999.
 
22
TPC-H decision support benchmark, Transaction Processing Council, www.tpc.com.
 
23
P. Vassiliadis, Z. Vagena, S. Skiadopoulos, N. Karayannidis, and T. Sellis: "ARKTOS: Towards the Modelling, Design, Control and Execution of ETL Processes", Inf. Systems, Vol. 26(8), 2001.
 
24
White: "Intelligent Business Strategies: Real-Time Data Warehousing Heats Up", DM Preview, www.dmreview.com/article_sub_cfm?articleld=5570. 2002.
 
25
J. Yang: "Temporal Data Warehousing", Ph.D. Thesis, Dp. Computer Science, Stanford Univ, 2001.
 
26
J. Yang, and J. Widom: "Incremental Computation and Maintenance of Temporal Aggregates", 17th Intern. Conference on Data Engineering (ICDE), 2001.
 
27
J. Yang, and J. Widom: "Temporal View Self-Maintenance", 7th Int. Conf. Extending Database Technology (EDBT), 2001.
 
28
T. Zurek, and K. Kreplin: "SAP Business Information Warehouse -- From Data Warehousing to an E-Business Platform", 17 Int. Conf. on Data Engineering (ICDE), 2001.
 
29
D. Burleson: Oracle data load (import, SQL*Loader) speed tips, Burleson Consulting, http://www.dbaoracle.com/oracle_tips_load_speed.htm. 2006.
 
30
Oracle Corporation: http://www.oracle.com/technology/products/database/utilities/index. html. 2008.
 
31
N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, S. B. Zdonik: "Towards a Streaming SQL Standard", Int. Conf. Very Large Data Bases (VLDB), 1(2): 1379--1390, 2008.
 
32
N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simitsis, N. Frantzell: "Meshing Streaming Updates with Persistent Data in an Active Data Warehouse", IEEE Transactions on Knowl. Data Eng., 20(7): 976--991, 2008.
 
33
H. Agrawal, G. Chafle, S. Goyal, S. Mittal, S. Mukherjea: "An Enhanced Extract-Transform-Load System for Migrating Data in Telecom Billing", Int. Conference on Data Engineering (ICDE), pp. 1277--1286, 2008.
 
34
A. Simitsis, P. Vassiliadis: "A method for the mapping of conceptual designs to logical blueprints for ETL processes", Decision Support Systems (45) 22--40, 2008.
 
35
P. Vassiliadis, A. Simitsis, P. Georgantas, M. Terrovitis, S. Skiadopoulos: "A generic and customizable framework for the design of ETL scenarios", Inform. Systems, 30(7): 492--525, 2005.
 
36
IBM, IBM Data warehouse manager, available at http://www-3.ibm.com/software/data/db2/data warehouse/.
 
37
Informatica, Power Center, available at http://www.informatica.com/products/data+integration/powercenter/default. htm.
 
38
Microsoft, Data transformation services, available at http://www.microsoft.com.
 
39
Oracle Corporation, "Oracle warehouse builder product page", available at http://otn.oracle.com/products/warehouse/content.html