ABSTRACT
Business Intelligence (BI) refers to technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making. Today's BI architecture typically consists of a data warehouse (or one or more data marts), which consolidates data from several operational databases and serves a variety of front-end querying, reporting, and analytic tools. The back end of the architecture is a data integration pipeline that populates the data warehouse by extracting data from distributed and usually heterogeneous operational sources; cleansing, integrating, and transforming the data; and loading it into the data warehouse. Since BI systems have been used primarily for off-line, strategic decision making, the traditional data integration pipeline is a one-way, batch process, usually implemented by extract-transform-load (ETL) tools. The design and implementation of the ETL pipeline is largely a labor-intensive activity and typically consumes a large fraction of the effort in data warehousing projects. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making. This imposes additional requirements and tradeoffs, resulting in even more complexity in the design of data integration flows. These requirements include reducing latency so that near real-time data can be delivered to the data warehouse, extracting information from a wider variety of data sources, extending the rigidly serial ETL pipeline to more general data flows, and considering alternative physical implementations. We describe the requirements for data integration flows in this next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing these challenges. The goal is to facilitate the design and implementation of optimal flows to meet business requirements.
CITED BY
Alkis Simitsis, Kevin Wilkinson, Malu Castellanos, Umeshwar Dayal. QoX-driven ETL design: reducing the cost of ETL consulting engagements. Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA