Data integration flows for business intelligence
Source: Extending Database Technology; Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
SESSION: Invited papers
Pages 1-11  
Year of Publication: 2009
ISBN:978-1-60558-422-5
Authors
Umeshwar Dayal  HP Labs, Palo Alto, CA
Malu Castellanos  HP Labs, Palo Alto, CA
Alkis Simitsis  HP Labs, Palo Alto, CA
Kevin Wilkinson  HP Labs, Palo Alto, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 195,   Downloads (12 Months): 741,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1516360.1516362

ABSTRACT

Business Intelligence (BI) refers to technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making. Today's BI architecture typically consists of a data warehouse (or one or more data marts), which consolidates data from several operational databases and serves a variety of front-end querying, reporting, and analytic tools. The back end of the architecture is a data integration pipeline that populates the data warehouse by extracting data from distributed and usually heterogeneous operational sources; cleansing, integrating, and transforming the data; and loading it into the data warehouse. Since BI systems have been used primarily for off-line, strategic decision making, the traditional data integration pipeline is a one-way, batch process, usually implemented by extract-transform-load (ETL) tools. The design and implementation of the ETL pipeline is largely a labor-intensive activity and typically consumes a large fraction of the effort in data warehousing projects. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making. This imposes additional requirements and tradeoffs, resulting in even more complexity in the design of data integration flows. These requirements include reducing latency so that near real-time data can be delivered to the data warehouse, extracting information from a wider variety of data sources, extending the rigidly serial ETL pipeline to more general data flows, and considering alternative physical implementations. We describe the requirements for data integration flows in this next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing these challenges. The goal is to facilitate the design and implementation of optimal flows to meet business requirements.
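
As an illustration only, the traditional one-way, batch pipeline described in the abstract (extract from several operational sources, cleanse and transform, load into the warehouse) might be sketched as below. This is a minimal sketch and not the authors' framework: the record fields `customer` and `amount`, the `SaleRow` type, and the in-memory lists standing in for operational databases and the warehouse are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class SaleRow:
    """Hypothetical warehouse row; the field names are assumptions."""
    customer: str
    amount_cents: int


def extract(sources: Iterable[Iterable[Dict]]) -> Iterator[Dict]:
    """Extract: pull raw records from each operational source in turn."""
    for source in sources:
        yield from source


def transform(raw_rows: Iterable[Dict]) -> Iterator[SaleRow]:
    """Cleanse and transform: drop incomplete records, normalize the rest."""
    for raw in raw_rows:
        name = str(raw.get("customer", "")).strip().title()
        amount = raw.get("amount")
        if not name or amount is None:
            continue  # cleansing step: skip records missing required fields
        yield SaleRow(customer=name, amount_cents=round(float(amount) * 100))


def load(rows: Iterable[SaleRow], warehouse: List[SaleRow]) -> int:
    """Load: append transformed rows to the warehouse, return the row count."""
    count = 0
    for row in rows:
        warehouse.append(row)
        count += 1
    return count


def run_batch(sources: Iterable[Iterable[Dict]], warehouse: List[SaleRow]) -> int:
    """One batch run: the stages are chained strictly in series."""
    return load(transform(extract(sources)), warehouse)
```

Calling `run_batch([orders_db, crm_db], warehouse)` with two lists of dictionaries performs one batch load. The strictly serial chaining in `run_batch` is the property the abstract proposes to relax in favor of more general, lower-latency data flows.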



