ABSTRACT
Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places rather different requirements on database technology than traditional on-line transaction processing (OLTP) applications do. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back-end tools for extracting, cleaning, and loading data into a data warehouse; multidimensional data models typical of OLAP; front-end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper identifies some promising research issues: some are related to problems that the database research community has worked on for years, while others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.