|
ABSTRACT
Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries using Association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach which performance-wise is a factor of 3 to 4 worse than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors like automatic parallelization, development ease, portability and inter-operability.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Agrawal, A. Arning, T. BoUinger, M. Mehta, J. Shafer, and R. Srikant. The Quest Data Mining System. In Proc. of the ~nd Int'l Conference on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, August 1996.
|
 |
2
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
3
|
Rakesh Agrawal , Heikki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
| |
4
|
|
| |
5
|
R. Agrawal and K. Shim. Developing tightly-coupled data mining applications on a relational database system. In Proc. o/the ~nd Int'l Conference on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, August 1996.
|
 |
6
|
Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman , Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
|
| |
7
|
|
| |
8
|
|
| |
9
|
J. Han, Y. Fu, K. Koperski, W. Wang, and O. Zaiane. DMQL: A data mining query language for relational datbases. In Proc. of the 1996 SIGMOD workshop on research issues on data mining and knowledge discovery, Montreal, Canada, May 1996.
|
| |
10
|
|
| |
11
|
IBM Corporation. DB~ Universal Database Application programming guide Version 5, 1997.
|
 |
12
|
|
| |
13
|
T. Imielinski, A. Virmani, and A. Abdulghani. Discovery Board Application Programming Interface and Query Language for Database Mining. In Proc. of the ~nd Int'l Conference on Knowledge Discovery and Data Mining, Portland, Oregon, August 1996.
|
| |
14
|
internationl Business Machines. IBM Intelligent Miner User's Guide, Version 1 Release 1, SH12-6213-00 edition, July 1996.
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
Oracle. Oracle RDBMS Database Administrator's Guide Volumes I, II (Version 7.0), May 1992.
|
| |
19
|
H. Pirahesh and B. Reinwald. SQL table function open architecture and data access middleware. In SIGMOD, 1998.
|
| |
20
|
K. Rajamani, B. Iyer, and A. Chaddha. Using DB/2's object relational extensions for mining associations rules. Technical Report TR 03,690., Santa Teresa Laboratory, IBM Corporation, sept 1997.
|
| |
21
|
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. Research Report RJ 10107 (91923), IBM Almaden Research Center, San jose, CA 95120, March 1998. Available from http : //~. almaden, ibm. corn/us/quest.
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
Dick Tsur , Jeffrey D. Ullman , Serge Abiteboul , Chris Clifton , Rajeev Motwani , Svetlozar Nestorov , Arnon Rosenthal, Query flocks: a generalization of association-rule mining, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.1-12, June 01-04, 1998, Seattle, Washington, United States
|
| |
26
|
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New Algorithms for Fast Discovery of Association Rules. In Proc. of the 3rd Int'l Conference on Knowledge Discovery and Data Mining, Newport Beach, California, August 1997.
|
CITED BY 69
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Claus Boyens , Oliver Günther , Maximilian Teltzrow, Privacy conflicts in CRM services for online shops: a case study, Proceedings of the IEEE international conference on Privacy, security and data mining, p.27-35, December 01, 2002, Maebashi City, Japan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yubao Liu , Jiarong Cai , Zhilan Huang , Jingwen Yu , Jian Yin, Fast detection of database system abuse behaviors based on data mining approach, Proceedings of the 2nd international conference on Scalable information systems, June 06-08, 2007, Suzhou, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Magdalena Balazinska , Amol Deshpande , Michael J. Franklin , Phillip B. Gibbons , Jim Gray , Mark Hansen , Michael Liebhold , Suman Nath , Alexander Szalay , Vincent Tao, Data Management in the Worldwide Sensor Web, IEEE Pervasive Computing, v.6 n.2, p.30-40, April 2007
|
|
|
Faris Alqadah , Zhen Hu , Lawrence J. Mazlack, Vertical mining with incomplete data, Proceedings of the 10th WSEAS international conference on Mathematical methods, computational techniques and intelligent systems, p.380-385, October 26-28, 2008, Corfu, Greece
|
|