| Turbo-charging vertical mining of large databases |
| Full text |
Pdf
(323 KB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 2000 ACM SIGMOD international conference on Management of data
table of contents
Dallas, Texas, United States
Pages: 22 - 33
Year of Publication: 2000
ISBN:1-58113-217-4
Also published in ...
|
|
Authors
|
|
Pradeep Shenoy
|
Lucent Bell Labs, 600 Mountain Avenue, Murray Hill, NJ, Computer Science and Engg., Tndian Institiue of Technology, Mumbai 400076,INDIA
|
|
Jayant R. Haritsa
|
Database Systems Lab, SERC, Indian Institue of science, Bangalore 560012,INDIA, Lucent Bell Labs, 600 Mountain Avenue, Murray Hill, NJ
|
|
S. Sudarshan
|
Computer Science and Engg., Tndian Institiue of Technology, Mumbai 400076,INDIA
|
|
Gaurav Bhalotia
|
Computer Science and Engg., Tndian Institiue of Technology, Mumbai 400076,INDIA
|
|
Mayank Bawa
|
Computer Science and Engg., Tndian Institiue of Technology, Mumbai 400076,INDIA
|
|
Devavrat Shah
|
Computer Science and Engg., Tndian Institiue of Technology, Mumbai 400076,INDIA
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 13, Downloads (12 Months): 66, Citation Count: 38
|
|
|
ABSTRACT
In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
| |
3
|
|
| |
4
|
G. Gardarin, P. Pucheral, and F. Wu. Bitmap based algorithms for mining association rules. Technical report 1998-18, University of Versailles, 1998. (http://www.prism.uvsq.fr/rapports/1998/ document_1998_18.ps.gz)
|
| |
5
|
S.W. Golomb. Run-length encoding. IEEE Trans. on Information Theory, 12(3), 3uly 1966.
|
| |
6
|
M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In Proc. of 1st Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1995.
|
| |
7
|
|
| |
8
|
P. Shenoy, 3. Haritsa, S. Sudarshan, M. Bawa, G. Bhalotia, and D. Shah. Turbo-charging vertical mining of large databases. Technical Report TR-2000-02, DSL, Indian Institute of Science, 2000. (http://dsl.serc.iisc.ernet.in/pub/TR/TR-2000-02.ps)
|
| |
9
|
|
| |
10
|
|
| |
11
|
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. of 3rd Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1997.
|
CITED BY 38
|
|
Feng Pan , Gao Cong , Anthony K. H. Tung , Jiong Yang , Mohammed J. Zaki, Carpenter: finding closed patterns in long biological datasets, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Imad Rahal , Dongmei Ren , Amal Perera , Hassan Najadat , William Perrizo , Riad Rahhal , Willy Valdivia, Incremental interactive mining of constrained association rules from biological annotation data with nominal features, Proceedings of the 2005 ACM symposium on Applied computing, March 13-17, 2005, Santa Fe, New Mexico
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guangzhu Yu , Keqing Li , Shihuang Shao, Mining high utility itemsets in large high dimensional data, Proceedings of the 1st international conference on Forensic applications and techniques in telecommunications, information, and multimedia and workshop, January 21-23, 2008, Adelaide, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Faris Alqadah , Zhen Hu , Lawrence J. Mazlack, Vertical mining with incomplete data, Proceedings of the 10th WSEAS international conference on Mathematical methods, computational techniques and intelligent systems, p.380-385, October 26-28, 2008, Corfu, Greece
|
|
|
|
|
|
|
|