ABSTRACT
Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many of these measures provide conflicting information about the interestingness of a pattern, and the best metric for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine to select the right measure for a given application domain, and we compare these properties across twenty-one existing measures. We show that each measure has different properties that make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely support-based pruning and table standardization. Finally, we present an algorithm that selects a small set of contingency tables so that an expert can choose a suitable measure by examining only those tables.
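The measures named above are all functions of the 2x2 contingency table for a pair of items A and B. The following is a minimal illustrative sketch, not code from the paper: it computes support, confidence, lift, the phi (correlation) coefficient, and collective strength from the four cell counts using their standard definitions (collective strength as defined by Aggarwal and Yu), and it standardizes a table by iterative proportional fitting in the spirit of Mosteller [12], rescaling rows and columns until every margin equals N/2. The names `Table2x2`, `measures`, `standardize`, and `odds_ratio`, and the cell layout `f11, f10, f01, f00`, are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Hypothetical layout for the 2x2 contingency table of items A and B.

    f11: A and B both present      f10: A present, B absent
    f01: A absent, B present       f00: both absent
    """
    f11: float
    f10: float
    f01: float
    f00: float

    @property
    def n(self) -> float:
        return self.f11 + self.f10 + self.f01 + self.f00


def measures(t: Table2x2) -> dict:
    """Standard definitions of a few interestingness measures for A -> B."""
    n = t.n
    p11, p00 = t.f11 / n, t.f00 / n
    pa = (t.f11 + t.f10) / n               # P(A)
    pb = (t.f11 + t.f01) / n               # P(B)
    pna, pnb = 1 - pa, 1 - pb              # P(not A), P(not B)

    support = p11
    confidence = p11 / pa
    lift = p11 / (pa * pb)
    phi = (p11 - pa * pb) / math.sqrt(pa * pb * pna * pnb)
    # Collective strength: observed/expected agreement multiplied by
    # expected/observed violation (exactly one of A, B present).
    agree, exp_agree = p11 + p00, pa * pb + pna * pnb
    collective = (agree / exp_agree) * ((1 - exp_agree) / (1 - agree))
    return {"support": support, "confidence": confidence, "lift": lift,
            "phi": phi, "collective_strength": collective}


def standardize(t: Table2x2, iters: int = 100) -> Table2x2:
    """Iterative proportional fitting: alternately rescale rows and columns
    so every margin approaches n/2. Scaling a whole row or column leaves
    the odds ratio f11*f00 / (f10*f01) unchanged."""
    half = t.n / 2
    f11, f10, f01, f00 = t.f11, t.f10, t.f01, t.f00
    for _ in range(iters):
        r1, r0 = f11 + f10, f01 + f00      # current row sums
        f11, f10 = f11 * half / r1, f10 * half / r1
        f01, f00 = f01 * half / r0, f00 * half / r0
        c1, c0 = f11 + f01, f10 + f00      # current column sums
        f11, f01 = f11 * half / c1, f01 * half / c1
        f10, f00 = f10 * half / c0, f00 * half / c0
    return Table2x2(f11, f10, f01, f00)


def odds_ratio(t: Table2x2) -> float:
    return (t.f11 * t.f00) / (t.f10 * t.f01)


if __name__ == "__main__":
    table = Table2x2(f11=60, f10=20, f01=10, f00=10)
    print(measures(table))
    std = standardize(table)
    print(std, odds_ratio(table), odds_ratio(std))
```

With these example counts, lift is about 1.07 and phi about 0.22, and the standardized table has all margins equal to 50 while keeping the original odds ratio of 3. Measures that are invariant to row and column scaling, such as the odds ratio, give the same value before and after standardization; measures such as lift generally do not, which is one way to see why standardization can bring measures into closer agreement.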
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
[2] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the 1993 ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington, D.C., United States, May 25-28, 1993.
[3] A. Agresti. Categorical Data Analysis. John Wiley & Sons, 1990.
[8] M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic rules. In Proc. of the Second Int'l Conference on Knowledge Discovery and Data Mining, pages 263-266, Portland, Oregon, 1996.
[9] Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, and A. Inkeri Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. of the Third Int'l Conference on Information and Knowledge Management, pages 401-407, Gaithersburg, Maryland, United States, November 29-December 2, 1994. doi:10.1145/191246.191314.
[10] I. Kononenko. On biases in estimating multi-valued attributes. In Proc. of the Fourteenth Int'l Joint Conference on Artificial Intelligence (IJCAI'95), pages 1034-1040, Montreal, Canada, 1995.
[11] Bing Liu, Wynne Hsu, and Yiming Ma. Pruning and summarizing the discovered associations. In Proc. of the Fifth ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, pages 125-134, San Diego, California, United States, August 15-18, 1999. doi:10.1145/312129.312216.
[12] F. Mosteller. Association and estimation in contingency tables. Journal of the American Statistical Association, 63:1-28, 1968.
[16] P. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. Technical Report 2002-112, Army High Performance Computing Research Center, 2002.
CITED BY 60
Haiquan Li, Jinyan Li, Limsoon Wong, Mengling Feng, Yap-Peng Tan, Relative risk and odds ratio: a data mining perspective, Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 13-15, 2005, Baltimore, Maryland
Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin, Summarizing itemset patterns: a profile-based approach, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, Panayiotis Tsaparas, Assessing data mining results via swap randomization, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Hassan H. Malik, John R. Kender, Clustering web images using association rules, interestingness measures, and hypergraph partitions, Proceedings of the 6th international conference on Web engineering, July 11-14, 2006, Palo Alto, California, USA
Dong Xin, Hong Cheng, Xifeng Yan, Jiawei Han, Extracting redundancy-aware top-k patterns, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Kaidi Zhao, Bing Liu, Jeffrey Benkler, Weimin Xiao, Opportunity map: identifying causes of failure - a deployed data mining system, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Adriano Veloso, Wagner Meira, Jr., Marco Cristo, Marcos Gonçalves, Mohammed Zaki, Multi-evidence, multi-criteria, lazy associative document classification, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
Miho Ohsaki, Hidenao Abe, Shusaku Tsumoto, Hideto Yokoi, Takahira Yamaguchi, Evaluation of rule interestingness measures in medical knowledge discovery in databases, Artificial Intelligence in Medicine, v.41 n.3, p.177-196, November, 2007
Xian Zhang, Yu Hao, Xiaoyan Zhu, Ming Li, David R. Cheriton, Information distance from a question to an answer, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
Chong Long, Xiaoyan Zhu, Ming Li, Bin Ma, Information shared by many objects, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Gláucia M. Bressan, Vilma A. Oliveira, Estevam R. Hruschka, Jr., Maria C. Nicoletti, Using Bayesian networks with rule extraction to infer the risk of weed infestation in a corn-crop, Engineering Applications of Artificial Intelligence, v.22 n.4-5, p.579-592, June, 2009
REVIEW
Reviewer: Susan Bridges
Tan, Kumar, and Srivastava describe a theoretical and experimental investigation of measures for association patterns. The authors survey a large number of such measures that have been developed by the statistics, machine learning, and data mining [...]