Selecting the right interestingness measure for association patterns
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Frequent patterns I
Pages: 32-41
Year of Publication: 2002
ISBN: 1-58113-567-X
Authors
Pang-Ning Tan  University of Minnesota, Minneapolis, MN
Vipin Kumar  University of Minnesota, Minneapolis, MN
Jaideep Srivastava  University of Minnesota, Minneapolis, MN
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775053

ABSTRACT

Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties that make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.
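
As an illustration of how such measures can disagree, the following minimal Python sketch computes four of the measures named in the abstract (support, confidence, lift, and correlation, taken here as the phi coefficient) from a 2x2 contingency table for a rule A -> B. The function name `measures`, the argument names, and the counts in both tables are assumptions made up for this example; they are not taken from the paper.

```python
from math import sqrt


def measures(f11, f10, f01, f00):
    """Compute four interestingness measures for the rule A -> B.

    f11: transactions containing both A and B
    f10: transactions containing A but not B
    f01: transactions containing B but not A
    f00: transactions containing neither
    """
    n = f11 + f10 + f01 + f00
    p_ab = f11 / n            # P(A, B)
    p_a = (f11 + f10) / n     # P(A)
    p_b = (f11 + f01) / n     # P(B)
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,        # P(B | A)
        "lift": p_ab / (p_a * p_b),      # ratio to the independence baseline
        # phi coefficient: correlation between two binary variables
        "phi": (p_ab - p_a * p_b)
        / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }


# Hypothetical counts, invented for illustration only.
mx = measures(880, 50, 50, 20)   # A and B are both very frequent
my = measures(20, 50, 50, 880)   # A and B are both rare

for name in ("support", "confidence", "lift", "phi"):
    print(f"{name:10s}  X={mx[name]:.4f}  Y={my[name]:.4f}")
```

On these two hypothetical tables, support and confidence rank the first table higher, lift ranks the second table higher, and the phi coefficient assigns both the same value, so the ordering of the two patterns depends entirely on which measure is chosen.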


REVIEW

Reviewer: Susan Bridges

Tan, Kumar, and Srivastava describe a theoretical and experimental investigation of measures for association patterns. The authors survey a large number of such measures that have been developed by the statistics, machine learning, and data mining ...
