On the discovery of significant statistical quantitative rules
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
SESSION: Research track papers
Pages: 374 - 383  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
Hong Zhang  University of Pennsylvania, Philadelphia, PA
Balaji Padmanabhan  University of Pennsylvania, Philadelphia, PA
Alexander Tuzhilin  New York University, New York, NY
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014094

ABSTRACT

In this paper we study market share rules: rules that have a market share statistic associated with them. Such rules are particularly relevant to business decision making. Motivated by market share rules, we consider statistical quantitative rules (SQ rules), which are quantitative rules in which the right-hand side (RHS) can be any statistic computed for the segment satisfying the left-hand side (LHS) of the rule. Building on prior work, we present a statistical approach for learning all significant SQ rules, i.e., SQ rules for which a desired statistic lies outside a confidence interval computed for this rule. In particular, we show how resampling techniques can be used effectively to learn significant rules. Because our method considers the significance of a large number of rules in parallel, it is susceptible to learning a certain number of "false" rules. To address this, we present a technique that determines the number of significant SQ rules that can be expected by chance alone, and we suggest that this number can be used to determine a "false discovery rate" for the learning procedure. We apply our methods to online consumer purchase data and report the results.
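The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a rule's segment is a list of record indices, that the null distribution of the statistic can be approximated by drawing random segments of the same size, and that the number of rules expected by chance can be estimated by shuffling the values and rerunning the search. All function names and parameters (`null_interval`, `significant_rules`, `expected_by_chance`, `n_resamples`, `n_permutations`) are hypothetical.

```python
import random
import statistics


def null_interval(values, segment_size, stat, n_resamples=500, alpha=0.05, rng=None):
    """Percentile interval of `stat` over random segments of `segment_size` records."""
    rng = rng or random.Random(0)
    draws = sorted(
        stat(rng.sample(values, segment_size)) for _ in range(n_resamples)
    )
    lo = draws[int((alpha / 2) * (n_resamples - 1))]
    hi = draws[int((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


def significant_rules(values, segments, stat, rng, **kw):
    """Names of segments whose observed statistic lies outside the null interval."""
    found = []
    for name, idx in segments.items():
        observed = stat([values[i] for i in idx])
        lo, hi = null_interval(values, len(idx), stat, rng=rng, **kw)
        if observed < lo or observed > hi:
            found.append(name)
    return found


def expected_by_chance(values, segments, stat, n_permutations=20, seed=1, **kw):
    """Average number of 'significant' rules found after shuffling the values.

    Shuffling breaks any real link between a segment and its statistic, so
    whatever is still flagged is attributable to chance (an assumption of
    this sketch).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_permutations):
        shuffled = list(values)
        rng.shuffle(shuffled)
        counts.append(len(significant_rules(shuffled, segments, stat, rng, **kw)))
    return statistics.mean(counts)


if __name__ == "__main__":
    # Toy data (made up): the first 20 records have an unusually high value.
    values = [10.0] * 20 + [float(i % 5) for i in range(180)]
    segments = {"high": list(range(20)), "typical": list(range(20, 40))}
    found = significant_rules(values, segments, statistics.mean, random.Random(0))
    chance = expected_by_chance(values, segments, statistics.mean)
    # Rough false discovery rate estimate: chance discoveries / actual discoveries.
    print(found, chance / max(len(found), 1))
```

Dividing the chance count by the number of rules actually discovered gives one rough reading of the "false discovery rate" mentioned in the abstract; the paper itself may define and compute it differently.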

