ABSTRACT
In this paper we study market share rules: rules that have a market share statistic associated with them. Such rules are particularly relevant for business decision making. Motivated by market share rules, we consider statistical quantitative rules (SQ rules), which are quantitative rules in which the RHS can be any statistic computed for the segment satisfying the LHS of the rule. Building on prior work, we present a statistical approach for learning all significant SQ rules, i.e., SQ rules for which a desired statistic lies outside a confidence interval computed for that rule. In particular, we show how resampling techniques can be used effectively to learn significant rules. Because our method tests the significance of a large number of rules simultaneously, it is susceptible to learning a certain number of "false" rules. To address this, we present a technique that determines the number of significant SQ rules that can be expected by chance alone, and we suggest that this number can be used to determine a "false discovery rate" for the learning procedure. We apply our methods to online consumer purchase data and report the results.
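
The idea of testing a rule's statistic against a resampled interval, and then estimating how many rules would pass by chance, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the statistic is the mean of a per-record value, that the null interval is built by drawing random segments of the same size, and that chance discoveries are estimated by shuffling the value column. All function and parameter names (`null_interval`, `significant_rules`, `expected_false_rules`, `segments`) are hypothetical.

```python
import random
from statistics import mean


def null_interval(values, segment_size, n_resamples=500, alpha=0.05, rng=None):
    """Interval for the segment mean when membership is unrelated to the value.

    Assumption: the null is simulated by drawing random segments of the same
    size without replacement and taking empirical percentiles.
    """
    rng = rng or random.Random(0)
    stats = sorted(
        mean(rng.sample(values, segment_size)) for _ in range(n_resamples)
    )
    lo = stats[int(n_resamples * alpha / 2)]
    hi = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi


def significant_rules(values, segments, n_resamples=500, alpha=0.05, rng=None):
    """Return names of rules whose observed mean falls outside the null interval.

    `segments` maps a rule name (the LHS) to the record indices satisfying it.
    """
    rng = rng or random.Random(0)
    found = []
    for name, members in segments.items():
        if not members:
            continue  # an empty segment has no statistic to test
        observed = mean(values[i] for i in members)
        lo, hi = null_interval(values, len(members), n_resamples, alpha, rng)
        if observed < lo or observed > hi:
            found.append(name)
    return found


def expected_false_rules(values, segments, n_repeats=20, n_resamples=500,
                         alpha=0.05, rng=None):
    """Average number of rules flagged after shuffling the value column.

    Shuffling breaks any real link between segments and values, so every rule
    flagged on shuffled data is a chance discovery.
    """
    rng = rng or random.Random(0)
    counts = []
    for _ in range(n_repeats):
        shuffled = values[:]
        rng.shuffle(shuffled)
        counts.append(
            len(significant_rules(shuffled, segments, n_resamples, alpha, rng))
        )
    return mean(counts)
```

Dividing the value returned by `expected_false_rules` by the number of rules that `significant_rules` finds on the real data gives a rough false discovery rate estimate in the spirit of the abstract, under the assumptions stated above.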