An iterative hypothesis-testing strategy for pattern discovery
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
SESSION: Research track
Pages: 49 - 58  
Year of Publication: 2003
ISBN:1-58113-737-0
Authors
Richard J. Bolton  Imperial College London, South Kensington Campus, London, UK
Niall M. Adams  Imperial College London, South Kensington Campus, London, UK
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956760

ABSTRACT

Pattern discovery has emerged as a direct result of increased data storage and analytic capabilities available to the data analyst. Without a massive amount of data, we do not have the evidence to support the discovery of the local deterministic structures that we call patterns. As such, pattern discovery is one of the few areas of data mining that cannot be considered simply as a 'scaling-up' of current statistical methodology to analyze large data sets. However, the philosophies of hypothesis testing and modeling in traditional statistics do lend themselves to forming a framework for pattern discovery, and we can also draw from ideas relating to outlier discovery and residual analysis to discover patterns. We illustrate an iterative strategy in a statistical framework by way of its application to one simulated and two real data sets.


