ABSTRACT
This paper addresses the detection of distribution changes in multi-dimensional data sets. Given a baseline data set and a set of newly observed data points, we define a statistical test, called the density test, for deciding whether the observed data points are sampled from the same underlying distribution that produced the baseline data set. We define a test statistic that is strictly distribution-free under the null hypothesis. Our experimental results show that the density test has substantially more power than the two existing methods for multi-dimensional change detection.