|
ABSTRACT
Given an N x N grid of squares, where each square has a count cij and an underlying population pij, our goal is to find the rectangular region with the highest density, and to calculate its significance by randomization. An arbitrary density function D, dependent on a region's total count C and total population P, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N4) time, and is generally computationally infeasible. We present a multiresolution algorithm which partitions the grid into overlapping regions using a novel overlap-kd tree data structure, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in O((N log N)2) time, in practice resulting in significant (20-2000x) speedups on both real and simulated datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States
|
| |
2
|
K. Deng and A. W. Moore. Multiresolution instance-based learning. In Proc. 12th Intl. Joint Conf. on Artificial Intelligence, pages 1233--1239, 1995.
|
| |
3
|
S. Goil, H. Nagesh, and A. Choudhary. MAFIA: efficient and scalable subspace clustering for very large data sets. Technical Report CPDC-TR-9906-010, Northwestern University, 1999.
|
| |
4
|
M. Kulldorff. A spatial scan statistic. Communications in Statistics: Theory and Methods, 26(6):1481--1496, 1997.
|
| |
5
|
M. Kulldorff. Spatial scan statistics: models, calculations, and applications. In J. Glaz and N. Balakrishnan, editors, Scan Statistics and Applications, pages 303--322. Birkhauser, 1999.
|
| |
6
|
M. Kulldorff and N. Nagarwalla. Spatial disease clusters: detection and inference. Statistics in Medicine, 14:799--810, 1995.
|
 |
7
|
|
| |
8
|
S. Openshaw, M. Charlton, A. Craft, and J. Birch. Investigation of leukemia clusters by use of a geographical analysis machine. Lancet, 1:272--3, 1988.
|
| |
9
|
|
| |
10
|
|
| |
11
|
L. Waller, B. Turnbull, L. Clark, and P. Nasca. Spatial analysis to detect disease clusters. In N. Lange, editor, Case Studies in Biometry, pages 3--23. Wiley, 1994.
|
| |
12
|
|
CITED BY 11
|
|
|
|
|
Daniel B. Neill , Andrew W. Moore , Maheshkumar Sabhnani , Kenny Daniel, Detection of emerging space-time clusters, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
Deepak Agarwal , Andrew McGregor , Jeff M. Phillips , Suresh Venkatasubramanian , Zhengyuan Zhu, Spatial scan statistics: approximations and performance study, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
Xiuyao Song , Mingxi Wu , Christopher Jermaine , Sanjay Ranka, Statistical change detection for multi-dimensional data, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
Artur Dubrawski , Kimberly Elenberg , Andrew Moore , Maheshkumar Sabhnani, Monitoring food safety by detecting patterns in consumer complaints, Proceedings of the 18th conference on Innovative applications of artificial intelligence, p.1782-1788, July 16-20, 2006, Boston, Massachusetts
|
|
|
Mingxi Wu , Xiuyao Song , Chris Jermaine , Sanjay Ranka , John Gums, A LRT framework for fast spatial anomaly detection, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
Mingxi Wu , Xiuyao Song , Chris Jermaine , Sanjay Ranka , John Gums, A LRT framework for fast spatial anomaly detection, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|