ABSTRACT
Time series of count data are generated in many different contexts, such as web access logging, freeway traffic monitoring, and security logs associated with buildings. Since this data measures the aggregated behavior of individual human beings, it typically exhibits periodicity on a number of time scales (daily, weekly, etc.) that reflects the rhythms of the underlying human activity and makes the data appear non-homogeneous. At the same time, the data is often corrupted by bursty periods of unusual behavior such as building events, traffic accidents, and so forth. Both of these elements make the data mining problem of finding and extracting the anomalous events difficult. In this paper we describe a framework for unsupervised learning in this context, based on a time-varying Poisson process model that can also account for anomalous events. We show how the parameters of this model can be learned from count time series using statistical estimation techniques. We demonstrate the utility of this model on two datasets for which we have partial ground truth in the form of known events, one from freeway traffic data and another from building access data, and show that the model performs significantly better than a non-probabilistic, threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about the detected events (e.g., popularity or level of attendance). Our experimental results indicate that the proposed time-varying Poisson model provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
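To make the idea of a time-varying Poisson rate with day-of-week and time-of-day effects concrete, the following is a minimal illustrative sketch. It is not the estimation procedure of the paper: it simply averages counts within each (day-of-week, time slot) cell and flags counts whose Poisson upper-tail probability falls below a threshold. The function names, the synthetic data, and the threshold `alpha` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cell(t, slots_per_day):
    """Map a time index to its (day-of-week, time-of-day slot) cell."""
    return (t // slots_per_day) % 7, t % slots_per_day


def fit_rates(counts, slots_per_day):
    """Estimate one Poisson rate per cell as the plain average of its counts."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for t, c in enumerate(counts):
        key = cell(t, slots_per_day)
        totals[key] += c
        sizes[key] += 1
    return {key: totals[key] / sizes[key] for key in totals}


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam); adequate for modest rates."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_events(counts, slots_per_day, alpha=1e-4):
    """Return the time indices whose counts are improbably high for their cell."""
    rates = fit_rates(counts, slots_per_day)
    return [
        t for t, c in enumerate(counts)
        if poisson_upper_tail(c, rates[cell(t, slots_per_day)]) < alpha
    ]


def make_demo_counts(weeks=4):
    """Synthetic, noise-free counts (hypothetical data) with one injected burst."""
    weekday = [2, 10, 12, 3]   # four time slots per day
    weekend = [1, 4, 5, 1]
    counts = []
    for day in range(7 * weeks):
        counts.extend(weekend if day % 7 >= 5 else weekday)
    counts[65] += 40           # burst on the third Wednesday, slot 1
    return counts


print(flag_events(make_demo_counts(), 4))  # [65]
```

Note that in this sketch the burst itself inflates the estimated rate of its cell (from 10 to 20 in the demo), which is one reason a plain averaging baseline is fragile; a model that accounts for anomalous events while estimating the normal rate, as the abstract describes, is meant to avoid this kind of contamination.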