ACM Home Page
Please provide us with feedback. Feedback
Finding simple intensity descriptions from event sequence data
Full text PdfPdf (530 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Francisco, California
Pages: 341 - 346  
Year of Publication: 2001
ISBN:1-58113-391-X
Authors
Heikki Mannila  Nokia Research Center, FIN-00045 Nokia Group, Finland
Marko Salmenkivi  University of Helsinki, Finland
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI : American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 47,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502512.502562
What is a DOI?

ABSTRACT

Sequences of events are an important type of data arising in various applications, including telecommunications, bio-statistics, web access analysis, etc. A basic approach to modeling such sequences is to find the underlying intensity functions describing the expected number of events per time unit. Typically, the intensity functions are assumed to be piecewise constant. We therefore consider different ways of fitting intensity models to event sequence data. We start by considering a Bayesian approach using Markov chain Monte Carlo (MCMC) methods with varying number of pieces. These methods can be used to produce posterior distributions on the intensity functions and they can also accomodate covariates. The drawback is that they are computationally intensive and thus are not very suitable for data mining applications in which large numbers of intensity functions have to be estimated. We consider dynamic programming approaches to finding the change points in the intensity functions. These methods can find the maximum likelihood intensity function in O(n2k) time for a sequence of n events and k different pieces of intensity. We show that simple heuristics can be used to prune the number of potential change points, yielding speedups of several orders of magnitude. The results of the improved dynamic programming method correspond very closely with the posterior averages produced by the MCMC methods.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
E. Arjas. Survival models and martingale dynamics. Scandinavian Journal of Statistics, 16:177-225, 1989.
 
2
E. Arjas and J. Heikkinen. An Algorithm for nonparametric Bayesian estimation of a Poisson intensity. Computational Statistics, 12:385-402, 1997.
 
3
D. Hawkins. Point estimation of parameters of piecewise regression models. Journal of The Royal Statistical Society Series C, 25(1):51-57, 1976.
 
4
M. Eerola, H. Mannila, and M. Salmenkivi. Frailty factors and time-dependent hazards in modeling ear infections in children using Bassist. In Prec. of XIII Symposium on Computational Statistics, pages 287-292, Bristol, June 1998.
 
5
P. Green. Reversible jump Marker chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711-732, 1995.
 
6
P. Guttorp. Stochastic modeling of scientific data. Chapman and Hall, London, 1995.
 
7
M. Klemettinen, H. Mannila, and H. Toivonen. Interactice exploration of interesting findings in TASA. Information and Software Technology, Special Issue on Knowledge Discovery and Data Mining, 1999.
 
8
L. Tierney. Markov chains for exploring posterior distributions. Annals of Statistics, 22(4):1701-1728, 1994.
 
9
S. Chib and E. Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49:327-335, 1995.
10
 
11
D. W.Gilks, S.Richardson. Marker chain Monte Carlo in practice. Chapman and Hall, London, 1996.


Collaborative Colleagues:
Heikki Mannila: colleagues
Marko Salmenkivi: colleagues