|
ABSTRACT
In this paper, we presented a frequent pattern based framework for event detection in stream data, it consists of frequent pattern discovery, frequent pattern selection and modeling three phases: In the first phase, a MNOE (Mining Non-Overlapping Episode) algorithm is proposed to find the non-overlapping frequent pattern in time series. In the frequent pattern selection phase, we proposed an EGMAMC (Episode Generated Memory Aggregation Markov Chain) model to help us selecting episodes which can describe stream data significantly. Then we defined feature flows to represent the instances of discovered frequent patterns and categorized the distribution of frequent pattern instances into three categories according to the spectrum of their feature flows. At last, we proposed a clustering algorithm EDPA (Event Detection by Pattern Aggregation) to aggregate strongly correlated frequent patterns together. We argue that strongly correlated frequent patterns form events and frequent patterns in different categories can be aggregated to form different kinds of events. Experiments on real-world sensor network datasets demonstrate that the proposed MNOE algorithm is more efficient than the existing non-overlapping episode mining algorithm and EDPA performs better when the input frequent patterns are maximal, significant and non-overlapping.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Li Wan, Jianxin Liao, Xiaomin Zhu, "Mining Sequential Episodes from Segmentation of Multivariate Time Series", the proceedings of DMIN 2009 (to be appear).
|
| |
2
|
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth," Proc. 17th Int'l Conf. Data Eng., pp. 215--224, 2001.
|
| |
3
|
N. Meger and C. Rigotti. Constraint-based mining of episode rules and optimal window sizes. In J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, editors, Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'04), pages 313--324. Springer, 2004.
|
| |
4
|
G. Cormode, S. Muthukrishnan What's hot and what's not: tracking most frequent items dynamically. PODS 2003: 296--306
|
| |
5
|
C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining Frequent Patterns in Data Streams at Multiple Time Granularities, in H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, AAAI/MIT, 2003.
|
| |
6
|
Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., On-board Mining of Data Streams in Sensor Networks, Accepted as a chapter in the forthcoming book Advanced Methods of Knowledge Discovery from Complex Data, (Eds.) Sanghamitra Badhyopadhyay, Ujjwal Maulik, Lawrence Holder and Diane Cook.
|
| |
7
|
G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 2002.
|
| |
8
|
Joao Gama, Pedro Pereira Rodrigues, Raquel Sebastiao, Evaluating Algorithms that Learn from Data Streams. The proceeding of SAC'09, 2009
|
| |
9
|
Eduardo J. Spinosa, Joao Gama, Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks, the proceeding of SAC'08, 2008.
|
| |
10
|
Yixin Chen, Li Tu, Density-Based Clustering for Real-Time Stream Data, the proceedings of kdd'07, 2007.
|
| |
11
|
Joao Gama, Mohamed Medhat Gaber, Learning from data streams: processing techniques in sensor networks. Springer 2007.
|
| |
12
|
Auroop R. Ganguly, Joao Gama, Olufemi A. Omitaomu, Mohamed Medhat Gaber, Ranga Raju Vatsavai (Editor), Knowledge Discovery from Sensor Data (Industrial Innovation), CRC, 2008
|
| |
13
|
Mohamed Medhat Gaber, Arkady Zaslavsky, Shonali Krishnaswamy, Mining Data Streams: A Review, SIGMOD Record, Vol. 34, No. 2, Page:18--26, 2005
|
| |
14
|
Han J, Kamber M 2001 Data mining: Concepts and techniques (San Fransisco, CA: Morgan Kauffmann)
|
| |
15
|
S. Laxman, P. S. Sastry, and K. P. Unnikrishnan. Discovering frequent episodes and learning Hidden Markov Models: A formal connection. IEEE Transactions on Knowledge and Data Engineering, 17(11):1505--1517, Nov. 2005.
|
| |
16
|
S. Laxman, Stream Prediction Using A Generative Model Based On Frequent Episodes In Event Sequences Proceeding of KDD 2008
|
| |
17
|
G. M. Weiss and H. Hirsh. Learning to predict rare events in event sequences. In proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD 98), pages 359--363, New York City, NY, USA, Aug. 27--31 1998.
|
| |
18
|
Wren, C. R.; Ivanov, Y. A.; Leigh, D.; Westhues, J., "The MERL Motion Detector Dataset", Workshop on Massive Datasets (MD), ISBN: 978-1-50593-981-8, pp. 10--14, November 2007
|
| |
19
|
db.csail.mit.edu/labdata/labdata.html.
|
| |
20
|
Jessica Lin, Eamonn Keogh, Stefano Lonardi, Bill Chiu. A Symbolic Representation of Time Series, with Implications for Streaming Algorithms, In Proceeding of DMKD, 2003
|
| |
21
|
Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, etc TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. Proceedings of CIDR 2003 Conference
|
| |
22
|
D. Abadi, D. Carney, U. Cetintemel, etc. Aurora: A New Model and Architecture for Data Stream Management. In VLDB Journal (12)2: 120--139, 2003
|
| |
23
|
D. Abadi, D. Carney, U. Cetintemel, etc. Aurora: A Data Stream Management System (Demonstration). In proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03), San Diego, CA, June 2003.
|
| |
24
|
Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, etc, The Design of the Borealis Stream Processing Engine, 2nd Biennial Conference on Innovative Data Systems Research (CIDR'05), Asilomar, CA, January 2005.
|
| |
25
|
R. Motwani, J. Widom, A. Arasu, etc. Query Processing, Resource Management, and Approximation in a Data Stream Management System. In Proc. of CIDR 2003, Jan. 2003
|
|