| PGG: an online pattern based approach for stream variation management |
| Source |
Journal of Computer Science and Technology
Volume 23, Issue 4 (July 2008)
Pages 497-515
Year of Publication: 2008
ISSN: 1000-9000
| Authors |
Lu-An Tang
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL

Bin Cui
School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies, Peking University, Beijing, China

Hong-Yan Li
School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of Machine Perception, Peking University, Beijing, China

Gao-Shan Miao
School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of Machine Perception, Peking University, Beijing, China

Dong-Qing Yang
School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies, Peking University, Beijing, China

Xin-Biao Zhou
School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of Machine Perception, Peking University, Beijing, China
| Publisher |
Institute of Computing Technology
Beijing, China
ABSTRACT
Many database applications require efficient processing of data streams with value variations and fluctuating sampling frequencies. The variations typically reflect fundamental features of the stream and carry important domain knowledge about the underlying objects. In some data streams, successive events appear to recur at a fixed time interval, but the data in fact evolves with tiny differences as time elapses. This feature, called pseudo periodicity, poses a new challenge to stream variation management. This study focuses on the online management of variations over such streams. The idea can be applied to many scenarios, such as monitoring patients' vital signs in medical applications. This paper proposes a new method named Pattern Growth Graph (PGG) to detect and manage variations over evolving streams. PGG has the following features: 1) it adopts the wave-pattern to capture the major information of data evolution and represent it compactly; 2) it detects variations in a single pass over the stream with the help of a wave-pattern matching algorithm; 3) it stores only the segments of the pattern that differ for the incoming stream, and hence substantially compresses the data without losing important information; 4) it distinguishes meaningful data changes from noise and reconstructs the stream with acceptable accuracy. Extensive experiments on real datasets containing millions of data items, as well as a prototype system, demonstrate the feasibility and effectiveness of the proposed scheme.
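The abstract gives no algorithmic detail, so the following Python sketch only illustrates the general idea behind features 3 and 4: keep one reference pattern and store only the parts of each later wave that differ from it. It is not the PGG algorithm. The names `diff_segments`, `compress`, `reconstruct` and the `tolerance` parameter are hypothetical, and the sketch assumes the stream has already been cut into waves of equal length. It also keeps the reference pattern fixed, whereas the method described above grows its pattern over time.

```python
from typing import Dict, List, Tuple

Wave = List[float]  # one pseudo-period, assumed pre-segmented to a fixed length


def diff_segments(reference: Wave, wave: Wave, tolerance: float) -> Dict[int, float]:
    """Keep only positions where `wave` departs from `reference` by more than `tolerance`.

    Smaller deviations are treated as noise and dropped (an assumption of this sketch).
    """
    if len(reference) != len(wave):
        raise ValueError("waves must have the same number of segments")
    return {
        i: value
        for i, (ref_value, value) in enumerate(zip(reference, wave))
        if abs(value - ref_value) > tolerance
    }


def compress(waves: List[Wave], tolerance: float) -> Tuple[Wave, List[Dict[int, float]]]:
    """Single pass: the first wave becomes the pattern, later waves are stored as differences."""
    reference = list(waves[0])
    deltas = [diff_segments(reference, wave, tolerance) for wave in waves[1:]]
    return reference, deltas


def reconstruct(reference: Wave, deltas: List[Dict[int, float]]) -> List[Wave]:
    """Rebuild every wave by overlaying its stored differences on the pattern."""
    rebuilt = [list(reference)]
    for delta in deltas:
        wave = list(reference)
        for position, value in delta.items():
            wave[position] = value
        rebuilt.append(wave)
    return rebuilt


if __name__ == "__main__":
    stream = [
        [0.0, 1.0, 0.0, -1.0],   # becomes the reference pattern
        [0.0, 1.05, 0.0, -1.0],  # differs only by noise-level amounts
        [0.0, 1.0, 0.5, -1.0],   # one segment changes meaningfully
    ]
    pattern, stored = compress(stream, tolerance=0.1)
    print(stored)
    print(reconstruct(pattern, stored))
```

Under these assumptions, every reconstructed value lies within `tolerance` of the original, which is one simple way to read "acceptable accuracy"; the paper's own accuracy criterion may differ.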