ABSTRACT
The problem of finding a specified pattern in a time series database (i.e., query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires both a meaningful definition of "surprise" and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise, and/or do not scale to massive datasets. To overcome these limitations, we introduce a novel technique that defines a pattern as surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data.
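The definition of surprise above can be illustrated with a minimal sketch. It is not the technique introduced in the paper. It assumes the time series has already been converted to a string of symbols, and it estimates the count "expected by chance" by simply scaling each pattern's relative frequency in the previously seen data to the length of the new data. The function names `count_words` and `surprise_scores`, the word length `k`, and the example strings are all hypothetical.

```python
from collections import Counter


def count_words(sequence, k):
    """Count every overlapping length-k substring of a symbol string."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def surprise_scores(reference, test, k):
    """Score each length-k word in `test` as observed minus expected count.

    Assumption for this sketch: the expected count is the word's relative
    frequency in `reference`, scaled to the number of windows in `test`.
    """
    ref_counts = count_words(reference, k)
    test_counts = count_words(test, k)
    ref_windows = max(len(reference) - k + 1, 1)
    test_windows = max(len(test) - k + 1, 0)
    scores = {}
    for word, observed in test_counts.items():
        expected = ref_counts.get(word, 0) / ref_windows * test_windows
        scores[word] = observed - expected
    return scores


reference = "abababababababab"  # previously seen data, already symbolized
test = "ababababccababab"       # new data containing an unseen pattern
scores = surprise_scores(reference, test, k=2)
```

Here the word `cc` never occurs in the reference string, so its expected count is zero and its score equals its observed count. Words such as `ab` receive negative scores because they occur less often in the new data than the reference frequency predicts, which shows that under this definition a pattern can be surprising by being under-represented as well as over-represented.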