|
ABSTRACT
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequences into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
 |
4
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
 |
5
|
|
| |
6
|
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. A basic local alignment search tool. Journal of Molecular Biology, 1990.
|
| |
7
|
Manish Arya. William Cody. Christos Faloutsos. Joel Richardson, and Arthur Toga. Qbism: a prototype 3-d medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, March 1993.
|
| |
8
|
|
 |
9
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
10
|
Christopher Chatfield. The Analysis of Time Series: an Introduction. Chapman and Hall, London & New York, 1984. Third Edition.
|
| |
11
|
Mathematical Committeeon Physical and NSF Engineering Sciences. Grand Challenges: lIigh Performance Computing and Communications. National Science Foundation, 1992. The FY 1992 U.S. Research and Development Program.
|
| |
12
|
Robert D. Edwards and John Magee. Technical Analysis of Stock Trends. John Magee, Springfield, Massachusetts, 1966. 5th Edition, second printing.
|
 |
13
|
|
| |
14
|
C. Faloutsos , R. Barber , M. Flickner , J. Hafner , W. Niblack , D. Petkovic , W. Equitz, Efficient and effective querying by image content, Journal of Intelligent Information Systems, v.3 n.3-4, p.231-262, July 1994
[doi> 10.1007/BF00962238]
|
 |
15
|
|
| |
16
|
Richard Wesley Hamming. Digital Filters. Prentice-Hall Signal Processing Series, Englewood Cliffs, N.J., 1977.
|
| |
17
|
|
 |
18
|
|
 |
19
|
|
| |
20
|
B. Mandelbrot. Fraetal Geometry of Nature. W.H. Freeman, New York, 1977.
|
| |
21
|
Wayne Niblaek, Ron Barber, Will Equitz, Myron Flickner, Eduardo Glasman, Dragutin Petkovie, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. The qbie project: Querying images by content using color, texture and shape. SPIE 1993 Intl. Symposium on Electronic Imaging: Science and Technology, Conf. 1908, Storage and Retrieval for Image and Video Databases, February 1993. Also available as IBM Research Report ILl 9203 (81511), Feb. 1, 1993, Computer Science.
|
 |
22
|
|
| |
23
|
Alan Victor Oppenheim and Ronald W. Schafer. Digital Signal Processing. Prentice-Hall, Englewood Cliffs, N.J., 1975.
|
 |
24
|
|
| |
25
|
Mary Beth Ruskai, Gregory Beylkin, Ronald Coifman, Ingrid Daubeehies, Stephane Mallat, Yves Meyer, and Louise Raphael. Wavelets and Their Applications. Jones and Bartlett Publishers, Boston, MA, 1992.
|
| |
26
|
|
| |
27
|
Manfred Sehroeder. Fractals, Chaos, Power Laws: Minutes From an lnfinite Paradise. W.H. Freeman and Company, New York, 1991.
|
| |
28
|
|
| |
29
|
R. Stam and Richard Snodgrass. A bibliography on temporal databases. IEEE Bulletin on Data Engineering, 11(4), December 1988.
|
| |
30
|
Dimitris Vassiliadis. The input-state space approach to the prediction of auroral geomagnetic activity from solar wind variables. Int. Workshop on Applications of Artificial Intelligence in Solar Terrestrial Physics, September 1993.
|
 |
31
|
|
CITED BY 246
|
|
|
|
|
Woong-Kee Loh , Sang-Wook Kim , Kyu-Young Whang, Index interpolation: an approach to subsequence matching supporting normalization transform in time-series databases, Proceedings of the ninth international conference on Information and knowledge management, p.480-487, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yi-Leh Wu , Divyakant Agrawal , Amr El Abbadi, A comparison of DFT and DWT based similarity search in time-series databases, Proceedings of the ninth international conference on Information and knowledge management, p.488-495, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chung-Sheng Li , Philip S. Yu , Vittorio Castelli, MALM: a framework for mining sequence database at multiple abstraction levels, Proceedings of the seventh international conference on Information and knowledge management, p.267-272, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
Daniel Wu , Ambuj Singh , Divyakant Agrawal , Amr El Abbadi , Terence R. Smith, Efficient retrieval for browsing large image databases, Proceedings of the fifth international conference on Information and knowledge management, p.11-18, November 12-16, 1996, Rockville, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tolga Bozkaya , Nasser Yazdani , Meral Özsoyoğlu, Matching and indexing sequences of different lengths, Proceedings of the sixth international conference on Information and knowledge management, p.128-135, November 10-14, 1997, Las Vegas, Nevada, United States
|
|
|
|
|
|
|
|
|
Kien A. Hua , Khanh Vu , Jung-Hwan Oh, SamMatch: a flexible and efficient sampling-based image retrieval technique for large image databases, Proceedings of the seventh ACM international conference on Multimedia (Part 1), p.225-234, October 30-November 05, 1999, Orlando, Florida, United States
|
|
|
|
|
|
|
|
|
Vikrant Kobla , David Doermann , Christos Faloutsos, VideoTrails: representing and visualizing structure in video sequences, Proceedings of the fifth ACM international conference on Multimedia, p.335-346, November 09-13, 1997, Seattle, Washington, United States
|
|
|
Yunyao Qu , Changzhou Wang , X. Sean Wang, Supporting fast search in time series for movement patterns in multiple scales, Proceedings of the seventh international conference on Information and knowledge management, p.251-258, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
Martin Gavrilov , Dragomir Anguelov , Piotr Indyk , Rajeev Motwani, Mining the stock market (extended abstract): which measure is best?, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.487-496, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chuanjun Li , Peng Zhai , Si-Qing Zheng , Balakrishnan Prabhakaran, Segmentation and recognition of multi-attribute motion sequences, Proceedings of the 12th annual ACM international conference on Multimedia, October 10-16, 2004, New York, NY, USA
|
|
|
|
|
|
|
|
|
Sanghyun Park , Sang-Wook Kim , June-Suh Cho , Sriram Padmanabhan, Prefix-querying: an approach for effective subsequence matching under time warping in sequence databases, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
H. V. Jagadish , Alberto O. Mendelzon , Tova Milo, Similarity-based queries, Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.36-45, May 22-25, 1995, San Jose, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
Béla Bollobás , Gautam Das , Dimitrios Gunopulos , Heikki Mannila, Time-series similarity problems and well-separated geometric sets, Proceedings of the thirteenth annual symposium on Computational geometry, p.454-456, June 04-06, 1997, Nice, France
|
|
|
Marlon Dumas , Marie-Christine Fauvet , Pierre-Claude Scholl, Handling temporal grouping and pattern-matching queries in a temporal object model, Proceedings of the seventh international conference on Information and knowledge management, p.424-431, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hans-Peter Kriegel , Stefan Brecheisen , Peer Kröger , Martin Pfeifle , Matthias Schubert, Using sets of feature vectors for similarity search on voxelized CAD objects, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 09-12, 2003, San Diego, California
|
|
|
|
|
|
Michail Vlachos , Carlotta Domeniconi , Dimitrios Gunopulos , George Kollios , Nick Koudas, Non-linear dimensionality reduction techniques for classification and visualization, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jessica Lin , Eamonn Keogh , Stefano Lonardi , Bill Chiu, A symbolic representation of time series, with implications for streaming algorithms, Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, June 13-13, 2003, San Diego, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hakan Ferhatosmanoglu , Ertem Tuncel , Divyakant Agrawal , Amr El Abbadi, Vector approximation based indexing for non-uniform high dimensional data sets, Proceedings of the ninth international conference on Information and knowledge management, p.202-209, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
Reza Sadri , Carlo Zaniolo , Amir Zarkesh , Jafar Adibi, Optimization of sequence queries in database systems, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.71-81, May 2001, Santa Barbara, California, United States
|
|
|
|
|
|
K. L. Liu , G. J. Lipovski , C. Yu , Naphtali Rishe, Efficient processing of one and two dimensional proximity queries in associative memory, Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, p.138-146, August 18-22, 1996, Zurich, Switzerland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jessica Lin , Eamonn Keogh , Stefano Lonardi , Jeffrey P. Lankford , Donna M. Nystrom, Visually mining and monitoring massive time series, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
Chuanjun Li , Gaurav Pradhan , S. Q. Zheng , B. Prabhakaran, Indexing of variable length multi-attribute motion data, Proceedings of the 2nd ACM international workshop on Multimedia databases, November 13-13, 2004, Washington, DC, USA
|
|
|
|
|
|
Alberto Lerner , Dennis Shasha , Zhihua Wang , Xiaojian Zhao , Yunyue Zhu, Fast algorithms for time series with applications to finance, physics, music, biology, and other suspects, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michael Ortega , Yong Rui , Kaushik Chakrabarti , Kriengkrai Porkaew , Sharad Mehrotra , Thomas S. Huang, Supporting Ranked Boolean Similarity Queries in MARS, IEEE Transactions on Knowledge and Data Engineering, v.10 n.6, p.905-925, November 1998
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Huanmei Wu , Betty Salzberg , Gregory C Sharp , Steve B Jiang , Hiroki Shirato , David Kaeli, Subsequence matching on structured time series data, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
T. Scott Saponas , Jonathan Lester , Carl Hartung , Sameer Agarwal , Tadayoshi Kohno, Devices that tell on you: privacy trends in consumer ubiquitous computing, Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, p.1-16, August 06-10, 2007, Boston, MA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anthony J. T. Lee , Chao-Wen Lin , Wen-Hsing Lo , Chieh-Chun Chen , Jia-Xin Chen, A novel filtration method in biological sequence databases, Pattern Recognition Letters, v.28 n.4, p.447-458, March, 2007
|
|
|
|
|
|
Eamonn Keogh , Themistoklis Palpanas , Victor B. Zordan , Dimitrios Gunopulos , Marc Cardle, Indexing large human-motion databases, Proceedings of the Thirtieth international conference on Very large data bases, p.780-791, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
Dimitris Papadias , Jun Zhang , Nikos Mamoulis , Yufei Tao, Query processing in spatial network databases, Proceedings of the 29th international conference on Very large data bases, p.802-813, September 09-12, 2003, Berlin, Germany
|
|
|
Yixin Chen , Guozhu Dong , Jiawei Han , Benjamin W. Wah , Jianyong Wang, Multi-dimensional regression analysis of time-series data streams, Proceedings of the 28th international conference on Very Large Data Bases, p.323-334, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Khanh Vu , Kien A. Hua , Hao Cheng , Sheau-Dong Lang, A non-linear dimensionality-reduction technique for fast similarity search in large databases, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Weidong Chen , Jyh-Herng Chow , You-Chin Fuh , Jean Grandbois , Michelle Jou , Nelson Mendonça Mattos , Brian T. Tran , Yun Wang, High Level Indexing of User-Defined Types, Proceedings of the 25th International Conference on Very Large Data Bases, p.554-564, September 07-10, 1999
|
|
|
|
|
|
Rakesh Agrawal , King-Ip Lin , Harpreet S. Sawhney , Kyuseok Shim, Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.490-501, September 11-15, 1995
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A. S. Z. Belloum , E. C. Kaletas , A. W. Van Halderen , H. Afsarmanesh , L. O. Hertzberger , A. J. H. Peddemors, A Scalable Web Server Architecture, World Wide Web, v.5 n.1, p.5-23, 2002
|
|
|
Vassilis Athitsos , Panagiotis Papapetrou , Michalis Potamias , George Kollios , Dimitrios Gunopulos, Approximate embedding-based subsequence matching of time series, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qiuxia Chen , Lei Chen , Xiang Lian , Yunhao Liu , Jeffrey Xu Yu, Indexable PLA for efficient similarity search, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
|
|
|
Yong Joon Lee , Jun Wook Lee , Duck Jin Chai , Bu Hyun Hwang , Keun Ho Ryu, Mining temporal interval relational rules from temporal data, Journal of Systems and Software, v.82 n.1, p.155-167, January, 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abhishek Mukherji , Elke A. Rundensteiner , David C. Brown , Venkatesh Raghavan, SNIF TOOL: sniffing for patterns in continuous streams, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
E. Tiakas , A. N. Papadopoulos , A. Nanopoulos , Y. Manolopoulos , Dragan Stojanovic , Slobodanka Djordjevic-Kajan, Searching for similar trajectories in spatial networks, Journal of Systems and Software, v.82 n.5, p.772-788, May, 2009
|
|
|
|
|
|
Tiancheng Zhang , Dejun Yue , Yu Gu , Yi Wang , Ge Yu, Adaptive correlation analysis in stream time series with sliding windows, Computers & Mathematics with Applications, v.57 n.6, p.937-948, March, 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
You-Min Ha , Sanghyun Park , Sang-Wook Kim , Jung-Im Won , Jee-Hee Yoon, A stock recommendation system exploiting rule discovery in stock databases, Information and Software Technology, v.51 n.7, p.1140-1149, July, 2009
|
|
|
B. Aditya Prakash , Nicholas Valler , David Andersen , Michalis Faloutsos , Christos Faloutsos, BGP-lens: patterns and anomalies in internet routing updates, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
Sorabh Gandhi , Suman Nath , Subhash Suri , Jie Liu, GAMPS: compressing multi sensor data by grouping and amplitude scaling, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|