| General match: a subsequence matching method in time-series databases based on generalized windows |
| Full text |
Pdf
(1.87 MB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
table of contents
Madison, Wisconsin
SESSION: Research sessions: similarity and matching
table of contents
Pages: 382 - 393
Year of Publication: 2002
ISBN:1-58113-497-5
|
|
Authors
|
|
Yang-Sae Moon
|
Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea
|
|
Kyu-Young Whang
|
Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea
|
|
Wook-Shin Han
|
Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 74, Citation Count: 19
|
|
|
ABSTRACT
We generalize the method of constructing windows in subsequence matching. By this generalization, we can explain earlier subsequence matching methods as special cases of a common framework. Based on the generalization, we propose a new subsequence matching method, General Match. The earlier work by Faloutsos et al. (called FRM for convenience) causes a lot of false alarms due to lack of point-filtering effect. Dual Match, recently proposed as a dual approach of FRM, improves performance significantly over FRM by exploiting point filtering effect. However, it has the problem of having a smaller allowable window size---half that of FRM---given the minimum query length. A smaller window increases false alarms due to window size effect. General Match offers advantages of both methods: it can reduce window size effect by using large windows like FRM and, at the same time, can exploit point-filtering effect like Dual Match. General Match divides data sequences into generalized sliding windows (J-sliding windows) and the query sequence into generalized disjoint windows (J-disjoint windows). We formally prove that General Match is correct, i.e., it incurs no false dismissal. We then propose a method of estimating the optimal value of the sliding factor J that minimizes the number of page accesses. Experimental results for real stock data show that, for low selectivities (10-6∼10-4), General Match improves average performance by 117% over Dual Match and by 998% over FRM; for high selectivities (10-3∼10-1), by 45% over Dual Match and by 64% over FRM. The proposed generalization provides an excellent theoretical basis for understanding the underlying mechanisms of subsequence matching.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Rakesh Agrawal , King-Ip Lin , Harpreet S. Sawhney , Kyuseok Shim, Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.490-501, September 11-15, 1995
|
 |
3
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
4
|
Stefan Berchtold , Christian Böhm , Hans-Peter Kriegal, The pyramid-technique: towards breaking the curse of dimensionality, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.142-153, June 01-04, 1998, Seattle, Washington, United States
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
 |
8
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
 |
9
|
H. V. Jagadish , Alberto O. Mendelzon , Tova Milo, Similarity-based queries, Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.36-45, May 22-25, 1995, San Jose, California, United States
[doi> 10.1145/212433.212444]
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
K. Whang, G. Wiederhold, and D. Sagalowicz. Separability-an approach to physical database design. IEEE Trans. on Computers, c-33(3):209-222, 1984.
|
| |
17
|
A. C.-C. Yao. On random 2-3 trees. Acta Informatica, 9:159-170, 1978.
|
| |
18
|
|
CITED BY 19
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Huanmei Wu , Betty Salzberg , Gregory C Sharp , Steve B Jiang , Hiroki Shirato , David Kaeli, Subsequence matching on structured time series data, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anthony J. T. Lee , Chao-Wen Lin , Wen-Hsing Lo , Chieh-Chun Chen , Jia-Xin Chen, A novel filtration method in biological sequence databases, Pattern Recognition Letters, v.28 n.4, p.447-458, March, 2007
|
|
|
|
|
|
Vassilis Athitsos , Panagiotis Papapetrou , Michalis Potamias , George Kollios , Dimitrios Gunopulos, Approximate embedding-based subsequence matching of time series, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|