Measuring text similarity with dynamic time warping
Source
ACM International Conference Proceeding Series; Vol. 299
Proceedings of the 2008 international symposium on Database engineering & applications
Coimbra, Portugal
SESSION: Data mining, OLAP, and knowledge discovery
Pages 263-267
Year of Publication: 2008
ISBN: 978-1-60558-188-0
ABSTRACT
In this work, we describe an approach that aims to make typed texts comparable using temporal data mining methods. This was proposed in earlier work [11], but to our knowledge no significant research on the subject has been done yet. The basic idea is to derive artificial time series from texts by counting the occurrences of relevant keywords in a sliding window applied to them; these time series can then be compared with techniques from time series analysis. In this particular case, the Dynamic Time Warping distance [3] was used. Through extensive testing, adequate parameters for the time series calculation were derived, and we show that this approach might aid in the recognition of similar texts, since the observed distances between similar documents are significantly lower than those between unrelated texts. Our idea might also be especially suitable for comparing texts in different languages, since only the keyword translations must be known.
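The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window size, the keyword set, the tokenization, or the local cost function, so whitespace tokens, a caller-supplied keyword set, a step size of one token, and absolute difference as local cost are all assumptions made here. The function names `keyword_series` and `dtw_distance` are hypothetical.

```python
from typing import List, Sequence, Set


def keyword_series(tokens: Sequence[str], keywords: Set[str], window: int) -> List[int]:
    """Count keyword hits in a window of `window` tokens slid one token at a time.

    Assumption: keywords are matched case-insensitively against whole tokens.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    hits = [1 if t.lower() in keywords else 0 for t in tokens]
    if len(hits) <= window:
        # Text shorter than the window: a single count for the whole text.
        return [sum(hits)]
    current = sum(hits[:window])
    series = [current]
    for i in range(window, len(hits)):
        # Slide by one token: add the entering token, drop the leaving one.
        current += hits[i] - hits[i - window]
        series.append(current)
    return series


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW distance with |x - y| as the (assumed) local cost."""
    if not a or not b:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # prev[j] holds the cumulative cost of aligning the rows seen so far with b[:j].
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


# Usage: two toy "documents" compared via their keyword time series.
keywords = {"time", "series"}
doc_a = "time series are compared with time warping".split()
doc_b = "we compare series of time with dynamic time warping".split()
distance = dtw_distance(keyword_series(doc_a, keywords, 3),
                        keyword_series(doc_b, keywords, 3))
```

For texts in different languages, the same sketch would apply with a translated keyword set per language, since only the counts enter the comparison.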
REFERENCES
[1] R. Agrawal, K.-I. Lin, H. S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In Proceedings of the 21th International Conference on Very Large Data Bases, pages 490--501, September 11-15, 1995.
[2]
[3] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD Workshop, pages 359--370, 1994.
[4] S. Chu, E. Keogh, D. Hart, and M. Pazzani. Iterative deepening dynamic time warping for time series. In Proc. 2nd SIAM International Conference on Data Mining, 2002.
[5] P. Iyer and A. Singh. Document similarity analysis for a plagiarism detection system. In IICAI, pages 2534--2544, 2005.
[6]
[7] E. Keogh and M. Pazzani. Derivative dynamic time warping. In First SIAM International Conference on Data Mining (Chicago, IL), 2001.
[8]
[9]
[10] M. Matuschek. Temporal Aspects in Data Mining (Master Thesis). Heinrich-Heine-Universität, Düsseldorf, 2008.
[11] C. Ratanamahatana and E. Keogh. Everything you know about dynamic time warping is wrong. In Third Workshop on Mining Temporal and Sequential Data, in conjunction with the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), 2004.
[12] T. Rath and R. Manmatha. Word image matching using dynamic time warping. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), 2003.
[13]
[14]
[15]
[16]
[17]
[18] X. Xi, E. Keogh, L. Wei, and A. Mafra-Neto. Finding motifs in a database of shapes. In SDM, 2007.
[19]