ABSTRACT
Sequence-data mining plays a key role in many scientific studies and real-world applications such as bioinformatics, data streams, and sensor networks, where sequence data are processed and their semantics interpreted. In this paper we address two relevant issues: sequence-data representation, and representation-to-semantics mapping. For representation, since the best choice depends on the application and even on the type of query, we propose representing sequence data in multiple views. For each representation, we propose methods to construct a <i>valid kernel</i> as the distance function to measure <i>similarity</i> between sequences. For mapping, we then find the best combination of the individual distance functions, each of which measures similarity in a different view, to depict the target semantics. We propose a <i>super-kernel function-fusion</i> scheme to achieve the optimal mapping. Through theoretical analysis and empirical studies on UCI and real-world datasets, we show our approach of multi-view representation and fusion to be mathematically valid and very effective for practical purposes.