ACM Home Page
Please provide us with feedback. Feedback
Permu-pattern: discovery of mutable permutation patterns with proximity constraint
Full text PdfPdf (377 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Las Vegas, Nevada, USA
SESSION: Research papers table of contents
Pages 318-326  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Meng Hu  Case Western Reserve University, Cleveland, OH, USA
Jiong Yang  Case Western Reserve University, Cleveland, OH, USA
Wei Su  Case Western Reserve University, Cleveland, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 176,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1401890.1401932
What is a DOI?

ABSTRACT

Pattern discovery in sequences is an important problem in many applications, especially in computational biology and text mining. However, due to the noisy nature of data, the traditional sequential pattern model may fail to reflect the underlying characteristics of sequence data in these applications. There are two challenges: First, the mutation noise exists in the data, and therefore symbols may be misrepresented by other symbols; Secondly, the order of symbols in sequences could be permutated. To address the above problems, in this paper we propose a new sequential pattern model called mutable permutation patterns. Since the Apriori property does not hold for our permutation pattern model, a novel Permu-pattern algorithm is devised to mine frequent mutable permutation patterns from sequence databases. A reachability property is identified to prune the candidate set. Last but not least, we apply the permutation pattern model to a real genome dataset to discover gene clusters, which shows the effectiveness of the model. A large amount of synthetic data is also utilized to demonstrate the efficiency of the Permu-pattern algorithm.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Bergeron, J. Stoye. On the Similarity of Sets of Permutations and Its Applications to Genome Comparison. COCOON 2003.
 
2
R. Eres, G.M. Landau, L. Parida. Permutation Pattern Discovery in Biosequences. Journal of Computational Biology, 2004, 11(6):1050--1060. doi:10.1089/cmb.2004.11.1050.
 
3
 
4
BLAST, available at http://ncbi.nih.gov/BLAST/.
5
 
6
G. Deckers-Hebestreit, K. Altendorf. THE F0F1-TYPE ATP SYNTHASES OF BACTERIA: Structure and Function of the F0 Complex. Annual Review of Microbiology, Vol. 50: 791--824.
 
7
R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
 
8
C. Fellbaum. WordNet: An Electronic Lexical Database. The MIT Press, 1998.
 
9
 
10
D.Graur, W.H. Li. Fundamentals of Molecular Evolution. Ed. Sinauer Associates, Inc., 1991.
 
11
J. Han, J. Pei. Mining Frequent Patterns by Pattern--growth: Methodology and Implications. KDD, 2000.
 
12
 
13
M. Hu, J. Yang, and W. Su. Permu--pattern: Discovery of Mutable Permutation Patterns with Proximity Constraint. CWRU EECS Dept. Tech. Report, 2007.
 
14
M. Joshi, G. Karypis, V. Kumar. A Universal Formulation of Sequential Patterns. KDD workshop on Temporal Data Mining, 2001.
 
15
G.M. Landau, L. Parida, O. Weimann. Gene Proximity Analysis Across Whole Genomes via PQ Trees. Journal of Computational Biology, 12(10), pp 1289---1306, 2005.
 
16
R. Overbeek, M. Fonstein, M. D'Souza, G. D. Pusch, N. Maltsev. The Use of Gene Clusters to Infer Functional Coupling. Proc. Natl. Acad. Sci. U.S.A., 96(6):2896´lC2901, 1999.
 
17
18
 
19
 
20
R. Rymon. Search Through Systematic Set Enumeration. In Int'l. Conf. on Principles of Knowledge Representation and Reasoning, 1992.
 
21
 
22
T. Schmidt, J. Stoye. Quadratic Time Algorithms for Finding Common Intervals in Two and More Sequences. CPM, 2004
 
23
Roman L. Tatusov, Natalie D. Fedorova, John D. Jackson, Aviva R. Jacobs, Boris Kiryutin, Eugene V. Koonin, Dmitri M. Krylov, Raja Mazumder, Sergei L. Mekhedov, Anastasia N. Nikolskaya, B. Sridhar Rao, Sergei Smirnov, Alexander V. Sverdlov, Sona Vasudevan, Yuri I. Wolf, Jodie J. Yin, Darren A. Natale. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 2003, 4:41
24
 
25
26
 
27