A new protein motif extraction framework based on constrained co-clustering
Source
Symposium on Applied Computing
Proceedings of the 2009 ACM symposium on Applied Computing
Honolulu, Hawaii
SESSION: Bioinformatics track
Pages 776-781
Year of Publication: 2009
ISBN: 978-1-60558-166-8
Authors
Francesca Cordero  University of Torino, Torino, Italy
Alessia Visconti  University of Torino, Torino, Italy
Marco Botta  University of Torino, Torino, Italy
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI
http://doi.acm.org/10.1145/1529282.1529445

ABSTRACT

Signal finding (pattern discovery) in biological sequences is a fundamental problem in both computer science and molecular biology. Many approaches have been proposed for extracting interesting patterns (or motifs) from DNA/RNA and protein sequences. Some approaches are based on simple and multiple alignment techniques, some use biological knowledge and others do not.

In this paper, we propose a de novo framework that performs motif identification and exploits a constrained co-clustering technique to simultaneously find associations between groups of protein sequences and groups of motifs.

We show that the presented approach is able to group together protein sequences belonging to the same family and, at the same time, to provide a set of characterizing motifs.
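
To make the idea of associating groups of sequences with groups of motifs concrete, the following is a minimal sketch and not the framework described in the paper. It assumes that motifs are simply 3-mers shared by at least two sequences, that the data is a binary sequence-by-motif occurrence matrix, and that a plain alternating reassignment heuristic stands in for the constrained co-clustering technique (no constraints are modeled). The toy sequences and the names `kmers`, `occurrence_matrix`, `co_cluster`, and `init_rows` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def occurrence_matrix(seqs, k=3, min_support=2):
    """Rows are sequences; columns are k-mers found in >= min_support sequences."""
    support = Counter(m for s in seqs for m in kmers(s, k))
    motifs = sorted(m for m, c in support.items() if c >= min_support)
    matrix = [[1 if m in kmers(s, k) else 0 for m in motifs] for s in seqs]
    return motifs, matrix


def co_cluster(matrix, init_rows, n_clusters, max_iter=20):
    """Alternate motif and sequence reassignment until the labels stop changing.

    Illustrative heuristic only: cluster c pairs sequence group c with motif group c.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows, cols = list(init_rows), [0] * n_cols
    for _ in range(max_iter):
        # Assign each motif to the sequence group where it occurs most often on average.
        new_cols = []
        for j in range(n_cols):
            scores = []
            for c in range(n_clusters):
                members = [i for i in range(n_rows) if rows[i] == c]
                scores.append(sum(matrix[i][j] for i in members) / max(len(members), 1))
            new_cols.append(scores.index(max(scores)))
        # Assign each sequence to the motif group it covers best on average.
        new_rows = []
        for i in range(n_rows):
            scores = []
            for c in range(n_clusters):
                members = [j for j in range(n_cols) if new_cols[j] == c]
                scores.append(sum(matrix[i][j] for j in members) / max(len(members), 1))
            new_rows.append(scores.index(max(scores)))
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols


# Toy data (hypothetical): two "families", each sharing one 3-mer.
seqs = ["AAHKLGG", "TTHKLPP", "QQWCYEE", "RRWCYDD"]
motifs, matrix = occurrence_matrix(seqs)
# Start from a deliberately imperfect grouping of the sequences.
rows, cols = co_cluster(matrix, init_rows=[0, 0, 1, 0], n_clusters=2)
print(motifs, rows, cols)
```

On this toy input the heuristic places the first two sequences with the motif `HKL` and the last two with `WCY`. The result depends on the initial row labels, and a poor start can collapse everything into one group, which is one reason real co-clustering methods use a principled objective and, as in the paper, constraints.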

