ABSTRACT
This paper takes a new view of motif discovery, addressing a common problem in existing motif finders. A motif is treated as a feature of the input promoter regions that yields a good classifier separating these promoters from a set of background promoters. This perspective allows us to adapt existing methods of feature selection, a well-studied topic in machine learning, to motif discovery. We develop a general algorithmic framework that can be specialized to work with a wide variety of motif models, including consensus models with degenerate symbols or mismatches, and composite motifs. A key feature of our algorithm is that it measures over-representation while retaining information about how motif instances are distributed across individual promoters. The assessment of a motif's discriminative power is normalized against chance behaviour by a probabilistic analysis. We apply our framework to two popular motif models and detect several known binding sites in sets of co-regulated genes in yeast.
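The idea of scoring a motif as a classification feature can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`hamming`, `count_occurrences`, `best_threshold_errors`), the rule "call a promoter positive if it contains at least t occurrences", and the toy sequences are all assumptions made for illustration, and the probabilistic normalization against chance mentioned in the abstract is omitted.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_occurrences(seq, motif, max_mismatches=0):
    """Count windows of seq that match a consensus motif within max_mismatches."""
    k = len(motif)
    return sum(
        1
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_mismatches
    )


def best_threshold_errors(motif, positives, background, max_mismatches=0):
    """Find the occurrence threshold t that best separates the two sets.

    The (hypothetical) classifier labels a promoter as positive when it
    contains at least t motif occurrences. Returns (t, errors), where errors
    is the number of misclassified promoters; fewer errors means the motif
    is a more discriminative feature.
    """
    pos_counts = [count_occurrences(s, motif, max_mismatches) for s in positives]
    bg_counts = [count_occurrences(s, motif, max_mismatches) for s in background]
    best = None
    for t in range(1, max(pos_counts + bg_counts, default=0) + 2):
        # Positives below the threshold and background at or above it are errors.
        errors = sum(c < t for c in pos_counts) + sum(c >= t for c in bg_counts)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Toy data, invented for this example only.
positives = ["ACGTACGT", "TTACGTTT", "ACGAACGT"]
background = ["TTTTTTTT", "GGGGACGT", "CCCCCCCC"]
threshold, errors = best_threshold_errors("ACGT", positives, background)
```

Because the classifier works on per-promoter occurrence counts rather than a single pooled count, it keeps the information about how instances are spread over individual promoters, which is the property the abstract highlights.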