| A Million-Fold Speed Improvement in Genomic Repeats Detection |
| Full text |
Pdf
(241 KB)
|
| Source
|
Conference on High Performance Networking and Computing
archive
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
table of contents
Page: 20
Year of Publication: 2003
ISBN:1-58113-695-1
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
IEEE Computer Society
Washington, DC, USA
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 22, Citation Count: 0
|
|
|
ABSTRACT
This paper presents a novel, parallel algorithm for generating top alignments. Top alignments are used for finding internal repeats in biological sequences like proteins and genes. Our algorithm replaces an older, sequential algorithm (Repro), which was prohibitively slow for sequence lengths higher than 2000. The new algorithm is an order of magnitude faster (O(n3) rather than O(n4)). The paper presents a three-level parallel implementation of the algorithm: using SIMD multimedia extensions found on present-day processors (a novel technique that can be used to parallelize any application that performs many sequence alignments), using shared-memory parallelism, and using distributed-memory parallelism. It allows processing the longest known proteins (nearly 35000 amino acids). We show exceptionally high speed improvements: between 548 and 889 on a cluster of 64 dual-processor machines, compared to the new sequential algorithm. Especially for long sequences, extreme speed improvements over the old algorithm are obtained.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
[2] R.A. George and J. Heringa. The REPRO Server: Finding Protein Internal Sequence Repeats Through the Web. Trends in Biochemical Sciences, 25(10):515-517, October 2000.
|
| |
3
|
[3] A. Heger and L. Holm. Rapid Automatic Detection and Alignment of Repeats in Protein Sequences. Proteins: Structure, Function, and Genetics , 41:224-237, 2000.
|
| |
4
|
[4] J. Heringa. Detection of Internal Repeats: How Common are They? Current Opinion in Structural Biology, 8:338-345, 1998.
|
| |
5
|
[5] J. Heringa and P. Argos. A Method to Recognize Distant Repeats in Protein Sequences. Proteins: Structure, Function, and Genetics, 17:391- 411, 1993.
|
| |
6
|
[6] X. Huang, R.C. Hardison, and W. Miller. A Space-Efficient Algorithm for Local Similarities. Computer Applications in the Biosciences, 6:373-381, 1990.
|
| |
7
|
|
| |
8
|
[8] E.M. Marcotte, M. Pellegrini, T.O. Yeates, and D. Eisenberg. A Census of Protein Repeats. Journal of Molecular Biology, 293:151-160, 1998.
|
| |
9
|
[9] S.B. Needleman and C.D. Wunsch. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. Journal of Molecular Biology, 48:443-453, 1970.
|
| |
10
|
[10] M. Pellegrini, E.M. Marcotte, and T.O. Yeates. A Fast Algorithm for Genome-Wide Analysis of Proteins with Repeated Sequences. Journal of Molecular Biology, 35:440-446, 1999.
|
| |
11
|
[11] T. Rognes. ParAlign: a Parallel Sequence Alignment Algorithm for Rapid and Sensitive Database Searches. Nucleic Acids Research, 29(7):1647- 1652, 2001.
|
| |
12
|
[12] T. Rognes and E. Seeberg. Six-Fold Speedup of Smith-Waterman Sequence Database Searches Uisng Parallel Processing on Common Microprocessors. Bioinformatics, 16(8):699-706, 2000.
|
| |
13
|
[13] J.W. Romein and H.E. Bal. Solving Awari with Parallel Retrograde Analysis. IEEE Computer, 2003. Accepted for publication.
|
| |
14
|
|
| |
15
|
[15] T.F. Smith and M.S. Waterman. Identification of Common Molecular Subsequences. Journal of Molecular Biology, 147:195-197, 1981.
|
| |
16
|
[16] M.S. Waterman and M. Eggert. A New Algorithm for Best Subsequence Alignments with Application to tRNA-rRNA Comparisons. Journal of Molecular Biology, 197:723-725, 1987.
|
|