|
ABSTRACT
The problem of finding a center string that is "close" to every
given string arises in computational molecular biology and coding
theory. This problem has two versions: the Closest String problem
and the Closest Substring problem. Given a set of strings S
= {s1, s2, ...,
sn}, each of length m, the Closest String
problem is to find the smallest d and a string s of length
m which is within Hamming distance d to each
si ε S. This problem comes from
coding theory when we are looking for a code not too far away from
a given set of codes. Closest Substring problem, with an additional
input integer L, asks for the smallest d and a string
s, of length L, which is within Hamming distance d
away from a substring, of length L, of each si. This problem
is much more elusive than the Closest String problem. The Closest
Substring problem is formulated from applications in finding
conserved regions, identifying genetic drug targets and generating
genetic probes in molecular biology. Whether there are efficient
approximation algorithms for both problems are major open questions
in this area. We present two polynomial-time approximation
algorithms with approximation ratio 1 + ε for any small
ε to settle both questions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Sanjeev Arora , David Karger , Marek Karpinski, Polynomial time approximation schemes for dense instances of NP-hard problems, Proceedings of the twenty-seventh annual ACM symposium on Theory of computing, p.284-293, May 29-June 01, 1995, Las Vegas, Nevada, United States
[doi> 10.1145/225058.225140]
|
| |
2
|
|
| |
3
|
|
| |
4
|
Dopazo, J., Rodr&iaccute;guez, A., S&aaccute;iz, J. C., and Sobrino, F. 1993. Design of primers for PCR amplification of highly variable genomes. CABIOS 9, 123--125.
|
| |
5
|
Frances, M., and Litman, A. 1997. On covering problems of codes. Theoret. Comput. Syst. 30, 113--119.
|
| |
6
|
Leszek Gąsieniec , Jesper Jansson , Andrzej Lingas, Efficient approximation algorithms for the Hamming center problem, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.905-906, January 17-19, 1999, Baltimore, Maryland, United States
|
| |
7
|
Gillman, D. 1993. A Chernoff bound for random walks on expanders. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 680--691.
|
| |
8
|
Gusfield, D. 1993. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bull. Math. Biol. 30, 141--154.
|
| |
9
|
|
| |
10
|
Hertz, G., and Stormo, G. 1995. Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In Proceedings of the 3rd International Conference on Bioinformatics and Genome Research. pp. 201--216.
|
| |
11
|
Lawrence, C., and Reilly, A. 1990. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7, pp. 41--51.
|
| |
12
|
Lucas, K., Busch, M., Mössinger, S., and Thompson, J. A. 1991. An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. CABIOS, 7, 525--529.
|
| |
13
|
J. Kevin Lanctot , Ming Li , Bin Ma , Shaojiu Wang , Louxin Zhang, Distinguishing string selection problems, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.633-642, January 17-19, 1999, Baltimore, Maryland, United States
|
 |
14
|
Ming Li , Bin Ma , Lusheng Wang, Finding similar regions in many strings, Proceedings of the thirty-first annual ACM symposium on Theory of computing, p.473-482, May 01-04, 1999, Atlanta, Georgia, United States
[doi> 10.1145/301250.301376]
|
 |
15
|
Ming Li , Bin Ma , Lusheng Wang, Finding similar regions in many strings, Proceedings of the thirty-first annual ACM symposium on Theory of computing, p.473-482, May 01-04, 1999, Atlanta, Georgia, United States
[doi> 10.1145/301250.301376]
|
| |
16
|
Liang, C., Li, M., and Ma, B. 2001. COPIA---A consensus pattern identification and analysis software system. Manuscript. Software available at: http://dna.cs.ucsb.edu/copia/copia_submit.html.
|
| |
17
|
|
| |
18
|
|
| |
19
|
Pevzner, P. A. 2000. Computational molecular biology---An algorithmic approach. The MIT Press, Cambridge, Mass.
|
| |
20
|
Posfai, J., Bhagwat, A., Posfai, G., and Roberts, R. 1989. Predictive motifs derived from cytosine methyltransferases. Nucl. Acids Res. 17, 2421--2435.
|
| |
21
|
Proutski, V., and Holme, E. C. 1996. Primer master: A new program for the design and analysis of PCR primers. CABIOS 12, 253--255.
|
| |
22
|
|
| |
23
|
Schuler, G. D., Altschul, S. F., and Lipman, D. J. 1991. A workbench for multiple alignment construction and analysis. Proteins: Struct. Funct. Genet. 9, 180--190.
|
| |
24
|
Stormo, G. 1990. Consensus patterns in DNA. In Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, R. F. Doolittle, Ed. Methods in Enzymology, vol. 183, pp. 211--221.
|
| |
25
|
Stormo, G., and Hartzell III, G. W. 1991. Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl. Acad. Sci. USA 88, 5699--5703.
|
| |
26
|
Waterman, M. 1986. Multiple sequence alignment by consensus. Nucl. Acids Res. 14, 9095--9102.
|
| |
27
|
Waterman, M., Arratia, R., and Galas, D. 1984. Pattern recognition in several sequences: Consensus and alignment. Bull. Math. Biol. 46, 515--527.
|
| |
28
|
Waterman, M., and Perlwitz, M. 1984. Line geometries for sequence comparisons. Bull. Math. Biol. 46, 567--577.
|
|