|
ABSTRACT
We consider the following problem: given a collection of strings s1,…, sm, find the shortest string s such that each si appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of (distinct) strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n) result.
We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the superstring problem to be MAXSNP-hard, which implies that a polynomial-time approximation scheme for this problem is unlikely.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
~ALON, N., COSARES, S., HOCHBAUM, D., AND SHAMIR, R. 1989. An algorithm for the detection ~and construction of Monge Sequences. LiB. Alg. Appl. 114/115, 669-680.
|
| |
2
|
~ARORA, A., LUND, C., MOTWANI, R., SUDAN, M., AND SZEGEDY, M. 1992. Proof verification and ~hardness of approximation problems. In Proceedings of the 33ra' IEEE Symposium on the ~Foundations of Computer Science. IEEE, New York, pp. 14 23.
|
| |
3
|
~BARNES, E., AND HOFFMAN, A., 1985. On transportation problems with upper bounds on leading ~rectangles. SIAM J. Alg. Disc. Me&. 6, 487-496.
|
| |
4
|
~FINE, N., AND WILF, H. 1965. Uniqueness theorems for periodic functions. Proc. Amer. Math. ~Soc. 16, 109-114.
|
| |
5
|
~GALLANT, J., MAIER, D., AND STORER, J. 1980. On finding minimal length superstrings. J. ~Comput. Syst. Sci. 20, 50-58.
|
| |
6
|
~GAREY, M., AND JOHNSON, D. 1979. Computers and I~tractabilitv. Freeman, New York.
|
| |
7
|
~HOFFMAN, A., 1963. On simple transportation problems. In Convexitw Proceedings of Symposta in ~Pure Mathematics, vol. 7. American Mathematical Society, Providence, R.I., pp. 317 327.
|
| |
8
|
|
| |
9
|
~M. 1990. Towards a DNA sequencing theory. In Proceedings of &e 31st IEEE Symposimn on ~Foundations of Computer Science. IEEE, New York, pp. 125-134.
|
| |
10
|
|
 |
11
|
Christos Papadimitriou , Mihalis Yannakakis, Optimization, approximation, and complexity classes, Proceedings of the twentieth annual ACM symposium on Theory of computing, p.229-234, May 02-04, 1988, Chicago, Illinois, United States
[doi> 10.1145/62212.62233]
|
| |
12
|
|
| |
13
|
~PELTOLA, H., SODERLUND, H., TARHIO, J., AND UKKONEN, E. 1983. Algorithms for some string ~matching problems arising in molecular genetics. In Information Processing 83 (Proceedtngs of ~IFIP Congress, 1983). Elsevier Science Publishers R. V. (North-Holland), Amsterdam, The ~Netherlands, pp. 53-64.
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
REVIEW
"Ralph Walter Wilkerson : Reviewer"
A greedy algorithm to solve the following problem is studied: for a
finite set of strings, find the shortest common superstring
S
such that every string in the set is a substring of
more...
|