ACM Home Page
Please provide us with feedback. Feedback
Linear approximation of shortest superstrings
Full text PdfPdf (1.23 MB)
Source Journal of the ACM (JACM) archive
Volume 41 ,  Issue 4  (July 1994) table of contents
Pages: 630 - 647  
Year of Publication: 1994
ISSN:0004-5411
Authors
Avrim Blum  Massachusetts Institute of Technology, Cambridge
Tao Jiang  McMaster Univ., Hamilton, Ont., Canada
Ming Li  Univ. of Waterloo, Waterloo, Ont., Canada
John Tromp  CWI, Amsterdam, The Netherlands
Mihalis Yannakakis  AT&T Bell Labs, Murray Hill, NJ
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 98,   Citation Count: 18
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/179812.179818
What is a DOI?

ABSTRACT

We consider the following problem: given a collection of strings s1,…, sm, find the shortest string s such that each si appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of (distinct) strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n) result. We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the superstring problem to be MAXSNP-hard, which implies that a polynomial-time approximation scheme for this problem is unlikely.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
~ALON, N., COSARES, S., HOCHBAUM, D., AND SHAMIR, R. 1989. An algorithm for the detection ~and construction of Monge Sequences. LiB. Alg. Appl. 114/115, 669-680.
 
2
~ARORA, A., LUND, C., MOTWANI, R., SUDAN, M., AND SZEGEDY, M. 1992. Proof verification and ~hardness of approximation problems. In Proceedings of the 33ra' IEEE Symposium on the ~Foundations of Computer Science. IEEE, New York, pp. 14 23.
 
3
~BARNES, E., AND HOFFMAN, A., 1985. On transportation problems with upper bounds on leading ~rectangles. SIAM J. Alg. Disc. Me&. 6, 487-496.
 
4
~FINE, N., AND WILF, H. 1965. Uniqueness theorems for periodic functions. Proc. Amer. Math. ~Soc. 16, 109-114.
 
5
~GALLANT, J., MAIER, D., AND STORER, J. 1980. On finding minimal length superstrings. J. ~Comput. Syst. Sci. 20, 50-58.
 
6
~GAREY, M., AND JOHNSON, D. 1979. Computers and I~tractabilitv. Freeman, New York.
 
7
~HOFFMAN, A., 1963. On simple transportation problems. In Convexitw Proceedings of Symposta in ~Pure Mathematics, vol. 7. American Mathematical Society, Providence, R.I., pp. 317 327.
 
8
 
9
~M. 1990. Towards a DNA sequencing theory. In Proceedings of &e 31st IEEE Symposimn on ~Foundations of Computer Science. IEEE, New York, pp. 125-134.
 
10
11
 
12
 
13
~PELTOLA, H., SODERLUND, H., TARHIO, J., AND UKKONEN, E. 1983. Algorithms for some string ~matching problems arising in molecular genetics. In Information Processing 83 (Proceedtngs of ~IFIP Congress, 1983). Elsevier Science Publishers R. V. (North-Holland), Amsterdam, The ~Netherlands, pp. 53-64.
 
14
 
15
 
16
17

CITED BY  18


REVIEW

"Ralph Walter Wilkerson : Reviewer"

A greedy algorithm to solve the following problem is studied: for a finite set of strings, find the shortest common superstring S such that every string in the set is a substring of more...

Collaborative Colleagues:
Avrim Blum: colleagues
Tao Jiang: colleagues
Ming Li: colleagues
John Tromp: colleagues
Mihalis Yannakakis: colleagues