ACM Home Page
Please provide us with feedback. Feedback
Improved bounds on the average length of longest common subsequences
Full text PdfPdf (493 KB)
Source
Journal of the ACM (JACM) archive
Volume 56 ,  Issue 3  (May 2009) table of contents
Article No. 17  
Year of Publication: 2009
ISSN:0004-5411
Author
George S. Lueker  University of California, Irvine, California
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 160,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1516512.1516519
What is a DOI?

ABSTRACT

It has long been known [Chvátal and Sankoff 1975] that the average length of the longest common subsequence of two random strings of length n over an alphabet of size k is asymptotic to γkn for some constant γk depending on k. The value of these constants remains unknown, and a number of papers have proved upper and lower bounds on them. We discuss techniques, involving numerical calculations with recurrences on many variables, for determining lower and upper bounds on these constants. To our knowledge, the previous best-known lower and upper bounds for γ2 were those of Dančík and Paterson, approximately 0.773911 and 0.837623 [Dančík 1994; Dančík and Paterson 1995]. We improve these to 0.788071 and 0.826280. This upper bound is less than the γ2 given by Steele's old conjecture (see Steele [1997, page 3]) that γ2 = 2/(1 + &sqrt;2)≈ 0.828427. (As Steele points out, experimental evidence had already suggested that this conjectured value was too high.) Finally, we show that the upper bound technique described here could be used to produce, for any k, a sequence of upper bounds converging to γk, though the computation time grows very quickly as better bounds are guaranteed.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Alexander, K. S. 1994. The rate of convergence of the mean length of the longest common subsequence. Ann. Appl. Prob. 4, 4, 1074--1082.
 
2
Apostolico, A., and Guerra, C. 1987. The longest common subsequence problem revisited. Algorithmica 2, 315--336.
 
3
Baeza-Yates, R. A., Gavaldà, R., Navarro, G., and Scheihing, R. 1999. Bounding the expected length of longest common subsequences and forests. Theory Comput. Syst. 32, 435--452.
 
4
Chvátal, V., and Sankoff, D. 1975. Longest common subsequences of two random sequences. J. Appl. Prob. 12, 306--315.
 
5
Dančík, V. 1994. Expected length of longest common subsequences. Ph.D. dissertation, Department of Computer Science, University of Warwick.
 
6
 
7
8
 
9
Janson, S., Luczak, T., and Rucinski, A. 2000. Random Graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York.
 
10
 
11
Kiwi, M., Loebl, M., and Matoušek, J. 2005. Expected length of the longest common subsequence for large alphabets. Adv. Math. 197, 2, 480--498.
 
12
Kiwi, M., and Soto, J. 2008. On a speculated relation between Chvátal-Sankoff constants of several sequences. Combinatorics, Probability and Computing. To appear. Available on-line at http://arxiv.org/abs/0810.1066.
 
13
 
14
 
15
 
16
 
17
Pevzner, P. A. 2000. Computational Molecular Biology: An Algorithmic Approach. The MIT Press, Cambridge, MA.
 
18
Seneta, E. 1981. Non-negative Matrices and Markov Chains, 2nd ed. Springer Series in Statistics. Springer-Verlag, New York.
 
19
Steele, J. M. 1997. Probability Theory and Combinatorial Optimization. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia, PA.