|
ABSTRACT
A digital search tree (DST) -- one of the most fundamental data structures on words -- is a digital tree in which keys (strings, words) are stored directly in (internal) nodes. Such trees find myriad of applications from the popular Lempel-Ziv'78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it is a function of the number of stored strings and the distance from the root. Most parameters of DST (e.g., height, fill-up) can be expressed in terms of the profile. However, from the inception of DST, the analysis of the profile has been elusive and it has become a prominent open problem in the area of analysis of algorithms. We make here the first, but decisive, step towards solving this problem. We present a precise analysis of the average profile when stored strings are generated by a biased memoryless source. The main technical difficulty of analyzing the profile lies in solving a sophisticated recurrence equation. We present such a solution for the Poissonized version of the problem (i.e., when the number of stored strings is generated by a Poisson distribution) in the Mellin transform domain. To accomplish it, we introduce a novel functional operator that allows us to express the solution in an explicit form, and then using analytic algorithmics tools to extract the asymptotic behavior of the profile. This analysis is surprisingly demanding but once it is carried out it reveals unusually intriguing and interesting behavior. The average profile undergoes several phase transitions when moving from the root to the longest path. At first, it resembles a full tree until it abruptly starts growing polynomially and it oscillates in this range. Our results are derived by methods of analytic algorithmics such as generating functions, Mellin transform, Poissonization and de-Poissonization, the saddle-point method, singularity analysis and uniform asymptotic analysis.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
L. Devroye, A Study of Trie-Like Structures Under the Density Model, Annals of Applied Probability, 2, 402--434, 1992.
|
| |
2
|
L. Devroye and H.-K. Hwang, Width and mode of the profile for some random trees of logarithmic height, Ann. Appl. Probab., 16, 886--918, 2006.
|
| |
3
|
|
| |
4
|
|
| |
5
|
C. Knessl, and W. Szpankowski, On the Average Profile of Symmetric Digital Search Trees, preprint, 2008.
|
| |
6
|
|
| |
7
|
G. Louchard, Exact and Asymptotic Distributions in Digital and Binary Search Trees, RAIRO Theoretical Inform. Applications, 21, 479--495, 1987.
|
| |
8
|
G. Louchard and W. Szpankowski, Average Profile and Limiting Distribution for a Phrase Size in the Lempel-Ziv Parsing Algorithm, IEEE Trans. Information Theory, 41, 478--488, 1995.
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
G. Park, H. K. Hwang, P. Nicodeme, and W. Szpankowski, Profile of Tries, SIAM J. Computing, to appear; also Proc. LATIN'08, LNCS 4957, 1--11, 2008.
|
| |
13
|
B. Pittel, Asymptotic Growth of a Class of Random Trees, Annals of Probability, 18, 414--427, 1985.
|
| |
14
|
Helmut Prodinger, Digital Search Trees and Basic Hypergeometric Functions, Bulletin of the EATCS, 56, 1995.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
|