| Exact match search in sequence data using suffix trees |
| Full text |
Pdf
(445 KB)
|
| Source
|
Conference on Information and Knowledge Management
archive
Proceedings of the 14th ACM international conference on Information and knowledge management
table of contents
Bremen, Germany
SESSION: Paper session KM-2 (knowledge management): index structures
table of contents
Pages: 123 - 130
Year of Publication: 2005
ISBN:1-59593-140-6
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 14, Downloads (12 Months): 85, Citation Count: 0
|
|
|
ABSTRACT
We study suitable indexing techniques to support efficient exact match search in large biological sequence databases. We propose a suffix tree (ST) representation, called STA-DF, as an alternative to the array representation of ST (STA) proposed in [7] and utilized in [18]. To study the performance of STA and STA-DF, we develop a memory efficient ST-based Exact Match (STEM) search algorithm. We implemented STEM and both representations of ST and conducted extensive experiments. Our results indicate that the STA and STA-DF representations are very similar in construction time, storage utilization, and search time using STEM. In terms of the access patterns by STEM, our results show that compared to STA, the STA-DF representation exhibits better spatial and sequential locality of reference. This suggests that STA-DF would require less number of disk I/Os, and hence is more amenable to efficient and scalable disk-based computation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
S.F. Altschul S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. In J. Mol. Biol., 215:403--10, 1990.
|
| |
2
|
S. F. Altschul, T. L. Madden, A. A. Schaeer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. In Nucleic Acids Research, 25:3389--3402, 1997.
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
Christian Charras and Thierry Lecroq. ESMAJ: Exact String Matching Algorithms, Animation in Java. http://www-igm.univ-lv.fr/~lecroq/string/index.html. Laboratoire d'Informatique de Rouen, Université de Rouen. Last accessed: April, 2005.
|
| |
7
|
|
| |
8
|
R. Giegerich, S. Kurtz, and J. Stoye. Wotdeager Source code, http://bibiserv.techfak.uni-bielefeld.de/wotd/. Last accessed March, 2005.
|
| |
9
|
|
| |
10
|
R. Nigel Horspool. Practical Fast Searching in Strings. In Software - Practice and Experience, 10:501--506, 1980.
|
| |
11
|
|
| |
12
|
|
| |
13
|
D.E. Knuth, J.H. Morris (Jr), and V.R. Pratt. Fast pattern matching in strings. In SIAM Journal on Computing 6(1):323--350, 1977.
|
| |
14
|
|
| |
15
|
C. Meek, J. M. Patel, and S. Kassety. OASIS: An Online and Accurate Technique for Local-alignment Searches on Biological Sequences. In Proc. of VLDB, 2003
|
 |
16
|
|
 |
17
|
|
| |
18
|
S. Tata, R.A. Hankins, and J. Patel. Practical Suffix Tree Construction. In Proc. of the 30th VLDB, 2004
|
| |
19
|
E. Ukkonen. On-line construction of suffix-trees. In Algorithmica 14 (1995), 249--260, 1995.
|
| |
20
|
P. Weiner. Linear Pattern Matching Algorithms. In Proc. 14th Annual Symposium on Switching and Automata Theory, 1973.
|
| |
21
|
|
|