ACM Home Page
Please provide us with feedback. Feedback
Exact match search in sequence data using suffix trees
Full text PdfPdf (445 KB)
Source Conference on Information and Knowledge Management archive
Proceedings of the 14th ACM international conference on Information and knowledge management table of contents
Bremen, Germany
SESSION: Paper session KM-2 (knowledge management): index structures table of contents
Pages: 123 - 130  
Year of Publication: 2005
ISBN:1-59593-140-6
Authors
Mihail Halachev  Concordia University, Montreal, Quebec, Canada
Nematollaah Shiri  Concordia University, Montreal, Quebec, Canada
Anand Thamildurai  Concordia University, Montreal, Quebec, Canada
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 85,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1099554.1099579
What is a DOI?

ABSTRACT

We study suitable indexing techniques to support efficient exact match search in large biological sequence databases. We propose a suffix tree (ST) representation, called STA-DF, as an alternative to the array representation of ST (STA) proposed in [7] and utilized in [18]. To study the performance of STA and STA-DF, we develop a memory efficient ST-based Exact Match (STEM) search algorithm. We implemented STEM and both representations of ST and conducted extensive experiments. Our results indicate that the STA and STA-DF representations are very similar in construction time, storage utilization, and search time using STEM. In terms of the access patterns by STEM, our results show that compared to STA, the STA-DF representation exhibits better spatial and sequential locality of reference. This suggests that STA-DF would require less number of disk I/Os, and hence is more amenable to efficient and scalable disk-based computation.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S.F. Altschul S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. In J. Mol. Biol., 215:403--10, 1990.
 
2
S. F. Altschul, T. L. Madden, A. A. Schaeer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. In Nucleic Acids Research, 25:3389--3402, 1997.
 
3
 
4
5
 
6
Christian Charras and Thierry Lecroq. ESMAJ: Exact String Matching Algorithms, Animation in Java. http://www-igm.univ-lv.fr/~lecroq/string/index.html. Laboratoire d'Informatique de Rouen, Université de Rouen. Last accessed: April, 2005.
 
7
 
8
R. Giegerich, S. Kurtz, and J. Stoye. Wotdeager Source code, http://bibiserv.techfak.uni-bielefeld.de/wotd/. Last accessed March, 2005.
 
9
 
10
R. Nigel Horspool. Practical Fast Searching in Strings. In Software - Practice and Experience, 10:501--506, 1980.
 
11
 
12
 
13
D.E. Knuth, J.H. Morris (Jr), and V.R. Pratt. Fast pattern matching in strings. In SIAM Journal on Computing 6(1):323--350, 1977.
 
14
 
15
C. Meek, J. M. Patel, and S. Kassety. OASIS: An Online and Accurate Technique for Local-alignment Searches on Biological Sequences. In Proc. of VLDB, 2003
16
17
 
18
S. Tata, R.A. Hankins, and J. Patel. Practical Suffix Tree Construction. In Proc. of the 30th VLDB, 2004
 
19
E. Ukkonen. On-line construction of suffix-trees. In Algorithmica 14 (1995), 249--260, 1995.
 
20
P. Weiner. Linear Pattern Matching Algorithms. In Proc. 14th Annual Symposium on Switching and Automata Theory, 1973.
 
21

Collaborative Colleagues:
Mihail Halachev: colleagues
Nematollaah Shiri: colleagues
Anand Thamildurai: colleagues