| Dynamic indexing for multidimensional non-ordered discrete data spaces using a data-partitioning approach |
| Full text |
Pdf
(736 KB)
|
| Source
|
ACM Transactions on Database Systems (TODS)
archive
Volume 31 , Issue 2 (June 2006)
table of contents
Pages: 439 - 484
Year of Publication: 2006
ISSN:0362-5915
|
|
Authors
|
|
Gang Qian
|
University of Central Oklahoma, Edmond, OK
|
|
Qiang Zhu
|
The University of Michigan---Dearborn, Dearborn, MI
|
|
Qiang Xue
|
Michigan State University, East Lansing, MI
|
|
Sakti Pramanik
|
Michigan State University, East Lansing, MI
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 15, Downloads (12 Months): 99, Citation Count: 1
|
|
|
ABSTRACT
Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity searches require robust indexing techniques. Unfortunately, existing indexing methods developed for multidimensional (ordered) Continuous Data Spaces (CDS) such as the R-tree cannot be directly applied to an NDDS. This is because some essential geometric concepts/properties such as the minimum bounding region and the area of a region in a CDS are no longer valid in an NDDS. Other indexing methods based on metric spaces such as the M-tree and the Slim-trees are too general to effectively utilize the special characteristics of NDDSs, resulting in nonoptimized performance. In this article, we propose a new dynamic data-partitioning-based indexing technique, called the ND-tree, to support efficient similarity searches in an NDDS. The key idea is to extend the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs. Efficient algorithms for ND-tree construction and techniques to solve relevant issues such as handling dimensions with different alphabets in an NDDS are presented. Our experimental results on synthetic data and real genome sequence data demonstrate that the ND-tree outperforms the linear scan, the M-tree and the Slim-trees for similarity searches in multidimensional NDDSs. A theoretical model is also developed to predict the performance of the ND-tree for random data.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
Clement, J., Flajolet, P., and Vallee, B. 2001. Dynamic sources in information theory: A general analysis of trie structures. Algorithm 29, 1/2, 307--369.
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
Kent, W. J. 2002. BLAT--The BLAST-like aligment tool. Gen. Res. 12, 656--664.
|
| |
17
|
Knuth, D. E. 1973. The Art of Computer Programming, Vol. 3. Addison-Wesley, Reading, MA.
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
 |
21
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
22
|
Skopal, T., Pokorny, J., and Snasel, V. 2004. PM-tree: Pivoting metric tree for similarity search in multimedia databases. In Proceedings of Dateso 2004 Annual Internation Workshop on DAtabases, TExts, Specifications and Objects (DATESO'04). 27--37.
|
| |
23
|
|
| |
24
|
Uhlmann, J. K. 1991. Satisfying general proximity/similarity queries with metric trees. Inf. Proc. Lett. 40, 4, 175--179.
|
| |
25
|
|
| |
26
|
|
| |
27
|
Xue, Q., Qian, G., Cole, J. R., and Pramanik, S. 2004. Investigation on approximate q-gram matching in genome sequence databases. Tech. Report, Michigan State Univ.
|
| |
28
|
Xiangmin Zhou , Guoren Wang , Jeffrey Xu Yu , Ge Yu, M+-tree: a new dynamical multidimensional index for metric spaces, Proceedings of the 14th Australasian database conference, p.161-168, February 01, 2003, Adelaide, Australia
|
CITED BY
|
|
Changqing Chen , Sakti Pramanik , Qiang Zhu , Watve Alok , Gang Qian, The C-ND tree: a multidimensional index for hybrid continuous and non-ordered discrete data spaces, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|