|
ABSTRACT
An important class of queries is the LIKE predicate in SQL. In the absence of an index, LIKE queries are subject to performance degradation. The notion of indexing on substrings (or <i>q</i>-grams) has been explored earlier without sufficient consideration of efficiency. <i>q</i>-grams are used to prune away rows that do not qualify for the query. The problem is to identify a finite number of grams subject to storage constraint that gives maximal pruning for a given query workload. Our contributions include: i) a formal problem definition, that produces results within a provable error bound, ii) performance evaluation of the application of the novel method to real data, and iii) parallelization of the algorithm, scaling considerations and a proposal to handle scaling issues.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Digital Bibliography & Library Project. http://dblp.uni-trier.de/.
|
| |
3
|
|
| |
4
|
E. Ukkonen. Online construction of Suffix-trees. Algorithmica, 1993.
|
| |
5
|
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, L. Pietarinen, and D. Srivastava. Using q grams in a dbms for approximate string processing. IEEE Data Engineering Bulletin, 24(4):28--34, 2001.
|
| |
6
|
Luis Gravano , Panagiotis G. Ipeirotis , H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Approximate String Joins in a Database (Almost) for Free, Proceedings of the 27th International Conference on Very Large Data Bases, p.491-500, September 11-14, 2001
|
 |
7
|
Stefan Burkhardt , Andreas Crauser , Paolo Ferragina , Hans-Peter Lenhof , Eric Rivals , Martin Vingron, q-gram based database searching using a suffix array (QUASAR), Proceedings of the third annual international conference on Computational molecular biology, p.77-83, April 11-14, 1999, Lyon, France
[doi> 10.1145/299432.299460]
|
| |
8
|
Hore, B., Hacigumus, H., Iyer, B., Mehrotra, S. Indexing Text Data under Space Constraints TR-DB-04-02, www-db.ics.uci.edu/pages/publications/index.shtml
|
| |
9
|
|
| |
10
|
D. S. Hochbaum and A. Pathria. Analysis of the Greedy Approach in Problems of Maximum k-Coverage. Naval Research Quarterly, (45):615--627, 1998.
|
 |
11
|
|
 |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
Bayer, R., and McCreight, C. Organization and maintenance of large ordered indexes Acta Informatica, 1972, pp173--189.
|
 |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
|
| |
20
|
Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M., T., and Seiferas, J. The smallest automaton recognizing the subwords of a text. Theoretical computer science, 40(1), 1985, pp. 31--55.
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
| |
27
|
|
 |
28
|
|
|