ACM Home Page
Please provide us with feedback. Feedback
Indexing text data under space constraints
Full text PdfPdf (674 KB)
Source Conference on Information and Knowledge Management archive
Proceedings of the thirteenth ACM international conference on Information and knowledge management table of contents
Washington, D.C., USA
SESSION: DB-IR-1 (databases and information retrieval): indexing and query processing effiency table of contents
Pages: 198 - 207  
Year of Publication: 2004
ISBN:1-58113-874-1
Authors
Bijit Hore  University of California - Irvine, CA
Hakan Hacigumus  IBM Almaden Research Center
Bala Iyer  Silicon Valley Lab
Sharad Mehrotra  University of California - Irvine, CA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 44,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1031171.1031212
What is a DOI?

ABSTRACT

An important class of queries is the LIKE predicate in SQL. In the absence of an index, LIKE queries are subject to performance degradation. The notion of indexing on substrings (or <i>q</i>-grams) has been explored earlier without sufficient consideration of efficiency. <i>q</i>-grams are used to prune away rows that do not qualify for the query. The problem is to identify a finite number of grams subject to storage constraint that gives maximal pruning for a given query workload. Our contributions include: i) a formal problem definition, that produces results within a provable error bound, ii) performance evaluation of the application of the novel method to real data, and iii) parallelization of the algorithm, scaling considerations and a proposal to handle scaling issues.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Digital Bibliography & Library Project. http://dblp.uni-trier.de/.
 
3
 
4
E. Ukkonen. Online construction of Suffix-trees. Algorithmica, 1993.
 
5
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, L. Pietarinen, and D. Srivastava. Using q grams in a dbms for approximate string processing. IEEE Data Engineering Bulletin, 24(4):28--34, 2001.
 
6
7
 
8
Hore, B., Hacigumus, H., Iyer, B., Mehrotra, S. Indexing Text Data under Space Constraints TR-DB-04-02, www-db.ics.uci.edu/pages/publications/index.shtml
 
9
 
10
D. S. Hochbaum and A. Pathria. Analysis of the Greedy Approach in Problems of Maximum k-Coverage. Naval Research Quarterly, (45):615--627, 1998.
11
12
 
13
14
 
15
Bayer, R., and McCreight, C. Organization and maintenance of large ordered indexes Acta Informatica, 1972, pp173--189.
16
 
17
18
 
19
 
20
Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M., T., and Seiferas, J. The smallest automaton recognizing the subwords of a text. Theoretical computer science, 40(1), 1985, pp. 31--55.
21
22
 
23
 
24
 
25
26
 
27
28


Collaborative Colleagues:
Bijit Hore: colleagues
Hakan Hacigumus: colleagues
Bala Iyer: colleagues
Sharad Mehrotra: colleagues