ACM Home Page
Please provide us with feedback. Feedback
Lazy, adaptive rid-list intersection, and its application to index anding
Full text PdfPdf (361 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2007 ACM SIGMOD international conference on Management of data table of contents
Beijing, China
SESSION: Optimization table of contents
Pages: 773 - 784  
Year of Publication: 2007
ISBN:978-1-59593-686-8
Authors
Vijayshankar Raman  IBM Almaden Research Center, San Jose, CA
Lin Qiao  IBM Almaden Research Center, San Jose, CA
Wei Han  IBM Almaden Research Center, San Jose, CA
Inderpal Narang  IBM Almaden Research Center, San Jose, CA
Ying-Lin Chen  IBM Silicon Valley Lab, San Jose, CA
Kou-Horng Yang  IBM Silicon Valley Lab, San Jose, CA
Fen-Ling Ling  IBM Silicon Valley Lab, San Jose, CA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 87,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1247480.1247566
What is a DOI?

ABSTRACT

RID-List (row id list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query process ordoes index lookups to form sorted RID-lists (or bitmap) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates.

This process can be expensive when the RID-lists are large. Furthermore, the performance is sensitive to the order in which RID lists are intersected together, and to treating the right predicates as residuals. If the optimizer chooses a wrong order or a wrong residual, due to a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected.

We present a new algorithm for RID-list intersection that is both more efficient and more robust than this standard algorithm. First, we avoid forming the RID-lists up front, and instead form this lazily as part of the intersection. This reduces the associated IO and sort cost significantly, especially when the data distribution is skewed. It also ameliorates the problem of wrong residual table selection. Second, we do not intersect the RID-lists via an AND-tree, because this is vulnerable to cardinality mis-estimations. Instead, we use an adaptive set intersection algorithm that performs well even when the cardinality estimates are wrong.

We present detailed experiments of this algorithm on data with varying distributions to validate its efficiency and predictability.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, 2000.
 
2
 
3
J. Barbay and C. Kenyon. Adaptive intersection and t-threshold problems. In SODA, 2002.
 
4
J. Barbay, A. López-Ortiz, and T. Lu. Faster adaptive set intersections for text searching. In Intl. Workshop on Experimental Algorithms, 2006.
 
5
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal xml pattern matching. In SIGMOD, 2002.
 
6
E. Demaine, A. López-Ortiz, and J. Munro. Adaptive set intersections, unions, and differences. In SODA, 2000.
 
7
Y. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991.
 
8
M. Kamath and K. Ramamritham. Bucket skip merge join: A scalable algorithm for join processing in very large databases using indexes. Technical Report UM-CS-1996-020, University of Massachusetts at Amherst, 1996.
 
9
C. Mohan, Don Haderle, Yun Wang, and Josephine Cheng. Single table access using multiple indexes: Optimization, execution, and concurrency control techniques. In Proceedings of the International Conference on Extending Database Technology, 1990.
 
10
Shivnath Babu, Rajeev Motwani, Kamesh Mungala, Itaru Nishizawa, and Jennifer Widom. Adaptive ordering of pipelined streamfilters. In SIGMOD, 2004.
 
11
P. O'Neil and D. Quass. Improved query performance with variant indexes. In SIGMOD, 1997.
 
12
Oracle White Paper. Query Optimization in Oracle Database 10g Release 2, June 2005.
 
13
SAP. SAP Business Information Warehouse Benchmark.
 
14
M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO: DB2's LEarning Optimizer. In VLDB, 2001.
 
15
Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. In VLDB, 2005.


Collaborative Colleagues:
Vijayshankar Raman: colleagues
Lin Qiao: colleagues
Wei Han: colleagues
Inderpal Narang: colleagues
Ying-Lin Chen: colleagues
Kou-Horng Yang: colleagues
Fen-Ling Ling: colleagues