| Lazy, adaptive rid-list intersection, and its application to index anding |
| Full text |
Pdf
(361 KB)
|
Source
|
International Conference on Management of Data
archive
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
table of contents
Beijing, China
SESSION: Optimization
table of contents
Pages: 773 - 784
Year of Publication: 2007
ISBN:978-1-59593-686-8
|
|
Authors
|
|
Vijayshankar Raman
|
IBM Almaden Research Center, San Jose, CA
|
|
Lin Qiao
|
IBM Almaden Research Center, San Jose, CA
|
|
Wei Han
|
IBM Almaden Research Center, San Jose, CA
|
|
Inderpal Narang
|
IBM Almaden Research Center, San Jose, CA
|
|
Ying-Lin Chen
|
IBM Silicon Valley Lab, San Jose, CA
|
|
Kou-Horng Yang
|
IBM Silicon Valley Lab, San Jose, CA
|
|
Fen-Ling Ling
|
IBM Silicon Valley Lab, San Jose, CA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 87, Citation Count: 2
|
|
|
ABSTRACT
RID-List (row id list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query process ordoes index lookups to form sorted RID-lists (or bitmap) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates. This process can be expensive when the RID-lists are large. Furthermore, the performance is sensitive to the order in which RID lists are intersected together, and to treating the right predicates as residuals. If the optimizer chooses a wrong order or a wrong residual, due to a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected. We present a new algorithm for RID-list intersection that is both more efficient and more robust than this standard algorithm. First, we avoid forming the RID-lists up front, and instead form this lazily as part of the intersection. This reduces the associated IO and sort cost significantly, especially when the data distribution is skewed. It also ameliorates the problem of wrong residual table selection. Second, we do not intersect the RID-lists via an AND-tree, because this is vulnerable to cardinality mis-estimations. Instead, we use an adaptive set intersection algorithm that performs well even when the cardinality estimates are wrong. We present detailed experiments of this algorithm on data with varying distributions to validate its efficiency and predictability.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, 2000.
|
| |
2
|
A. Balmin , T. Eliaz , J. Hornibrook , L. Lim , G. M. Lohman , D. Simmen , M. Wang , C. Zhang, Cost-based optimization in DB2 XML, IBM Systems Journal, v.45 n.2, p.299-319, January 2006
|
| |
3
|
J. Barbay and C. Kenyon. Adaptive intersection and t-threshold problems. In SODA, 2002.
|
| |
4
|
J. Barbay, A. López-Ortiz, and T. Lu. Faster adaptive set intersections for text searching. In Intl. Workshop on Experimental Algorithms, 2006.
|
| |
5
|
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal xml pattern matching. In SIGMOD, 2002.
|
| |
6
|
E. Demaine, A. López-Ortiz, and J. Munro. Adaptive set intersections, unions, and differences. In SODA, 2000.
|
| |
7
|
Y. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991.
|
| |
8
|
M. Kamath and K. Ramamritham. Bucket skip merge join: A scalable algorithm for join processing in very large databases using indexes. Technical Report UM-CS-1996-020, University of Massachusetts at Amherst, 1996.
|
| |
9
|
C. Mohan, Don Haderle, Yun Wang, and Josephine Cheng. Single table access using multiple indexes: Optimization, execution, and concurrency control techniques. In Proceedings of the International Conference on Extending Database Technology, 1990.
|
| |
10
|
Shivnath Babu, Rajeev Motwani, Kamesh Mungala, Itaru Nishizawa, and Jennifer Widom. Adaptive ordering of pipelined streamfilters. In SIGMOD, 2004.
|
| |
11
|
P. O'Neil and D. Quass. Improved query performance with variant indexes. In SIGMOD, 1997.
|
| |
12
|
Oracle White Paper. Query Optimization in Oracle Database 10g Release 2, June 2005.
|
| |
13
|
SAP. SAP Business Information Warehouse Benchmark.
|
| |
14
|
M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO: DB2's LEarning Optimizer. In VLDB, 2001.
|
| |
15
|
Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. In VLDB, 2005.
|
CITED BY 2
|
|
|
|
|
Yin Yang , Dimitris Papadias , Stavros Papadopoulos , Panos Kalnis, Authenticated join processing in outsourced databases, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|