|
ABSTRACT
We study algorithms for computing the equijoin of two relations in a system with a standard architecture hut with large amounts of main memory. Our algorithms are especially efficient when the main memory available is a significant fraction of the size of one of the relations to he joined; but they can be applied whenever there is memory equal to approximately the square root of the size of one relation. We present a new algorithm which is a hybrid of two hash-based algorithms and which dominates the other algorithms we present, including sort-merge. Even in a virtual memory environment, the hybrid algorithm dominates all the others we study.
Finally, we describe how three popular tools to increase the efficiency of joins, namely filters, Babb arrays, and semijoins, can he grafted onto any of our algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Philip A. Bernstein , Nathan Goodman , Eugene Wong , Christopher L. Reeve , James B. Rothnie, Jr., Query processing in a system for distributed databases (SDD-1), ACM Transactions on Database Systems (TODS), v.6 n.4, p.602-625, Dec. 1981
[doi> 10.1145/319628.319650]
|
 |
3
|
|
| |
4
|
BLASGEN, M. W., AND ESWARAN, K. P. Storage and access in relational databases. IBM Syst. J. 16, 4 (1977).
|
| |
5
|
|
 |
6
|
David J DeWitt , Randy H Katz , Frank Olken , Leonard D Shapiro , Michael R Stonebraker , David Wood, Implementation techniques for main memory database systems, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts
|
| |
7
|
DEWITT, D., AND GERBER, R. Multiprocessor hash-based join algorithms. In Proceedings of the Conference on Very Large Data Bases (Stockholm, 1985).
|
| |
8
|
DIGITAL EQUIPMENT CORP. Product announcement, 1984.
|
 |
9
|
|
| |
10
|
GARCIA-MOLINA, H., LIPTON, R., AND VALDES, J. A massive memory machine. IEEE Trans. Comput. C-33, 5 (1984), 391-399.
|
| |
11
|
GOODMAN, J.R. An investigation of multiprocessor structures and algorithms for data base management. Electronics Research Lab. Memo. ECB/ERL M81/33, Univ. of California, Berkeley, 1981.
|
 |
12
|
|
| |
13
|
KIESSLINO, W. Tuneable dynamic filter algorithms for high performance database systems. In Proceedings of the International Workshop on High Level Computer Architecture (May 1984), 6.10-6.20.
|
| |
14
|
KITSUREOAWA, M., ET AL. Application of hash to data base machine and its architecture. New Generation Comput. 1 (1983), 62-74.
|
| |
15
|
|
 |
16
|
|
| |
17
|
SACCO, G. M., AND SCHOLNICK, M. A mechanism for managing the buffer pool in a relational database system using the hot-set model. Computer Science Res. Rep. RJ-3354, IBM Research Lab., San Jose, Calif., Jan. 1982.
|
 |
18
|
|
| |
19
|
SLOTNICK, D. Logic per track devices. In Advances in Computers, Vol. 10, J. Tou, Ed., Academic Press, New York, 1970, 291-296.
|
 |
20
|
|
 |
21
|
|
| |
22
|
YAMANE, Y. A hash join technique for relational database systems. In Proceedings of the International Conference on Foundations of Data Organization (Kyoto, May 1985}.
|
CITED BY 116
|
|
|
|
|
Kyu-Young Whang , Art Ammann , Anthony Bolmarcich , Maria Hanrahan , Guy Hochgesang , Kuan-Tsae Huang , Al Khorasani , Ravi Krishnamurthy , Gary Sockut , Paula Sweeney , Vance Waddle , Moshé Zloof, Office-by-example: an integrated office system and database manager, ACM Transactions on Information Systems (TOIS), v.5 n.4, p.393-427, Oct. 1987
|
|
|
|
|
|
|
|
|
|
|
|
Masaru Kitsuregawa , Miyuki Nakano , Lilian Harada , Mikio Takagi, Performance evaluation of functional disk system with nonuniform data distribution, Proceedings of the second international symposium on Databases in parallel and distributed systems, p.80-89, July 02-04, 1990, Dublin, Ireland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Francis Chu , Joseph Y. Halpern , Praveen Seshadri, Least expected cost query optimization: an exercise in utility, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.138-147, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
Remzi H. Arpaci-Dusseau , Eric Anderson , Noah Treuhaft , David E. Culler , Joseph M. Hellerstein , David Patterson , Kathy Yelick, Cluster I/O with River: making the fast case common, Proceedings of the sixth workshop on I/O in parallel and distributed systems, p.10-22, May 05-05, 1999, Atlanta, Georgia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
L. M. Haas , P. M. Schwarz , P. Kodali , E. Kotlar , J. E. Rice , W. C. Swope, DiscoveryLink: a system for integrated access to life sciences data sources, IBM Systems Journal, v.40 n.2, p.489-511, February 2001
|
|
|
Christopher Jermaine , Alin Dobra , Subramanian Arumugam , Shantanu Joshi , Abhijit Pol, A disk-based join with probabilistic guarantees, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jayavel Shanmugasundaram , Eugene Shekita , Rimon Barr , Michael Carey , Bruce Lindsay , Hamid Pirahesh , Berthold Reinwald, Efficiently publishing relational data as XML documents, The VLDB Journal — The International Journal on Very Large Data Bases, v.10 n.2-3, p.133-154, September 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jayavel Shanmugasundaram , Eugene J. Shekita , Rimon Barr , Michael J. Carey , Bruce G. Lindsay , Hamid Pirahesh , Berthold Reinwald, Efficiently Publishing Relational Data as XML Documents, Proceedings of the 26th International Conference on Very Large Data Bases, p.65-76, September 10-14, 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mahmoud El Samad , Julien Gossa , Franck Morvan , Abdelkader Hameurlain , Jean-Marc Pierson , Lionel Brunie, A monitoring service for large-scale dynamic query optimisation in a grid environment, International Journal of Web and Grid Services, v.4 n.2, p.222-246, June 2008
|
|
|
|
|
|
Mark Lillibridge , Kave Eshghi , Deepavali Bhagwat , Vinay Deolalikar , Greg Trezise , Peter Camble, Sparse indexing: large scale, inline deduplication using sampling and locality, Proccedings of the 7th conference on File and stroage technologies, p.111-123, February 24-27, 2009, San Francisco, California
|
|
|
|
|
|
Dimitris Tsirogiannis , Stavros Harizopoulos , Mehul A. Shah , Janet L. Wiener , Goetz Graefe, Query processing techniques for solid state drives, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
REVIEW
"Vasant B. Kaujalgi : Reviewer"
Database Management Systems (DBMSs) are playing an increasing role in the
development of integrated packages. The design strategies of DBMSs have
changed over the years. Relational DBMSs, even though simple in structure,
are considerably complex
more...
|