ABSTRACT
We present MOCHA, a new self-extensible database middleware system designed to interconnect distributed data sources. MOCHA is designed to scale to large environments and is based on the idea that some of the user-defined functionality in the system should be deployed by the middleware system itself. This is realized by shipping Java code that implements either advanced data types or tailored query operators to remote data sources and executing it there. Optimized query plans push the evaluation of powerful data-reducing operators to the data source sites while executing data-inflating operators near the client's site. The Volume Reduction Factor, a new and more explicit metric introduced in this paper, is used to select the best site at which to execute each query operator and is shown to be more accurate than the standard selectivity factor alone. MOCHA has been implemented in Java and runs on top of Informix and Oracle. We present the architecture of MOCHA, the ideas behind it, and a performance study using scientific data and queries. The results of this study demonstrate that MOCHA provides a more flexible, scalable and efficient framework for distributed query processing than those found in existing middleware solutions.
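To make the contrast between the Volume Reduction Factor (VRF) and the selectivity factor concrete, here is a minimal Python sketch. The abstract does not give a formula, so it assumes that an operator's VRF is the ratio of the bytes it would ship over the network to the bytes it reads at the source. Under that assumption, VRF < 1 marks a data-reducing operator to push to the data source, and VRF >= 1 marks a data-inflating operator to run near the client. The `OperatorEstimate` type, the helper functions, the operator names `AvgEnergy` and `Decompress`, and all byte counts are hypothetical illustrations, not MOCHA's actual API or cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorEstimate:
    """Hypothetical cost-model input for one query operator."""
    name: str
    tuples_in: int
    tuples_out: int
    bytes_in: float   # assumed: total volume the operator reads at the source
    bytes_out: float  # assumed: total volume it would ship over the network


def selectivity(op: OperatorEstimate) -> float:
    """Classic selectivity: fraction of input tuples that survive."""
    return op.tuples_out / op.tuples_in


def volume_reduction_factor(op: OperatorEstimate) -> float:
    """Assumed VRF definition: bytes produced / bytes consumed."""
    if op.bytes_in <= 0:
        raise ValueError("bytes_in must be positive")
    return op.bytes_out / op.bytes_in


def choose_site(op: OperatorEstimate) -> str:
    """Place data-reducing operators (VRF < 1) at the data source and
    data-inflating ones (VRF >= 1) near the client."""
    return "source" if volume_reduction_factor(op) < 1.0 else "client"


# Hypothetical operator: keeps every tuple but reduces a 1 MB raster to one 8-byte float.
avg_energy = OperatorEstimate("AvgEnergy", 1000, 1000, 1000 * 1_000_000, 1000 * 8)

# Hypothetical operator: keeps half the tuples but expands 100 KB to 400 KB each.
decompress = OperatorEstimate("Decompress", 1000, 500, 1000 * 100_000, 500 * 400_000)

for op in (avg_energy, decompress):
    print(op.name, selectivity(op), volume_reduction_factor(op), choose_site(op))
```

In this toy setup the two metrics disagree. `AvgEnergy` has selectivity 1.0, which suggests no benefit from pushing it down, yet it cuts the shipped volume by several orders of magnitude. `Decompress` has selectivity 0.5, which suggests a reduction, yet it doubles the bytes that would cross the network. Measuring volume rather than tuple counts is what lets the placement rule tell these cases apart.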
CITED BY 17
Michael Beynon, Chialin Chang, Umit Catalyurek, Tahsin Kurc, Alan Sussman, Henrique Andrade, Renato Ferreira, Joel Saltz. Processing large-scale multi-dimensional data in parallel and distributed environments. Parallel Computing, v.28 n.5, p.827-859, May 2002.

V. Fontes, B. Schulze, M. Dutra, F. Porto, A. Barbosa. CoDIMS-G: a data and program integration service for the grid. Proceedings of the 2nd workshop on Middleware for grid computing, p.29-34, October 18-22, 2004, Toronto, Ontario, Canada.

Henrique Andrade, Tahsin Kurc, Alan Sussman, Joel Saltz. Active Proxy-G: optimizing the query execution process in the grid. Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-15, November 16, 2002, Baltimore, Maryland.

J. F. Aldana, M. Roldán-Castro, I. Navas, M. M. Roldán-García, M. Hidalgo-Conde, O. Trelles. Bio-Broker: a tool for integration of biological data sources and data analysis tools. Software—Practice & Experience, v.36 n.14, p.1585-1604, November 2006.