Efficiently support MapReduce-like computation models inside parallel DBMS
Source
ACM International Conference Proceeding Series
Proceedings of the 2009 International Database Engineering & Applications Symposium
Cetraro - Calabria, Italy
SESSION: Full papers
Pages 43-53  
Year of Publication: 2009
ISBN: 978-1-60558-402-7
Authors
Qiming Chen  HP Labs, Palo Alto, California
Andy Therber  HP TSG SW NED, Cupertino, California
Meichun Hsu  HP Labs, Palo Alto, California
Hans Zeller  HP TSG SW NED, Cupertino, California
Bin Zhang  HP Labs, Palo Alto, California
Ren Wu  HP Labs, Palo Alto, California
Sponsors
BytePress
Concordia University
ACM
Università della Calabria, Rende (CS), Italy
ICAR-CNR, Rende (CS), Italy
ACM International Conference Proceeding Series
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1620432.1620438

ABSTRACT

While parallel DBMSs support large-scale parallel query processing on partitioned data, reaching more general applications relies on User Defined Functions (UDFs). However, existing UDF technology is insufficient both conceptually and practically. A UDF is not a relation-in, relation-out operator, which restricts its ability to model complex applications defined on a set of tuples rather than on a single tuple, and to be composed with other relational operators in a query. Further, to interact efficiently with query execution, a UDF must be coded against DBMS internal data structures and system calls, which is often beyond the expertise of an analytics application developer.

To solve these problems, we start by wrapping general applications with Relation Valued Functions (RVFs); then, based on the notion of invocation patterns, we provide focused system support for efficiently integrating RVF execution into the query processing pipeline. We further distinguish system responsibility from user responsibility in RVF development by separating an RVF into the RVF-Shell, which deals with system interaction, and the user-function, which holds pure application logic, so that the RVF-Shell can be constructed in terms of high-level APIs. These mechanisms enable us to solve the essential problems in supporting MapReduce and other analytics computation models inside a parallel database engine: modeling complex applications, integrating them into query processing, and shielding analytics developers from DBMS internal details.
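
The separation described above can be illustrated with a minimal sketch. It is not the paper's API: the names rvf_shell, map_words, and reduce_counts are hypothetical, plain Python lists stand in for relations, and an in-memory sort and group stands in for whatever partitioning the database engine would perform. It only shows the shape of the idea: a shell that takes a relation in and gives a relation out, wrapped around a user function that contains application logic alone.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple  # one tuple of column values


def rvf_shell(user_function: Callable, partition_key: int) -> Callable:
    """Hypothetical RVF-Shell: owns the 'system' side (grouping input rows
    by a key column and streaming output rows). This is an assumption made
    for illustration, not the interface defined in the paper."""

    def rvf(rows: Iterable[Row]) -> Iterator[Row]:
        # Sort so that groupby sees each key's rows contiguously.
        ordered = sorted(rows, key=itemgetter(partition_key))
        for key, group in groupby(ordered, key=itemgetter(partition_key)):
            # The user function sees a set of tuples, not a single one.
            yield from user_function(key, list(group))

    return rvf


def map_words(_doc_id, rows):
    """User function (map side): pure application logic over (doc_id, text)."""
    for _id, text in rows:
        for word in text.split():
            yield (word.lower(), 1)


def reduce_counts(word, rows):
    """User function (reduce side): sum the counts for one word."""
    yield (word, sum(count for _word, count in rows))


# Both stages are relation-in, relation-out, so they compose like operators.
map_rvf = rvf_shell(map_words, partition_key=0)
reduce_rvf = rvf_shell(reduce_counts, partition_key=0)

docs = [(1, "map reduce map"), (2, "reduce inside DBMS")]
result = sorted(reduce_rvf(map_rvf(docs)))
print(result)
```

Because each wrapped function consumes and produces rows, the output of one can feed the next directly, which is the composability property the abstract says ordinary UDFs lack.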

Our experience with a prototype built on a commercial, proprietary parallel database engine demonstrates the practical value of the proposed approaches.

