E = MC3: managing uncertain enterprise data in a cluster-computing environment
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD International Conference on Management of Data
Providence, Rhode Island, USA
SESSION: Research session 12: Probabilistic databases II
Pages 441-454  
Year of Publication: 2009
ISBN: 978-1-60558-551-2
Authors
Fei Xu  University of Florida, Gainesville, FL, USA
Kevin Beyer  IBM Almaden Research Center, San Jose, CA, USA
Vuk Ercegovac  IBM Almaden Research Center, San Jose, CA, USA
Peter J. Haas  IBM Almaden Research Center, San Jose, CA, USA
Eugene J. Shekita  IBM Almaden Research Center, San Jose, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 56,   Downloads (12 Months): 239,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1559845.1559893

ABSTRACT

Modern enterprises must manage uncertain data for the purposes of risk assessment and decision making under uncertainty. The Monte Carlo approach embodied in the MCDB system of Jampani et al. is well suited for such a task. MCDB can support industrial-strength business-intelligence queries over uncertain warehouse data. Moreover, MCDB's extensible approach to specifying uncertainty can also capture complex stochastic prediction models, allowing sophisticated "what-if" analyses within the DBMS. MCDB computations can be highly CPU-intensive, but they offer the potential for massive parallelization. To realize this potential, we provide a new system, called MC3 (Monte Carlo Computation on a Cluster), that extends the MCDB approach to the map-reduce processing framework. MC3 can exploit the robustness and scalability of map-reduce, and can handle data stored in non-relational formats. We show how MCDB query plans over "tuple bundles" can be translated to sequences of map-reduce operations over nested data, and we describe different parallelization schemes. We also provide and analyze several novel distributed algorithms for adding pseudorandom number seeds to tuple bundles. These algorithms ensure the statistical correctness of the Monte Carlo computations while minimizing the seed length. Our experiments show that MC3 can scale well for a variety of workloads.
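The abstract does not give the details of MCDB's tuple bundles or of the MC3 seeding algorithms, so the following Python sketch is only a simplified illustration under stated assumptions, not the authors' implementation. It models a tuple bundle as one row whose uncertain attribute holds one realized value per Monte Carlo repetition, generated deterministically from a per-bundle seed, and it evaluates a SUM query as a map step (per-partition, per-repetition partial sums) followed by a reduce step. Seeds are assigned with a simple, swapped-in scheme: each partition offsets its records by the total record count of the preceding partitions, so every bundle gets a distinct integer seed of about log2(N) bits for N bundles. The names `TupleBundle`, `assign_seeds`, `seed_bits`, `instantiate`, `map_sum`, and `reduce_sum`, the Normal revenue model, and the sample data are all hypothetical. The sketch also simply assumes that Python's `random.Random` yields statistically independent streams for distinct seeds; that is the kind of property the paper's seeding algorithms are designed to ensure rigorously.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TupleBundle:
    """Hypothetical tuple bundle: one logical row whose uncertain attribute
    stores one realized value per Monte Carlo repetition."""
    key: str
    seed: int
    samples: List[float]


def assign_seeds(partition_sizes: List[int]) -> List[List[int]]:
    """Give every record a distinct integer seed, partition by partition.

    A partition only needs the total size of the partitions before it
    (an exclusive prefix sum), so each mapper can label its own records
    independently once the counts are known. This is an illustrative
    scheme, not the MC3 algorithm.
    """
    seeds, offset = [], 0
    for size in partition_sizes:
        seeds.append(list(range(offset, offset + size)))
        offset += size
    return seeds


def seed_bits(total_records: int) -> int:
    """Bits needed to represent every seed in 0..total_records-1."""
    return max(1, (total_records - 1).bit_length())


def instantiate(key: str, mean: float, sd: float, seed: int, reps: int) -> TupleBundle:
    """Realize a (hypothetical) Normal(mean, sd) attribute `reps` times.

    ASSUMPTION: distinct seeds give statistically independent streams.
    """
    rng = random.Random(seed)
    return TupleBundle(key, seed, [rng.gauss(mean, sd) for _ in range(reps)])


def map_sum(bundles: List[TupleBundle], reps: int) -> List[float]:
    """Map side: per-repetition partial sums over one partition's bundles."""
    partial = [0.0] * reps
    for b in bundles:
        for i, v in enumerate(b.samples):
            partial[i] += v
    return partial


def reduce_sum(partials: List[List[float]]) -> List[float]:
    """Reduce side: combine partial sums into one SUM sample per repetition."""
    return [sum(col) for col in zip(*partials)]


if __name__ == "__main__":
    REPS = 1000
    # Two hypothetical partitions of (customer, expected revenue, std. dev.) rows.
    partitions = [
        [("c1", 100.0, 10.0), ("c2", 50.0, 5.0)],
        [("c3", 80.0, 8.0)],
    ]
    seeds = assign_seeds([len(p) for p in partitions])
    bundles_by_partition = [
        [instantiate(k, m, s, seed, REPS) for (k, m, s), seed in zip(part, part_seeds)]
        for part, part_seeds in zip(partitions, seeds)
    ]
    totals = reduce_sum([map_sum(b, REPS) for b in bundles_by_partition])
    total_records = sum(len(p) for p in partitions)
    print(f"seeds={seeds}, bits per seed={seed_bits(total_records)}")
    print(f"estimated E[SUM(revenue)] = {sum(totals) / REPS:.1f}")
```

Running it prints `seeds=[[0, 1], [2]], bits per seed=2` followed by an estimate close to 230 (the sum of the three means). Because each bundle's samples in this sketch depend only on its seed, a node can regenerate them on demand instead of shipping them, which is one reason short per-bundle seeds are attractive in a distributed setting.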


REFERENCES

[2] L. Antova, C. Koch, and D. Olteanu. MayBMS: Managing incomplete information with probabilistic world-set decompositions. In ICDE, pages 1479–1480, 2007.
[6] P. D. Coddington. Random number generators for parallel computers. The NHSE Review, 2, 1996.
[9] L. Devroye. Non-Uniform Random Variate Generation. Springer, 1986.
[11] P. W. Glynn and S. Asmussen. Stochastic Simulation: Algorithms and Analysis. Springer, 2007.
[12] Hadoop. http://hadoop.apache.org/core/.
[15] S. G. Henderson and B. L. Nelson, editors. Simulation. North-Holland, 2006.
[17] JAQL. http://code.google.com/p/jaql/.
[18] JSON. http://www.json.org.
[23] M. Mascagni. Some methods of parallel pseudorandom number generation. In R. Schreiber, M. Heath, and A. Ranade, editors, Algorithms for Parallel Processing, pages 277–288. Springer, 1997.
[27] SimpleDB. http://aws.amazon.com.
[29] SQL Server Data Services. http://www.microsoft.com/sql/dataservices/default.mspx.
[30] A. Srinivasan, D. M. Ceperley, and M. Mascagni. Random number generators for parallel applications. In Monte Carlo Methods in Chemical Physics, pages 13–36. Wiley, 1997.
[31] C. J. K. Tan. The PLFG parallel pseudo-random number generator. Future Generation Computer Systems, 18:693–698, 2002.
[32] D. Z. Wang, E. Michelakis, M. N. Garofalakis, and J. M. Hellerstein. BayesStore: Managing large, uncertain data repositories with probabilistic graphical models. Proc. VLDB, pages 340–351, 2008.
