ABSTRACT
We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated filter state, partitioned filter state, and hybrid options between these two extremes. We evaluate these strategies using emulators of three real applications, different query and output sizes, and a number of configurations. We consider execution in a homogeneous cluster and in a distributed environment where only a subset of nodes host the data. Our results show that replicating the filter state scales well and outperforms the other schemes if sufficient memory is available and sufficient computation is involved to offset the cost of the global merge step. In other cases, the hybrid strategy is usually the best. Moreover, in almost all cases, the performance of the hybrid strategy is quite close to that of the best strategy. Thus, we believe that hybrid is an attractive approach when the relative performance of different schemes cannot be predicted.
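As a rough illustration of the two extremes named above, the following sketch contrasts a replicated accumulator (each worker keeps a full private copy, followed by a global merge) with a partitioned accumulator (each worker owns a disjoint slice, so no merge of overlapping entries is needed). This is not the implementation evaluated in the paper: the function names, the sum reduction, the round-robin chunk assignment, and the hash-based ownership rule are all assumptions made for illustration, and the workers are simulated sequentially.

```python
from collections import Counter


def replicated_reduce(chunks, num_workers):
    """Hypothetical sketch: every worker holds a full copy of the accumulator."""
    partials = [Counter() for _ in range(num_workers)]
    for i, chunk in enumerate(chunks):
        acc = partials[i % num_workers]  # assumed round-robin chunk assignment
        for key, value in chunk:
            acc[key] += value  # local update, no communication needed
    # Global merge step: combine all private copies into one result.
    merged = Counter()
    for acc in partials:
        merged.update(acc)  # Counter.update adds values key by key
    return dict(merged)


def partitioned_reduce(chunks, num_workers):
    """Hypothetical sketch: each worker owns a disjoint slice of the accumulator."""
    owned = [dict() for _ in range(num_workers)]
    for chunk in chunks:
        for key, value in chunk:
            # Route each element to the worker that owns its output key
            # (assumed ownership rule: hash of the key modulo worker count).
            part = owned[hash(key) % num_workers]
            part[key] = part.get(key, 0) + value
    # Slices are disjoint, so the final result is a plain union.
    result = {}
    for part in owned:
        result.update(part)
    return result


chunks = [[(0, 1), (1, 2)], [(0, 3), (2, 4)], [(1, 5)]]
rep = replicated_reduce(chunks, num_workers=2)
par = partitioned_reduce(chunks, num_workers=2)
```

In this toy setting both functions produce the same sums. The difference is where the cost falls: the replicated version needs memory for one full accumulator per worker plus a merge at the end, while the partitioned version needs to move each input element to the worker that owns its output key.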