Optimizing Reduction Computations In a Distributed Environment
Source: Conference on High Performance Networking and Computing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Page: 9
Year of Publication: 2003
ISBN:1-58113-695-1
Authors
Tahsin Kurc  Ohio State University, Columbus
Feng Lee  Ohio State University, Columbus
Gagan Agrawal  Ohio State University, Columbus
Umit Catalyurek  Ohio State University, Columbus
Renato Ferreira  Ohio State University, Columbus
Joel Saltz  Ohio State University, Columbus
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA

ABSTRACT

We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated filter state, partitioned filter state, and hybrid options between these two extremes. We evaluate these strategies using emulators of three real applications, different query and output sizes, and a number of configurations. We consider execution in a homogeneous cluster and in a distributed environment where only a subset of nodes host the data. Our results show that replicating the filter state scales well and outperforms the other schemes if sufficient memory is available and sufficient computation is involved to offset the cost of the global merge step. In other cases, hybrid is usually the best. Moreover, in almost all cases, the performance of the hybrid strategy is quite close to that of the best strategy. Thus, we believe that hybrid is an attractive approach when the relative performance of different schemes cannot be predicted.
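
The two extremes named in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names, the list-of-chunks stand-in for workers, the sum reduction, and the modulo ownership rule are all assumptions made for illustration. It only shows why the replicated scheme needs a global merge step while the partitioned scheme instead routes each input element to the worker that owns the matching accumulator element.

```python
def replicated_reduce(chunks, num_bins):
    """Replicated state: every (simulated) worker keeps a full accumulator copy."""
    partials = []
    for chunk in chunks:                 # one chunk per hypothetical worker
        acc = [0] * num_bins             # full private replica of the state
        for key, value in chunk:
            acc[key] += value
        partials.append(acc)
    if not partials:
        return [0] * num_bins
    # Global merge step: combine all replicas element by element.
    return [sum(column) for column in zip(*partials)]


def partitioned_reduce(chunks, num_bins):
    """Partitioned state: each accumulator element is owned by exactly one worker."""
    num_workers = len(chunks)
    owned = [dict() for _ in range(num_workers)]
    for chunk in chunks:
        for key, value in chunk:
            owner = key % num_workers    # assumed ownership rule (illustrative)
            # In a real system this is where the element would be sent
            # over the network to the owning worker.
            owned[owner][key] = owned[owner].get(key, 0) + value
    # The owned slices are disjoint, so assembling the result needs no merge.
    result = [0] * num_bins
    for part in owned:
        for key, total in part.items():
            result[key] = total
    return result


# Three hypothetical workers, each holding (accumulator index, value) pairs.
chunks = [[(0, 1), (2, 5)], [(2, 1), (1, 3)], [(0, 2)]]
replicated = replicated_reduce(chunks, num_bins=3)
partitioned = partitioned_reduce(chunks, num_bins=3)
```

Both functions compute the same totals. The replicated version uses memory proportional to the number of workers times the state size and pays for the merge, while the partitioned version uses one copy of the state in total but moves input elements between workers. A hybrid scheme would sit between these two, for example by replicating the state across groups of workers and partitioning it within each group; that particular arrangement is an assumption here, not something the abstract specifies.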

