Performance prediction for random write reductions: a case study in modeling shared memory programs
Source: Joint International Conference on Measurement and Modeling of Computer Systems
Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Marina Del Rey, California
SESSION: Computer performance evaluation techniques
Pages: 117 - 128  
Year of Publication: 2002
ISBN: 1-58113-531-9
Authors
Ruoming Jin  Ohio State University, Columbus, OH
Gagan Agrawal  Ohio State University, Columbus, OH
Sponsor
SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 33,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/511334.511350

ABSTRACT

In this paper, we revisit the problem of performance prediction on shared memory parallel machines, motivated by the need to select a parallelization strategy for random write reductions. Such reductions frequently arise in data mining algorithms. In our previous work, we developed a number of techniques for parallelizing this class of reductions. That work showed that each of the three techniques, full replication, optimized full locking, and cache-sensitive, can outperform the others depending upon problem, dataset, and machine parameters. Therefore, an important question is, "Can we predict the performance of these techniques for a given problem, dataset, and machine?" This paper addresses this question by developing an analytical performance model that captures a two-level cache, coherence cache misses, TLB misses, locking overheads, and contention for memory. The analytical model is combined with results from micro-benchmarking to predict performance on real machines. We have validated our model on two different SMP machines. Our results show that our model effectively captures the impact of the memory hierarchy (two-level cache and TLB) as well as the factors that limit parallelism (contention for locks, memory contention, and coherence cache misses). The difference between predicted and measured performance is within 20% in almost all cases. Moreover, the model is quite accurate in predicting the relative performance of the three parallelization techniques.
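
For readers unfamiliar with the term, a random write reduction is a loop in which each iteration updates an element of a shared reduction object, and the element's index is only known at run time. The following is a minimal sketch of two of the parallelization strategies named in the abstract. It is not the authors' implementation: the function names, the round-robin work partitioning, and the use of Python threads are assumptions made for illustration. It shows plain per-element locking rather than the "optimized" variant, omits the cache-sensitive technique, and says nothing about the performance effects the paper models.

```python
import threading


def _run_workers(worker, num_threads):
    """Start one thread per worker id and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def reduce_full_replication(updates, num_elements, num_threads):
    """Each thread updates a private copy; the copies are merged at the end."""
    copies = [[0] * num_elements for _ in range(num_threads)]

    def worker(tid):
        local = copies[tid]
        # Hypothetical partitioning: thread tid takes every num_threads-th update.
        for index, value in updates[tid::num_threads]:
            local[index] += value  # no lock needed, the copy is private

    _run_workers(worker, num_threads)
    # Merge step: its cost grows with num_elements * num_threads.
    return [sum(copy[i] for copy in copies) for i in range(num_elements)]


def reduce_full_locking(updates, num_elements, num_threads):
    """All threads update one shared copy, with one lock per element."""
    result = [0] * num_elements
    locks = [threading.Lock() for _ in range(num_elements)]

    def worker(tid):
        for index, value in updates[tid::num_threads]:
            with locks[index]:  # serializes writers to the same element
                result[index] += value

    _run_workers(worker, num_threads)
    return result


if __name__ == "__main__":
    # Indices are data-dependent, which is what makes the writes "random".
    updates = [((i * 7) % 5, 1) for i in range(100)]
    print(reduce_full_replication(updates, 5, 4))
    print(reduce_full_locking(updates, 5, 4))
```

Both functions compute the same result. The trade-off is between the extra memory and merge work of replication and the locking overhead and contention of a single shared copy, which is why the best choice depends on problem, dataset, and machine parameters.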


