Multi-execution: multicore caching for data-similar executions
Source: Proceedings of the 36th Annual International Symposium on Computer Architecture
Austin, TX, USA
Session: Cache organization
Pages 164-173  
Year of Publication: 2009
ISBN:978-1-60558-526-0
Also published in ...
Authors
Susmit Biswas  University of California, Santa Barbara, Santa Barbara, CA, USA
Diana Franklin  University of California, Santa Barbara, Santa Barbara, CA, USA
Alan Savage  University of California, Santa Barbara, Santa Barbara, CA, USA
Ryan Dixon  University of California, Santa Barbara, Santa Barbara, CA, USA
Timothy Sherwood  University of California, Santa Barbara, Santa Barbara, CA, USA
Frederic T. Chong  University of California, Santa Barbara, Santa Barbara, CA, USA
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1555754.1555777

ABSTRACT

While microprocessor designers turn to multicore architectures to sustain performance expectations, the dramatic increase in parallelism of such architectures will place substantial demands on off-chip bandwidth and make the memory wall more significant than ever. This paper demonstrates that one profitable application of multicore processors is the execution of many similar instantiations of the same program. We identify several practical scenarios that use this model of execution and term it "multi-execution." Often, each such instance uses very similar data, yet in conventional cache hierarchies each instance caches its own data independently. We propose the Mergeable cache architecture, which detects data similarities and merges cache blocks, substantially reducing cache storage requirements. This in turn reduces off-chip memory accesses and overall power usage and improves application performance. We present cycle-accurate simulation results for 8 benchmarks (6 from SPEC2000) to demonstrate that our technique scales and yields significant speedups by reducing main memory accesses. With 8 cores running 8 similar executions of the same application and sharing an exclusive 4-MB, 8-way L2 cache, the Mergeable cache achieves an average speedup of 2.5x (ranging from 0.93x to 6.92x), while incurring overheads of only 4.28% in cache area and 5.21% in power when it is used.
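
To make the block-merging idea concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's hardware design: the class name `MergeableCacheSketch`, the rule that blocks merge only when they share a virtual block address and hold byte-identical data, the per-line owner set (standing in for a per-line owner bit vector), and the treatment of a write as splitting the writer off a shared line are all illustrative assumptions. Capacity, associativity, and replacement are not modeled; the sketch only counts how many distinct blocks must be stored.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """One cache line: a block of data plus the set of instances sharing it."""
    data: bytes
    owners: set = field(default_factory=set)  # stands in for an owner bit vector


class MergeableCacheSketch:
    """Hypothetical, unbounded toy model of merging identical blocks.

    Assumption: two instances share a line only when they hold
    byte-identical data at the same virtual block address.
    """

    def __init__(self):
        # virtual block address -> list of distinct lines stored at that address
        self.lines = {}

    def _detach(self, pid, vaddr):
        """Remove instance `pid` from any line at `vaddr`; drop unowned lines."""
        lines = self.lines.get(vaddr, [])
        for line in lines:
            line.owners.discard(pid)
        self.lines[vaddr] = [ln for ln in lines if ln.owners]

    def fill(self, pid, vaddr, data):
        """Install `data` for instance `pid`, reusing an identical copy if present."""
        self._detach(pid, vaddr)
        for line in self.lines[vaddr]:
            if line.data == data:
                line.owners.add(pid)  # merge: no new storage needed
                return
        self.lines[vaddr].append(Line(data, {pid}))

    # A write is modeled as re-installing new data: the writer leaves the
    # shared line (if any) and merges again only if its new data matches.
    write = fill

    def read(self, pid, vaddr):
        """Return the block seen by `pid`, or None on a miss."""
        for line in self.lines.get(vaddr, []):
            if pid in line.owners:
                return line.data
        return None

    def stored_blocks(self):
        """Number of distinct blocks physically stored."""
        return sum(len(lines) for lines in self.lines.values())


if __name__ == "__main__":
    cache = MergeableCacheSketch()
    for pid in range(8):  # 8 similar executions touch the same block
        cache.fill(pid, 0x1000, b"same input")
    print(cache.stored_blocks())
    cache.write(3, 0x1000, b"diverged")
    print(cache.stored_blocks())
```

With eight instances filling the same block, only one copy is stored; once instance 3 writes different data, storage grows to two copies while the other seven instances keep sharing the original.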


