Multi-execution: multicore caching for data-similar executions
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Cache organization
Pages 164-173
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Susmit Biswas, University of California, Santa Barbara, Santa Barbara, CA, USA
Diana Franklin, University of California, Santa Barbara, Santa Barbara, CA, USA
Alan Savage, University of California, Santa Barbara, Santa Barbara, CA, USA
Ryan Dixon, University of California, Santa Barbara, Santa Barbara, CA, USA
Timothy Sherwood, University of California, Santa Barbara, Santa Barbara, CA, USA
Frederic T. Chong, University of California, Santa Barbara, Santa Barbara, CA, USA
ABSTRACT
While microprocessor designers turn to multicore architectures to sustain performance expectations, the dramatic increase in parallelism of such architectures will put substantial demands on off-chip bandwidth and make the memory wall more significant than ever. This paper demonstrates that one profitable application of multicore processors is the execution of many similar instantiations of the same program. We identify several practical scenarios that use this model of execution and term it "multi-execution." Often, each such instance uses very similar data. In conventional cache hierarchies, each instance would cache its own data independently. We propose the Mergeable cache architecture, which detects data similarities and merges cache blocks, resulting in substantial savings in cache storage requirements. This leads to fewer off-chip memory accesses, lower overall power usage, and higher application performance. We present cycle-accurate simulation results for 8 benchmarks (6 from SPEC2000) to demonstrate that our technique provides a scalable solution and leads to significant speedups due to reductions in main memory accesses. For 8 cores running 8 similar executions of the same application and sharing an exclusive 4-MB, 8-way L2 cache, the Mergeable cache achieves an average speedup of 2.5x (ranging from 0.93x to 6.92x), while imposing an overhead of only 4.28% on cache area and 5.21% on power when it is in use.
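The abstract does not describe how merging is implemented, so the following is only a toy software model of the general idea, not the paper's hardware design. It assumes (hypothetically) that blocks are merged when several instances hold identical contents at the same virtual address, and that a write detaches the writer from the shared block. The class name `MergeableCacheModel` and its methods are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MergeableCacheModel:
    """Toy model: one stored block per distinct (address, content) pair.

    Assumption for illustration only: blocks merge when instances hold
    identical content at the same virtual address.
    """
    # (address, content) -> set of instance ids sharing that stored block
    merged: dict = field(default_factory=dict)
    # (instance id, address) -> content currently visible to that instance
    view: dict = field(default_factory=dict)

    def evict(self, pid, addr):
        """Drop one instance's block; free the storage if no sharer remains."""
        old = self.view.pop((pid, addr), None)
        if old is not None:
            sharers = self.merged[(addr, old)]
            sharers.discard(pid)
            if not sharers:
                del self.merged[(addr, old)]

    def fill(self, pid, addr, content):
        """Bring a block in for one instance, merging with an identical block."""
        self.evict(pid, addr)
        self.view[(pid, addr)] = content
        self.merged.setdefault((addr, content), set()).add(pid)

    def write(self, pid, addr, content):
        """A write detaches the writer from the old block (copy-on-write style)."""
        self.fill(pid, addr, content)

    def blocks_stored(self):
        """Blocks occupied in the merging model."""
        return len(self.merged)

    def blocks_conventional(self):
        """Blocks a cache without merging would need for the same state."""
        return len(self.view)


if __name__ == "__main__":
    cache = MergeableCacheModel()
    # 8 instances of the same program touch the same 4 addresses with equal data.
    for pid in range(8):
        for addr in range(4):
            cache.fill(pid, addr, b"input-data")
    print(cache.blocks_stored(), cache.blocks_conventional())
    # One instance diverges at one address, so that block can no longer be shared.
    cache.write(3, 0, b"private-result")
    print(cache.blocks_stored(), cache.blocks_conventional())
```

In this model, the gap between `blocks_conventional()` and `blocks_stored()` stands in for the storage savings the abstract attributes to merging; it says nothing about the reported speedup, area, or power figures.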