| Aggressive snoop reduction for synchronized producer-consumer communication in energy-efficient embedded multi-processors |
| Full text |
Pdf
(164 KB)
|
Source
|
International Conference on Hardware Software Codesign
archive
Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
table of contents
Salzburg, Austria
SESSION: Embedded systems architecture
table of contents
Pages: 245 - 250
Year of Publication: 2007
ISBN:978-1-59593-824-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 48, Citation Count: 2
|
|
|
ABSTRACT
Snoop-based cache coherence protocols are typically used when multiple processor cores share memory through a common bus. It is well known, however, that these coherence protocols introduce an excessive power overhead.To help alleviate this problem, we propose an application-driven customization technique where application knowledge regarding data sharing in producer-consumer relationships is used in order to aggressively eliminate unnecessary and predictable snoop-induced cache tag lookups even for references to shared data, thus, achieving significant power reduction with minimal hardware cost. Snoop-induced cache tag lookups for accesses to both shared and private data are eliminated when it is ensured that such lookups will not result in extra knowledge regarding the cache state in respect to the other caches and memories.The proposed methodology relies on the combined support from the compiler, the operating system, and the hardware architecture. Our experiments show average power reductions of more than 80% compared to a general-purpose snoop protocol.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
Mirko Loghi , Martin Letis , Luca Benini , Massimo Poncino, Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors, Proceedings of the 15th ACM Great Lakes symposium on VLSI, April 17-19, 2005, Chicago, Illinois, USA
[doi> 10.1145/1057661.1057728]
|
| |
4
|
|
 |
5
|
|
 |
6
|
Thomas F. Wenisch , Stephen Somogyi , Nikolaos Hardavellas , Jangwoo Kim , Anastassia Ailamaki , Babak Falsafi, Temporal Streaming of Shared Memory, Proceedings of the 32nd annual international symposium on Computer Architecture, p.222-233, June 04-08, 2005
|
| |
7
|
Chunho Lee , Miodrag Potkonjak , William H. Mangione-Smith, MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.330-335, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
8
|
M. R. Guthaus , J. S. Ringenberg , D. Ernst , T. M. Austin , T. Mudge , R. B. Brown, MiBench: A free, commercially representative embedded benchmark suite, Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, p.3-14, December 02-02, 2001
[doi> 10.1109/WWC.2001.15]
|
| |
9
|
Nathan L. Binkert , Ronald G. Dreslinski , Lisa R. Hsu , Kevin T. Lim , Ali G. Saidi , Steven K. Reinhardt, The M5 Simulator: Modeling Networked Systems, IEEE Micro, v.26 n.4, p.52-60, July 2006
[doi> 10.1109/MM.2006.82]
|
| |
10
|
D. Tarjan, S. Thoziyoor and N. Jouppi, "CACTI 4.0: An Integrated Cache Timing, Power and Area Model", Technical report, HP Laboratories Palo Alto, June 2006.
|
 |
11
|
|
|