Reducing snoop-energy in shared bus-based MPSoCs by filtering useless broadcasts
Source
Great Lakes Symposium on VLSI
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Stresa-Lago Maggiore, Italy
SESSION: Low power architecture and interconnect
Pages: 126 - 131
Year of Publication: 2007
ISBN: 978-1-59593-605-9
Authors
Chun-Mok Chung (Seoul National University, Seoul, South Korea)
Jihong Kim (Seoul National University, Seoul, South Korea)
Dohyung Kim (University of California, San Diego, CA)
ABSTRACT
In shared bus-based multiprocessor systems-on-chip (MPSoCs), snoop-based schemes are widely used to maintain cache coherency. However, many broadcasts are useless: remote caches seldom hold the matching block, so their tag lookups supply no data. From an energy perspective, such tag lookups are pure waste. In this paper, we propose a broadcast filtering technique that reduces snoop energy in both the caches and the bus. Broadcast filtering is achieved with the help of a snooping cache and a split bus. The snooping cache checks whether matching blocks exist in remote caches before a coherency request is broadcast. If no remote cache has the matching block, the broadcast is eliminated. If broadcasting is necessary, only part of the split bus is used, so that the request is selectively broadcast only to the remote caches that have matching blocks. Simulation results show that our technique reduces cache lookups, bus usage, and snoop energy by 90%, 50%, and 30%, respectively, with only a 2% degradation in performance. Our technique saves more energy than other state-of-the-art techniques.
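The filtering decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `SnoopFilter` class, its method names, the per-block sharer table standing in for the snooping cache, and the linear bus with two caches per segment are all assumptions made for illustration; the abstract does not specify any of these details.

```python
from dataclasses import dataclass, field


@dataclass
class SnoopFilter:
    """Hypothetical stand-in for the snooping cache: remembers which
    caches may hold each block (an assumed organization)."""
    caches_per_segment: int = 2  # assumed split-bus granularity
    sharers: dict = field(default_factory=dict)  # block address -> set of cache ids

    def record_fill(self, cache_id, block):
        # A cache loaded the block, so it must be snooped from now on.
        self.sharers.setdefault(block, set()).add(cache_id)

    def record_evict(self, cache_id, block):
        # A cache dropped the block, so snooping it would be useless.
        holders = self.sharers.get(block)
        if holders is not None:
            holders.discard(cache_id)
            if not holders:
                del self.sharers[block]

    def route(self, requester, block):
        """Return (target caches, bus segments to drive) for a coherency request."""
        targets = self.sharers.get(block, set()) - {requester}
        if not targets:
            # No remote cache holds the block: eliminate the broadcast.
            return set(), set()
        # Drive only the segments spanning the requester and its targets
        # (assumes caches are attached to a linear bus in id order).
        segments = [c // self.caches_per_segment for c in targets | {requester}]
        return targets, set(range(min(segments), max(segments) + 1))
```

In this sketch, a request for a block that no remote cache holds returns two empty sets, meaning no remote tag lookup and no bus activity. A request for a shared block returns only the caches that need to be snooped and the bus segments that connect them to the requester.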