Cache organizations for clustered microarchitectures
Source
ACM International Conference Proceeding Series; Vol. 68
Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Munich, Germany
Pages: 46 - 55
Year of Publication: 2004
ISBN: 1-59593-040-X
ABSTRACT
Clustered microarchitectures are an effective organization for dealing with wire delays and complexity by partitioning some of the processor resources. The organization of the data cache is a key factor in these processors because of its effect on cache miss rate and inter-cluster communication. This paper investigates alternative designs of the data cache: centralized, distributed, replicated and physically distributed cache architectures are analyzed. Results show similar average performance but significant performance variations depending on application features, especially cache miss ratio and communications. In addition, we propose a novel instruction steering scheme to reduce communications. Whenever the current instruction cannot be steered to the cluster holding most of its inputs, this scheme conditionally stalls the dispatch of instructions depending on the occupancy of the clusters. The new steering scheme outperforms traditional schemes: results show an average speedup of 5%, and up to 15% for some applications.
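The steering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the exact stall condition nor any parameter values, so the function name `steer`, the `stall_threshold` knob, the tie-breaking rule and the fallback to the least-loaded cluster are all assumptions made for illustration.

```python
from collections import Counter

STALL = None  # sentinel: hold the instruction at dispatch this cycle


def steer(src_clusters, occupancy, capacity, stall_threshold=0.5):
    """Pick a cluster for one instruction, or return STALL.

    src_clusters: cluster index of each source operand of the instruction
    occupancy: number of instructions currently queued in each cluster
    capacity: queue size per cluster (assumed equal for all clusters)
    stall_threshold: hypothetical tuning knob, not taken from the paper
    """
    least_loaded = min(range(len(occupancy)), key=lambda c: occupancy[c])

    if not src_clusters:
        # No register inputs: fall back to simple load balancing.
        return least_loaded if occupancy[least_loaded] < capacity else STALL

    # Cluster holding most of the inputs; ties go to the lower index.
    counts = Counter(src_clusters)
    preferred = min(counts, key=lambda c: (-counts[c], c))
    if occupancy[preferred] < capacity:
        return preferred

    # The preferred cluster is full. Assumed policy: divert only when some
    # other cluster is lightly loaded; otherwise stall dispatch and retry
    # next cycle rather than pay for extra inter-cluster communication.
    if occupancy[least_loaded] < stall_threshold * capacity:
        return least_loaded
    return STALL
```

For example, under these assumptions `steer([0, 0, 1], [8, 2], 8)` diverts the instruction to cluster 1 because cluster 0 is full and cluster 1 is lightly loaded, whereas `steer([0, 0, 1], [8, 5], 8)` stalls. The trade-off being modeled is the one the abstract names: stalling costs dispatch bandwidth, while steering away from the inputs costs communication.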
CITED BY 3
|
Grigorios Magklis , Pedro Chaparro , José González , Antonio González, Independent front-end and back-end dynamic voltage scaling for a GALS microarchitecture, Proceedings of the 2006 international symposium on Low power electronics and design, October 04-06, 2006, Tegernsee, Bavaria, Germany