A cost effective architecture for vectorizable numerical and multimedia applications
Source: ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Crete Island, Greece
Pages: 103 - 112  
Year of Publication: 2001
ISBN: 1-58113-409-6
Authors
Francisca Quintana  Departamento de Informatica y Sistemas, Universidad de Las Palmas de Gran Canaria, Islas Canarias, Spain
Jesus Corbal  Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
Roger Espasa  Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
Mateo Valero  Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/378580.378602

ABSTRACT

This paper analyzes the performance of vector-dominated regions of code in numerical and multimedia applications in a superscalar+vector architecture and compares it to an 8-way superscalar processor. The ability to split a program's execution into scalar and vector regions allows us to show that (1) as expected, the vector unit is much better than the wide-issue superscalar at executing the vector-dominated regions of the code; (2) on the scalar regions, the 8-way superscalar, although better than a 4-way superscalar, is clearly not worth the extra complexity in terms of extra transistors and potential cycle time limitations. Overall, the vector-enhanced superscalar is from 6% to 303% better than an 8-way superscalar. We also present detailed data on the performance of the memory system, which is usually the key limiting factor when running numerical and multimedia applications. We evaluate two additional cache designs that try to alleviate problems created by non-unit stride memory references.
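
As a rough illustration of why non-unit stride memory references stress a cache, the sketch below counts how many distinct cache lines a strided vector access touches. It is not taken from the paper: the function name `cache_lines_touched`, the 8-byte element size, the 64-byte line size, and the base address of 0 are all assumptions chosen for the example.

```python
def cache_lines_touched(n_elems, stride, elem_bytes=8, line_bytes=64):
    """Count distinct cache lines touched by a strided vector access.

    Assumes (hypothetically) a base address of 0, 8-byte elements and
    64-byte cache lines; `stride` is measured in elements.
    """
    lines = {(i * stride * elem_bytes) // line_bytes for i in range(n_elems)}
    return len(lines)


# Compare a unit-stride access with two non-unit strides over 64 elements.
for stride in (1, 2, 8):
    print(stride, cache_lines_touched(64, stride))
```

Under these assumptions, a unit-stride access uses every byte of each line it fetches, while a stride of 8 elements fetches a full line for each element it reads, which is the kind of wasted bandwidth that stride-aware cache designs aim to reduce.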


