| Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor |
| Source
|
International Conference on Supercomputing
Proceedings of the 15th international conference on Supercomputing
Sorrento, Italy
Pages: 236 - 245
Year of Publication: 2001
ISBN: 1-58113-410-X
|
|
Authors
|
|
Claude Limousin
|
Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex
|
|
Julien Sebot
|
Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex
|
|
Alexis Vartanian
|
Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex
|
|
Nathalie Drach-Temam
|
Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex
|
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 24, Citation Count: 5
|
|
|
ABSTRACT
In this paper we evaluate the performance of a simultaneous multithreaded (SMT) processor used as the geometry processor for a 3D polygonal rendering engine. To evaluate this approach, we consider PMesa (a parallel version of Mesa), which parallelizes the geometry stage of the 3D pipeline. We show that SMT is suitable for 3D geometry, and we characterize the execution of the geometry stage in terms of the memory hierarchy, which is the main bottleneck. The results show that SMT does not fully hide the latency, and that L2 data prefetching on its own does not increase performance. We show that this problem comes from pollution of the instruction window by threads experiencing L2 cache misses, which reduces the window available to the other threads. We therefore propose dcPRED, a hardware mechanism that predicts L2 misses and controls this pollution. Coupled with L2 data prefetching, dcPRED achieves gains of up to 21% over the baseline SMT.
CITED BY 5
|
Francisco J. Cazorla, Peter M.W. Knijnenburg, Rizos Sakellariou, Enrique Fernández, Alex Ramirez, Mateo Valero, Predictable performance in SMT processors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
|
Francisco J. Cazorla, Alex Ramirez, Mateo Valero, Enrique Fernandez, Dynamically Controlled Resource Allocation in SMT Processors, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.171-182, December 04-08, 2004, Portland, Oregon
|