Instruction scheduling for a tiled dataflow architecture
Source: Architectural Support for Programming Languages and Operating Systems
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
San Jose, California, USA
SESSION: Scheduling and spatial programming
Pages: 141 - 150  
Year of Publication: 2006
ISBN:1-59593-451-0
Authors
Martha Mercaldi  University of Washington
Steven Swanson  University of Washington
Andrew Petersen  University of Washington
Andrew Putnam  University of Washington
Andrew Schwerin  University of Washington
Mark Oskin  University of Washington
Susan J. Eggers  University of Washington
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGOPS: ACM Special Interest Group on Operating Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1168857.1168876

ABSTRACT

This paper explores hierarchical instruction scheduling for a tiled processor. Our results show that at the top level of the hierarchy, a simple profile-driven algorithm effectively minimizes operand latency. After this schedule has been partitioned into large sections, the bottom-level algorithm must more carefully analyze program structure when producing the final schedule.

Our analysis reveals that at this bottom level, good scheduling depends upon carefully balancing instruction contention for processing elements and operand latency between producer and consumer instructions. We develop a parameterizable instruction scheduler that more effectively optimizes this trade-off. We use this scheduler to determine the contention-latency sweet spot that generates the best instruction schedule for each application. To avoid this application-specific tuning, we also determine the parameters that produce the best performance across all applications. The result is a contention-latency setting that generates instruction schedules for all applications in our workload that come within 17% of the best schedule for each.
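
The last step of the abstract, choosing one contention-latency setting for the whole workload, can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes "best across all applications" means the setting with the smallest worst-case slowdown relative to each application's own best schedule. The function name, the application names, the setting values, and the cycle counts are all hypothetical placeholders, not data from the paper.

```python
def best_global_setting(cycles):
    """Pick one scheduler setting for all applications.

    cycles maps application name -> {setting: execution cycles}.
    Assumption (not from the paper): the chosen setting is the one that
    minimizes the worst slowdown relative to each application's best.
    """
    # Only consider settings that were measured for every application.
    shared = set.intersection(*(set(per_app) for per_app in cycles.values()))

    # Best (lowest) cycle count each application reaches under any setting.
    best_per_app = {app: min(per_app.values()) for app, per_app in cycles.items()}

    def worst_slowdown(setting):
        return max(cycles[app][setting] / best_per_app[app] for app in cycles)

    chosen = min(sorted(shared), key=worst_slowdown)
    return chosen, worst_slowdown(chosen)


# Hypothetical measurements: setting -> cycles, per application.
EXAMPLE_CYCLES = {
    "app_a": {0.25: 120, 0.5: 100, 0.75: 130},
    "app_b": {0.25: 90, 0.5: 99, 0.75: 90},
}

setting, slowdown = best_global_setting(EXAMPLE_CYCLES)
print(f"setting={setting}, worst-case slowdown={slowdown:.2f}x")
```

With these placeholder numbers, the middle setting is chosen because neither application is ever more than 10% slower than its own best. The abstract's 17% figure is a bound of the same kind, reported by the authors for their actual workload.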


