| |
1
|
A. Aiken and A. Nicolau. Optimal loop parallelization. ACM SIGPLAN Notices, 23(7):308--317, 1988.
|
| |
2
|
R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, first edition, 2001.
|
| |
3
|
L. Bic, J. M. A. Roy, and M. Nagel. Exploiting iteration-level parallelism in dataflow programs. In Proceedings of the 12th International Conference on Distributed Computing Systems, pages 376--381, June 1992.
|
| |
4
|
W.-L. Chang, C.-P. Chu, and M. Ho. Exploitation of parallelism to nested loops with dependence cycles. Journal of Systems Architecture, 50(12):729--742, 2004.
|
| |
5
|
L.-F. Chao. Scheduling and Behavioral Transformations for Parallel Systems. PhD thesis, Dept. of Computer Science, Princeton University, 1993.
|
| |
6
|
L.-F. Chao, A. S. LaPaugh, and E. H.-M. Sha. Rotation scheduling: a loop pipelining algorithm. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 16(3):229--239, March 1997.
|
| |
7
|
L.-F. Chao and E. H.-M. Sha. Static scheduling of uniform nested loops. In Proceedings of the 7th International Parallel Processing Symposium, pages 254--258, April 1993.
|
| |
8
|
D.-K. Chen and P.-C. Yew. Statement re-ordering for doacross loops. In Proceedings of the 1994 International Conference on Parallel Processing (ICPP '94), volume 2, pages 24--28, August 1994.
|
| |
9
|
C.-P. Chu and D. L. Carver. Reordering the statements with dependence cycles to improve the performance of parallel loops. In Proceedings of the 1997 International Conference on Parallel and Distributed Systems (ICPADS '97), pages 322--328, 1997.
|
| |
10
|
R. G. Cytron. Compile-time Scheduling and Optimizations for Multiprocessor System. PhD thesis, University of Illinois at Urbana--Champaign, Champaign, IL, USA, September 1984.
|
| |
11
|
A. Darte and G. Huard. Complexity of multi-dimensional loop alignment. In Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science (STACS '02), pages 179--191, 2002.
|
| |
12
|
A. Fraboulet and A. Mignotte. Source code loop transformations for memory hierarchy optimizations. In Proceedings of the Workshop on Memory Access Decoupled Architecture (MEDEA '01), pages 8--12, Barcelona, Spain, September 2001.
|
| |
13
|
D. A. P. Haiek. Multiprocessors: Discussion of some Theoretical and Practical Problems. PhD thesis, Deptartment of Computer Science, University of Illinois at Urbana-Champaign, November 1979.
|
| |
14
|
D. N. Jayasimha and J. D. Martens. Some architectural and compilation issues in the design of hierarchical shared-memory multiprocessors. In Proceedings of the 6th International Parallel Processing Symposium, pages 567--572, March 1992.
|
| |
15
|
C. E. Leiserson1 and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6:5--35, 1991.
|
| |
16
|
L.-S. Liu, C.-W. Ho, and J.-P. Sheu. On the parallelism of nested for-loops using index shift method. In Proceedings of the International Conference on Parallel Processing (ICPP '90), pages 119--123, 1990.
|
| |
17
|
M. Liu, Q. Zhuge, Z. Shao, and E. H.-M. Sha. General loop fusion technique for nested loops considering timing and code size. In Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems (CASES '04), pages 190--201, September 2004.
|
| |
18
|
N. Manjikian and T. S. Abdelrahman. Fusion of loops for parallelism and locality. IEEE Transactions on Parallel and Distributed Systems, 8(2):193--209, February 1997.
|
| |
19
|
K. Okuda. Cycle shrinking by dependence reduction. In Proceedings of the Second International Euro-Par Conference on Parallel Processing (Euro-Par '96), pages 398--401, 1996.
|
| |
20
|
D. Padua, D. Kuck, and D. Lawrie. High-speed multiprocessors and compilation techniques. IEEE Transactions on Computers, C--29(9):763--776, September 1980.
|
| |
21
|
L. Passos and E. H.-M. Sha. Achieving full parallelism using multi-dimensional retiming. IEEE Transactions on Parallel and Distributed Systems, 7(11):1150--1163, November 1996.
|
| |
22
|
N. L. Passos and E. H.-M. Sha. Full parallelism in uniform nested loops using multi-dimensional retiming. In Proceedings of the 1994 International Conference on Parallel Processing (ICPP '94), pages 130--133, August 1994.
|
| |
23
|
C. D. Polychronopoulos. Advanced loop optimizations for parallel computers. In Proceedings of the 1st International Conference on Supercomputing, pages 255--277, March 1988.
|
| |
24
|
C. D. Polychronopoulos. Compiler optimizations for enhancing parallelism and their impact on architecture design. IEEE Transactions on Computers, 37(8):991--1004, August 1988.
|
| |
25
|
Y. Robert and S. W. Song. Revisiting cycle shrinking. Parallel Computing, 18(5):481--496, 1992.
|
| |
26
|
E. H.-M. Sha, C. Lang, and N. L. Passos. Polynomial-time nested loop fusion with full parallelism. In Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing (ICPP '96), volume 3, page 9, August 1996.
|
| |
27
|
W. Shang, M. T. O'Keefe, and J. A. B. Fortes. On loop transformations for generalized cycle shrinking. IEEE Transactions on Parallel and Distributed Systems, 5(2):193--204, February 1994.
|
| |
28
|
L. Solve. http://lpsolve.sourceforge.net/5.5/. 2009.
|
| |
29
|
X. Tang and G. R. Gao. Automatically partitioning threads for multithreaded architectures. Journal of Parallel and Distributed Computing, 58(2):159--189, 1999.
|
| |
30
|
C.-M. Wang and S.-D. Wang. Compiler techniques to extract parallelism within a nested loop. In Proceedings of the Fifteenth Annual International Computer Software and Applications Conference (COMPSAC '91), pages 24--29, Tokyo, Japan, September 1991.
|
| |
31
|
M. E. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems, 2(4):452--471, 1991.
|
| |
32
|
C. Xue, Z. Shao, M. Liu, and E. H.-M. Sha. Iterational retiming: maximize iteration-level parallelism for nested loops. In Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis (CODES+ISSS '05), pages 309--314, September 2005.
|
| |
33
|
J. Xue, M. Guo, and D. Wei. Improving the parallelism of iterative methods by aggressive loop fusion. Journal of Supercomputing, 43(2):147--164, February 2008.
|
| |
34
|
J. Xue, Q. Huang, and M. Guo. Enabling loop fusion and tiling for cache performance by fixing fusion-preventing data dependences. In Proceedings of the 2005 International Conference on Parallel Processing (ICPP '05), pages 107--115, 2005.
|