|
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. Automatic program transformations for virtual memory computers. Proc. of the 1979 National Computer Conference, pages 969-974, June 1979.
|
 |
2
|
|
| |
3
|
D. Bailey, J. Barton, T. Lasinski, and H. Simon. The NAS Parallel Benchmarks. Technical Report RNR-91-002, NASA Ames Research Center, August 1991.
|
 |
4
|
David Callahan , Ken Kennedy , Allan Porterfield, Software prefetching, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.40-52, April 08-11, 1991, Santa Clara, California, United States
|
 |
5
|
William Y. Chen , Scott A. Mahlke , Pohua P. Chang , Wen-mei W. Hwu, Data access microarchitectures for superscalar processors with compiler-assisted data prefetching, Proceedings of the 24th annual international symposium on Microarchitecture, p.69-73, September 1991, Albuquerque, New Mexico, Puerto Rico
[doi> 10.1145/123465.123478]
|
 |
6
|
Robert P. Colwell , Robert P. Nix , John J. O'Donnell , David B. Papworth , Paul K. Rodman, A VLIW architecture for a trace scheduling compiler, Proceedings of the second international conference on Architectual support for programming languages and operating systems, p.180-192, October 1987, Palo Alto, California, United States
|
 |
7
|
James C. Dehnert , Peter Y.-T. Hsu , Joseph P. Bratt, Overlapped loop support in the Cydra 5, Proceedings of the third international conference on Architectural support for programming languages and operating systems, p.26-38, April 03-06, 1989, Boston, Massachusetts, United States
|
| |
8
|
|
| |
9
|
K. Gallivan, W. Jalby, U. Meier, and A. Sameh. The impact of hierarchical memory systems on linear algebra algorithm design. Technical Report UIUCSRD 625, University of Illinios, 1987.
|
| |
10
|
D. Gannon and W. Jalby. The influence of memory hierarchy on algorithm organization: Programming FFTs on a vector mulfiprocessor. In The Characteristics of Parallel Algorithms. MIT Press, 1987.
|
| |
11
|
|
| |
12
|
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
|
 |
13
|
|
| |
14
|
E. H. Gomish. Compile time analysis for data prefetching. Master's thesis, University of Illinois at Urbana-Champaign, December 1989.
|
 |
15
|
Anoop Gupta , John Hennessy , Kourosh Gharachorloo , Todd Mowry , Wolf-Dietrich Weber, Comparative evaluation of latency reducing and tolerating techniques, Proceedings of the 18th annual international symposium on Computer architecture, p.254-263, May 27-30, 1991, Toronto, Ontario, Canada
|
 |
16
|
|
| |
17
|
|
 |
18
|
|
 |
19
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
| |
25
|
|
| |
26
|
M. D. Smith. Tracing with pixie. Technical Report CSL-TR- 91-497, Stanford University, November 1991.
|
| |
27
|
SPEC. The SPEC Benchmark Report. Waterside Associates, Fremont, CA, January 1990.
|
 |
28
|
|
 |
29
|
|
CITED BY 191
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bryan Black , Brian Mueller , Stephanie Postal , Ryan Rakvic , Noppanunt Utamaphethai , John Paul Shen, Load execution latency reduction, Proceedings of the 12th international conference on Supercomputing, p.29-36, July 1998, Melbourne, Australia
|
|
|
|
|
|
|
|
|
P. R. Panda , F. Catthoor , N. D. Dutt , K. Danckaert , E. Brockmeyer , C. Kulkarni , A. Vandercappelle , P. G. Kjeldsberg, Data and memory optimization techniques for embedded systems, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.6 n.2, p.149-206, April 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jack L. Lo , Susan J. Eggers , Henry M. Levy , Sujay S. Parekh , Dean M. Tullsen, Tuning compiler optimizations for simultaneous multithreading, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.114-124, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mikko H. Lipasti , William J. Schmidt , Steven R. Kunkel , Robert R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, Proceedings of the 28th annual international symposium on Microarchitecture, p.231-236, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
|
|
Rajeev Balasubramonian , David Albonesi , Alper Buyuktosunoglu , Sandhya Dwarkadas, Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.245-257, December 2000, Monterey, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chau-Wen Tseng , Jennifer M. Anderson , Saman P. Amarasinghe , Monica S. Lam, Unified compilation techniques for shared and distributed address space machines, Proceedings of the 9th international conference on Supercomputing, p.67-76, July 03-07, 1995, Barcelona, Spain
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sharad Malik , Margaret Martonosi , Yau-Tsun Steven Li, Static timing analysis of embedded software, Proceedings of the 34th annual conference on Design automation, p.147-152, June 09-13, 1997, Anaheim, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
Santosh G. Abraham , Rabin A. Sugumar , Daniel Windheiser , B. R. Rau , Rajiv Gupta, Predictability of load/store instruction latencies, Proceedings of the 26th annual international symposium on Microarchitecture, p.139-152, December 01-03, 1993, Austin, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
W. Zhang , M. Karakoy , M. Kandemir , G. Chen, A compiler approach for reducing data cache energy, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Juan J. Navarro , Elena García-Diego , José R. Herrero, Data prefetching and multilevel blocking for linear algebra operations, Proceedings of the 10th international conference on Supercomputing, p.109-116, May 25-28, 1996, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform, ACM SIGPLAN Notices, v.39 n.11, November 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yoji Yamada , John Gyllenhall , Grant Haab , Wen-mei Hwu, Data relocation and prefetching for programs with large data sets, Proceedings of the 27th annual international symposium on Microarchitecture, p.118-127, November 30-December 02, 1994, San Jose, California, United States
|
|
|
|
|
|
Mary Hall , Peter Kogge , Jeff Koller , Pedro Diniz , Jacqueline Chame , Jeff Draper , Jeff LaCoss , John Granacki , Jay Brockman , Apoorv Srivastava , William Athas , Vincent Freeh , Jaewook Shin , Joonseok Park, Mapping irregular applications to DIVA, a PIM-based data-intensive architecture, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.57-es, November 14-19, 1999, Portland, Oregon, United States
|
|
|
|
|
|
|
|
|
D. A. Koufaty , X. Chen , D. K. Poulsen , J. Torrellas, Data forwarding in scalable shared-memory multiprocessors, Proceedings of the 9th international conference on Supercomputing, p.255-264, July 03-07, 1995, Barcelona, Spain
|
|
|
Teresa L. Johnson , Matthew C. Merten , Wen-Mei W. Hwu, Run-time spatial locality detection and optimization, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.57-64, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
|
|
|
|
|
|
|
|
|
|
|
Robert P. Wilson , Robert S. French , Christopher S. Wilson , Saman P. Amarasinghe , Jennifer M. Anderson , Steve W. K. Tjiang , Shih-Wei Liao , Chau-Wen Tseng , Mary W. Hall , Monica S. Lam , John L. Hennessy, SUIF: an infrastructure for research on parallelizing and optimizing compilers, ACM SIGPLAN Notices, v.29 n.12, p.31-37, Dec. 1994
|
|
|
|
|
|
Marco Galluzzi , Valentín Puente , Adrián Cristal , Ramón Beivide , José-Ángel Gregorio , Mateo Valero, A first glance at Kilo-instruction based multiprocessors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
|
|
|
Sorin Iacobovici , Lawrence Spracklen , Sudarshan Kadambi , Yuan Chou , Santosh G. Abraham, Effective stream-based and execution-based data prefetching, Proceedings of the 18th annual international conference on Supercomputing, June 26-July 01, 2004, Malo, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sally A. McKee , William A. Wulf , James H. Aylor , Maximo H. Salinas , Robert H. Klenke , Sung I. Hong , Dee A. B. Weikle, Dynamic Access Ordering for Streamed Computations, IEEE Transactions on Computers, v.49 n.11, p.1255-1271, November 2000
|
|
|
|
|
|
Chi-Keung Luk , Robert Muth , Harish Patil , Richard Weiss , P. Geoffrey Lowney , Robert Cohn, Profile-guided post-link stride prefetching, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
Feihui Li , Guangyu Chen , Mahmut Kandemir , Mary Jane Irwin, Compiler-directed proactive power management for networks, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kevin Skadron , Pritpal S. Ahuja , Margaret Martonosi , Douglas W. Clark, Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques, IEEE Transactions on Computers, v.48 n.11, p.1260-1281, November 1999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Minas Dasygenis , Erik Brockmeyer , Bart Durinck , Francky Catthoor , Dimitrios Soudris , Antonios Thanailakis, A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, v.14 n.3, p.279-291, March 2006
|
|
|
|
|
|
Chi-Keung Luk , Robert Muth , Harish Patil , Robert Cohn , Geoff Lowney, Ispike: A Post-link Optimizer for the Intel®Itanium®Architecture, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.15, March 20-24, 2004, Palo Alto, California
|
|
|
|
|
|
Youcef Bouchebaba , Bruno Girodias , Gabriela Nicolescu , El Mostapha Aboulhamid , Bruno Lavigueur , Pierre Paulin, MPSoC memory optimization using program transformation, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.12 n.4, p.43-es, September 2007
|
|
|
|
|
|
|
|
|
Tanausú Ramírez , Alex Pajuelo , Oliverio J. Santana , Mateo Valero, Kilo-instruction processors, runahead and prefetching, Proceedings of the 3rd conference on Computing frontiers, May 03-05, 2006, Ischia, Italy
|
|
|
|
|
|
Zhen Yang , Xudong Shi , Feiqi Su , Jih-Kwon Peir, Overlapping dependent loads with addressless preload, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
|
|
|
Tong Chen , Tao Zhang , Zehra Sura , Mar Gonzales Tallada, Prefetching irregular references for software cache on cell, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on modern and emerging processors, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.77-96, January 2007
|
|
|
|
|
|
Jiwei Lu , Howard Chen , Rao Fu , Wei-Chung Hsu , Bobbie Othmer , Pen-Chung Yew , Dong-Yuan Chen, The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.180, December 03-05, 2003
|
|
|
Seung Woo Son , Sai Prashanth Muralidhara , Ozcan Ozturk , Mahmut Kandemir , Ibrahim Kolcu , Mustafa Karakoy, Profiler and compiler assisted adaptive I/O prefetching for shared storage caches, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jiang Lin , Qingda Lu , Xiaoning Ding , Zhao Zhang , Xiaodong Zhang , P. Sadayappan, Enabling software management for multicore caches with a lightweight hardware support, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, November 14-20, 2009, Portland, Oregon
|
|