|
ABSTRACT
In this paper we describe parallelization of interior-point method (IPM) aimed at achieving high scalability on large-scale chip-multiprocessors (CMPs). IPM is an important computational technique used to solve optimization problems in many areas of science, engineering and finance. IPM spends most of its computation time in a few sparse linear algebra kernels. While each of these kernels contains a large amount of parallelism, sparse irregular datasets seen in many optimization problems make parallelism difficult to exploit. As a result, most researchers have shown only a relatively low scalability of 4X-12X on medium to large scale parallel machines. This paper proposes and evaluates several algorithmic and hardware features to improve IPM parallel performance on large-scale CMPs. Through detailed simulations, we demonstrate how exploring multiple levels of parallelism with hardware support for low overhead task queues and parallel reduction enables IPM to achieve up to 48X parallel speedup on a 64-core CMP.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
{Czyzyk96} J. Czyzyk, S. Mehrotra, and S. J. Wright. PCx User Guide. Technical Report OTC 96/01, Optimization Technology Center, Argonne National Lab and Northwestern University, May 1996.
|
| |
4
|
{Eckstein92} J. Eckstein, R. Qi, V. I. Ragulin and S. A. Zenios. Data parallel implementation of dense linear programming algorithms. Technical Report TMC-230, Thinking Machines Corporation, Cambridge, MA, 1992.
|
| |
5
|
María Jesús Garzarán , Milos Prvulovic , Ye Zhangy , Josep Torrellas , Alin Jula , Hao Yu , Lawrence Rauchwerger, Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors, Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, p.243, September 08-12, 2001
|
| |
6
|
{Gay88} D. M. Gay. Electronic Mail Distribution of Linear Programming Test Problems. In Committee on Algorithms Newsletter, No. 13, pages 10--12.
|
| |
7
|
{Gochman06} S. Gochman, A. Mendelson, A. Naveh, E. Rotem. Introduction to Intel® Core#8482; Duo Processor Architecture. Intel Technology Journal, Volume 10, Issue 2, 2006.
|
 |
8
|
|
| |
9
|
{Gondzio04} Gondzio, J. and A. Grothey. Exploiting Structure in Parallel Implementation of Interior Point Methods for Optimization. Technical Report MS-04-004, School of Mathematics, The University of Edinburgh, 2004.
|
| |
10
|
A. Gottlieb , R. Grishman , C. P. Kruskal , K. P. McAuliffe , L. Rudolph , M. Snir, The NYU Ultracomputer Designing an MIMD Shared Memory Parallel Computer, IEEE Transactions on Computers, v.32 n.2, p.175-189, February 1983
[doi> 10.1109/TC.1983.1676201]
|
| |
11
|
{Gupta00} A. Gupta. WSMP: Watson Sparse Matrix Package (Part-II: Direct Solution of General Sparse Systems). Technical Report RC 21888 (98472), IBM T. J. Watson Research Center, 2000.
|
 |
12
|
|
| |
13
|
{Karypis98} G. Karypis and V. Kumar. METIS - A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices. Version 4.0. University of Minnesota, 1998.
|
| |
14
|
|
| |
15
|
{Koka04} P. Koka, T. Suh, M. Smelyanskiy, R. Grzeszczuk, and C. Dulong. Construction and Performance Characterization of Parallel Interior Point Solver on 4-way Intel Itanium Multiprocessor System. IEEE 7th Annual Workshop on Workload Characterization (WWC-7), 2004.
|
 |
16
|
|
| |
17
|
|
| |
18
|
Charles E. Leiserson , Zahi S. Abuhamdeh , David C. Douglas , Carl R. Feynman , Mahesh N. Ganmukhi , Jeffrey V. Hill , W. Daniel Hillis , Bradley C. Kuszmaul , Margaret A. St. Pierre , David S. Wells , Monica C. Wong-Chan , Shaw-Wen Yang , Robert Zak, The network architecture of the connection machine CM-5, Journal of Parallel and Distributed Computing, v.33 n.2, p.145-158, March 15, 1996
[doi> 10.1006/jpdc.1996.0033]
|
| |
19
|
|
 |
20
|
|
| |
21
|
{Lustig92} J. Lustig and G. Li. An implementation of parallel primal and dual interior-point method for block structured linear programs. Computational Optimization and Applications (COAP), 1 (1992)141--161.
|
| |
22
|
{Lustig96} I. J. Lustig and E. Rothberg. Gigaflops in Linear Programming. Operations Research Letters, 18(4): 157--165, 1996.
|
| |
23
|
{MKL07}Inte® Math Kernel Library Reference Manual, 2007
|
| |
24
|
|
| |
25
|
{Nocedal06} J. Nocedal and S. J. Wright. Numerical Optimization, Springer, 2nd edition, 2006.
|
 |
26
|
|
| |
27
|
{Schenk00} O. Schenk. Scalable Parallel Sparse LU Factorization Methods on Shared memory Multiprocessors. Ph.D. Dissertation, Swiss Federal Institute of Technology, Zurich, Switzerland, 2000.
|
|