ACM Home Page
Please provide us with feedback. Feedback
Scaling performance of interior-point method on large-scale chip multiprocessor system
Full text PdfPdf (324 KB)
Source
Conference on High Performance Networking and Computing archive
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - Volume 00 table of contents
Reno, Nevada
SESSION: Microarchitecture table of contents
Article No. 22  
Year of Publication: 2007
ISBN:978-1-59593-764-3
Authors
Mikhail Smelyanskiy  Microprocessor Technology Labs, Intel
Victor W Lee  Microprocessor Technology Labs, Intel
Daehyun Kim  Microprocessor Technology Labs, Intel
Anthony D Nguyen  Microprocessor Technology Labs, Intel
Pradeep Dubey  Microprocessor Technology Labs, Intel
Sponsors
IEEE-CS\DATC : IEEE Computer Society
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 60,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1362622.1362652
What is a DOI?

ABSTRACT

In this paper we describe parallelization of interior-point method (IPM) aimed at achieving high scalability on large-scale chip-multiprocessors (CMPs). IPM is an important computational technique used to solve optimization problems in many areas of science, engineering and finance. IPM spends most of its computation time in a few sparse linear algebra kernels. While each of these kernels contains a large amount of parallelism, sparse irregular datasets seen in many optimization problems make parallelism difficult to exploit. As a result, most researchers have shown only a relatively low scalability of 4X-12X on medium to large scale parallel machines.

This paper proposes and evaluates several algorithmic and hardware features to improve IPM parallel performance on large-scale CMPs. Through detailed simulations, we demonstrate how exploring multiple levels of parallelism with hardware support for low overhead task queues and parallel reduction enables IPM to achieve up to 48X parallel speedup on a 64-core CMP.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
{Czyzyk96} J. Czyzyk, S. Mehrotra, and S. J. Wright. PCx User Guide. Technical Report OTC 96/01, Optimization Technology Center, Argonne National Lab and Northwestern University, May 1996.
 
4
{Eckstein92} J. Eckstein, R. Qi, V. I. Ragulin and S. A. Zenios. Data parallel implementation of dense linear programming algorithms. Technical Report TMC-230, Thinking Machines Corporation, Cambridge, MA, 1992.
 
5
 
6
{Gay88} D. M. Gay. Electronic Mail Distribution of Linear Programming Test Problems. In Committee on Algorithms Newsletter, No. 13, pages 10--12.
 
7
{Gochman06} S. Gochman, A. Mendelson, A. Naveh, E. Rotem. Introduction to Intel® Core#8482; Duo Processor Architecture. Intel Technology Journal, Volume 10, Issue 2, 2006.
8
 
9
{Gondzio04} Gondzio, J. and A. Grothey. Exploiting Structure in Parallel Implementation of Interior Point Methods for Optimization. Technical Report MS-04-004, School of Mathematics, The University of Edinburgh, 2004.
 
10
 
11
{Gupta00} A. Gupta. WSMP: Watson Sparse Matrix Package (Part-II: Direct Solution of General Sparse Systems). Technical Report RC 21888 (98472), IBM T. J. Watson Research Center, 2000.
12
 
13
{Karypis98} G. Karypis and V. Kumar. METIS - A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices. Version 4.0. University of Minnesota, 1998.
 
14
 
15
{Koka04} P. Koka, T. Suh, M. Smelyanskiy, R. Grzeszczuk, and C. Dulong. Construction and Performance Characterization of Parallel Interior Point Solver on 4-way Intel Itanium Multiprocessor System. IEEE 7th Annual Workshop on Workload Characterization (WWC-7), 2004.
16
 
17
 
18
 
19
20
 
21
{Lustig92} J. Lustig and G. Li. An implementation of parallel primal and dual interior-point method for block structured linear programs. Computational Optimization and Applications (COAP), 1 (1992)141--161.
 
22
{Lustig96} I. J. Lustig and E. Rothberg. Gigaflops in Linear Programming. Operations Research Letters, 18(4): 157--165, 1996.
 
23
{MKL07}Inte® Math Kernel Library Reference Manual, 2007
 
24
 
25
{Nocedal06} J. Nocedal and S. J. Wright. Numerical Optimization, Springer, 2nd edition, 2006.
26
 
27
{Schenk00} O. Schenk. Scalable Parallel Sparse LU Factorization Methods on Shared memory Multiprocessors. Ph.D. Dissertation, Swiss Federal Institute of Technology, Zurich, Switzerland, 2000.
Collaborative Colleagues:
Mikhail Smelyanskiy: colleagues
Victor W Lee: colleagues
Daehyun Kim: colleagues
Anthony D Nguyen: colleagues
Pradeep Dubey: colleagues