1 Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix
Makoto (Tokyo-Tech), Katsuki (Chuo University), Mituhiro (Tokyo-Tech), Yoshiaki (University of Virginia), Kazuhiro (National Maritime Research Institute), Masakazu (Tokyo-Tech), Kazuhide (Tokyo-Tech), Maho (RIKEN)
ISMP Chicago, 2009/08/26
2 Extremely Large SDPs
Arising from various fields: Quantum Chemistry, Sensor Network Problems, Polynomial Optimization Problems
Most of the computation time is related to the Schur complement matrix (SCM)
[SDPARA] Parallel computation for the SCM, in particular a sparse SCM
3 Outline
1. SemiDefinite Programming and the Schur complement matrix
2. Parallel Implementation
3. Parallel computation for the sparse Schur complement
4. Numerical Results
5. Future works
4 Standard form of SDP
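For reference, a sketch of the standard form in the SDPA convention (data matrices F_0, F_1, ..., F_m and cost vector c); the exact notation on the slide may differ:

  \begin{align*}
  \text{(P)}\quad & \min_{x \in \mathbb{R}^m} \sum_{i=1}^{m} c_i x_i
    & \text{s.t. } & X = \sum_{i=1}^{m} F_i x_i - F_0,\; X \succeq O,\\
  \text{(D)}\quad & \max_{Y}\; F_0 \bullet Y
    & \text{s.t. } & F_i \bullet Y = c_i \;(i = 1, \dots, m),\; Y \succeq O.
  \end{align*}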
5 Primal-Dual Interior-Point Methods
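As a reminder of the framework (a generic sketch only; SDPA's Mehrotra-type predictor-corrector details are omitted), one iteration proceeds roughly as follows:

  \begin{enumerate}
    \item Choose a centering target $\mu = \beta\,\dfrac{X \bullet Y}{n}$ with $\beta \in (0,1)$.
    \item Compute the search direction $(dx, dX, dY)$ from the Newton system for the
          perturbed optimality conditions $XY = \mu I$ (next slide).
    \item Choose step lengths $\alpha_p, \alpha_d$ keeping $X + \alpha_p dX \succ O$ and
          $Y + \alpha_d dY \succ O$.
    \item Update $(x, X, Y) \leftarrow (x + \alpha_p dx,\; X + \alpha_p dX,\; Y + \alpha_d dY)$
          and repeat until the relative duality gap is small.
  \end{enumerate}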
6 Computation for Search Direction
Schur complement matrix ⇒ Cholesky Factorization
Exploitation of Sparsity in
1. ELEMENTS
2. CHOLESKY
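Concretely, a sketch of the reduced Newton system for the HRVW/KSH/M direction used in SDPA (the right-hand side r collects the residual terms):

  B\,dx = r, \qquad
  B_{ij} = F_i \bullet \bigl(X^{-1} F_j Y\bigr) = \mathrm{Tr}\bigl(F_i X^{-1} F_j Y\bigr),
  \qquad i, j = 1, \dots, m.

ELEMENTS refers to evaluating the m(m+1)/2 entries of the symmetric positive definite matrix B; CHOLESKY refers to factorizing B to solve for dx.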
7 Bottlenecks on Single Processor
Apply Parallel Computation to the Bottlenecks
Time in seconds on Opteron 246 (2.0GHz)

             LiOH             HF
m
ELEMENTS     6150  ( 43%)     16719 ( 35%)
CHOLESKY     7744  ( 54%)     20995 ( 44%)
TOTAL        14250 (100%)     47483 (100%)
8 SDPARA
SDPA parallel version (generic SDP solver)
MPI & ScaLAPACK
Row-wise distribution for ELEMENTS
Parallel Cholesky factorization for CHOLESKY
9 Row-wise distribution for evaluation of the Schur complement matrix
Example: 4 CPUs available
Each CPU computes only its assigned rows
No communication between CPUs
Efficient memory management
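A minimal sketch of such a row-wise (cyclic) distribution with MPI; evaluate_element and the cyclic assignment rule are illustrative placeholders, not SDPARA's actual code. Each rank fills only its own rows, which is why the ELEMENTS phase needs no communication.

  #include <mpi.h>
  #include <vector>

  // Placeholder for one entry B[k][l] of the Schur complement matrix; in SDPARA
  // this would be evaluated from the data matrices (F1/F2/F3 formulas).
  static double evaluate_element(int k, int l) { return (k == l) ? 1.0 : 0.0; }

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int m = 1000;                        // order of the Schur complement matrix
    std::vector<std::vector<double> > my_rows; // only the rows owned by this rank

    for (int k = rank; k < m; k += nprocs) {   // cyclic assignment: row k -> rank k % nprocs
      std::vector<double> row(m, 0.0);
      for (int l = k; l < m; ++l)              // B is symmetric, so the upper triangle suffices
        row[l] = evaluate_element(k, l);
      my_rows.push_back(row);
    }

    MPI_Finalize();
    return 0;
  }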
10 Parallel Cholesky factorization
We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix
We redistribute the matrix from row-wise to a two-dimensional block-cyclic distribution
[figure: redistribution from row-wise to block-cyclic layout]
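A minimal sketch of the ScaLAPACK call for this step, assuming the matrix has already been redistributed into the 2D block-cyclic layout on an existing BLACS context ictxt; grid setup (Cblacs_gridinit) and the redistribution itself are omitted, and this is not SDPARA's actual code.

  #include <vector>

  // Fortran entry points from ScaLAPACK, declared for a C++ caller.
  extern "C" {
    void descinit_(int* desc, const int* m, const int* n, const int* mb, const int* nb,
                   const int* irsrc, const int* icsrc, const int* ictxt,
                   const int* lld, int* info);
    void pdpotrf_(const char* uplo, const int* n, double* a, const int* ia,
                  const int* ja, const int* desca, int* info);
  }

  // Cholesky-factorize (in place) the lower triangle of the m x m Schur complement
  // matrix, stored block-cyclically with block size mb x mb and local leading
  // dimension lld on the BLACS context ictxt.
  int parallel_dense_cholesky(std::vector<double>& a_local, int m, int mb,
                              int ictxt, int lld) {
    int desc[9], info = 0;
    const int izero = 0, ione = 1;
    descinit_(desc, &m, &m, &mb, &mb, &izero, &izero, &ictxt, &lld, &info);
    if (info != 0) return info;
    pdpotrf_("L", &m, a_local.data(), &ione, &ione, desc, &info);
    return info;  // nonzero means the matrix is not positive definite
  }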
11 Computation time on SDP from Quantum Chemistry [LiOH]
AIST super cluster: Opteron 246 (2.0GHz), 6GB memory/node
12 Scalability on SDP from Quantum Chemistry [NF]
Speed-up: Total 29x, ELEMENTS 63x, CHOLESKY 39x
Parallel ELEMENTS is very effective
13 Sparse Schur complement matrix
The Schur complement matrix becomes very sparse for some applications
⇒ The simple row-wise distribution loses its efficiency
Example densities: from Control Theory (100%), from Sensor Network (2.12%)
14 Sparseness of Schur complement matrix
Many applications have a diagonal block structure
15 Exploitation of Sparsity in SDPA
The evaluation formula is switched row by row among F1, F2, and F3
16 ELEMENTS for Sparse Schur complement
Load on each CPU: CPU1: 190, CPU2: 185, CPU3: 188
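One plausible way (a hypothetical sketch, not necessarily SDPARA's exact rule) to reach such an even load is to estimate the evaluation cost of each nonzero row of the sparse Schur complement matrix and assign rows greedily to the least loaded CPU:

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  // Greedy "largest row to the least loaded CPU" assignment; with uneven row
  // costs this yields balances like the 190 / 185 / 188 split on the slide.
  std::vector<std::vector<int> > assign_rows(const std::vector<double>& row_cost,
                                             int num_cpus) {
    std::vector<int> order(row_cost.size());
    for (int i = 0; i < (int)order.size(); ++i) order[i] = i;
    // Consider expensive rows first for a tighter balance.
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return row_cost[a] > row_cost[b]; });

    std::vector<std::vector<int> > owned(num_cpus);
    std::vector<double> load(num_cpus, 0.0);
    for (std::size_t t = 0; t < order.size(); ++t) {
      int row = order[t];
      int cpu = (int)(std::min_element(load.begin(), load.end()) - load.begin());
      owned[cpu].push_back(row);
      load[cpu] += row_cost[row];
    }
    return owned;
  }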
17 CHOLESKY for Sparse Schur complement
Parallel sparse Cholesky factorization implemented in MUMPS
MUMPS adopts the multifrontal method
Memory storage on each processor should be consecutive
The distribution used for ELEMENTS matches this requirement
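A minimal sketch of driving MUMPS for this phase, using the simpler centralized (host) input and the coordinate (1-based) format for brevity; SDPARA itself feeds the matrix in distributed form, and all tuning of the control parameters is omitted. MPI_Init is assumed to have been called already.

  #include <mpi.h>
  #include <dmumps_c.h>

  #define JOB_INIT -1
  #define JOB_END  -2
  #define USE_COMM_WORLD -987654

  // Factorize the sparse symmetric positive definite matrix and solve B x = rhs;
  // on rank 0, rhs is overwritten with the solution.
  void sparse_cholesky_solve(int n, int nz, int* irn, int* jcn, double* a,
                             double* rhs) {
    DMUMPS_STRUC_C id;
    id.job = JOB_INIT;
    id.par = 1;                 // the host also participates in the factorization
    id.sym = 1;                 // symmetric positive definite
    id.comm_fortran = USE_COMM_WORLD;
    dmumps_c(&id);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {            // centralized input lives on the host
      id.n = n; id.nz = nz;
      id.irn = irn; id.jcn = jcn; id.a = a;
      id.rhs = rhs;
    }
    id.job = 6;                 // analysis + factorization + solve
    dmumps_c(&id);

    id.job = JOB_END;           // release MUMPS internal data
    dmumps_c(&id);
  }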
18 Computation time for SDPs from Polynomial Optimization Problems
tsubasa: Xeon E5440 (2.83GHz), 8GB memory/node
Parallel sparse Cholesky achieves mild scalability
ELEMENTS attains a 24x speed-up on 32 CPUs
19 ELEMENTS Load-balance on 32 CPUs
Only the first processor has a slightly heavier computation load
20 Automatic selection of sparse / dense SCM
Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky
Dense becomes better when many processors are used
We estimate the computation time of both, using the computational cost and the scalability
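A hypothetical sketch of such a selection rule (illustrative only; the slide does not spell out the exact model): compare a cost estimate for the dense factorization (about m^3/3 flops with good parallel efficiency) against one for the sparse factorization (fewer flops, weaker scalability) and pick the smaller predicted time.

  #include <cmath>

  enum class CholeskyKind { Dense, Sparse };

  CholeskyKind select_cholesky(double m,            // order of the Schur complement matrix
                               double sparse_flops, // flops predicted by the symbolic analysis
                               int num_cpus) {
    const double dense_flops = m * m * m / 3.0;
    // Assumed efficiency curves: dense Cholesky scales nearly linearly,
    // sparse Cholesky saturates sooner.  These constants are illustrative.
    const double dense_eff  = 0.9;
    const double sparse_eff = 1.0 / (1.0 + 0.3 * std::log2((double)num_cpus));
    const double dense_time  = dense_flops  / (num_cpus * dense_eff);
    const double sparse_time = sparse_flops / (num_cpus * sparse_eff);
    return (sparse_time < dense_time) ? CholeskyKind::Sparse : CholeskyKind::Dense;
  }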
21 Sparse/Dense CHOLESKY for a small SDP from POP
tsubasa: Xeon E5440 (2.83GHz), 8GB memory/node
Only on 4 CPUs did the automatic selection fail (the scalability of sparse Cholesky is unstable on 4 CPUs)
22 Numerical Results
Comparison with PCSDP: Sensor Network Problems generated by SFSDP
Multi-Threading: Quantum Chemistry
23 SDPs from Sensor Network (time unit: second)
#sensors 1,000 (m = 16,450, density 1.23%): times per #CPU for SDPARA and PCSDP
#sensors 35,000 (m = 527,096): times per #CPU for SDPARA; PCSDP: Memory Over
PCSDP runs out of memory if #sensors >= 4,000
24 MPI + Multi-Threading for Quantum Chemistry
N.4P.DZ.pqgt11t2p (m = 7,230), time in seconds
64x speed-up on [16 nodes x 8 threads]
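A minimal sketch of the hybrid pattern (not SDPARA's actual code): one MPI rank per node, with the rank's Schur complement rows shared among OpenMP threads, e.g. 16 nodes x 8 threads as on the slide.

  #include <mpi.h>
  #include <omp.h>
  #include <vector>

  int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int m = 7230;                    // order of the Schur complement matrix
    std::vector<int> my_rows;              // rows owned by this rank (cyclic assignment)
    for (int k = rank; k < m; k += nprocs) my_rows.push_back(k);

    // Threads within one node divide the rank's rows among themselves.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)my_rows.size(); ++i) {
      // evaluate row my_rows[i] of the Schur complement matrix here
    }

    MPI_Finalize();
    return 0;
  }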
25 Concluding Remarks & Future works
1. New parallel schemes for the sparse Schur complement matrix
2. Reasonable scalability
3. Extremely large-scale SDPs with a sparse Schur complement matrix
Future work: improvement of multi-threading for the sparse Schur complement matrix