1 Efficient Parallel Software for Large-Scale Semidefinite Programs. Makoto Yamashita (Tokyo Tech), Katsuki Fujisawa (Chuo University). MSC, Yokohama, 2010/09/08.
2 Outline
1. Semidefinite programming (SDP)
2. Conversion of a stability condition for differential inclusions to an SDP
3. Primal-dual interior-point methods and their parallel implementation
4. Numerical results
3 Many Applications of SDP
- Control theory: stability conditions for differential inclusions, discrete-time optimal control problems
- Via SDP relaxation: polynomial optimization problems, sensor network problems, quadratic assignment problems
- Quantum chemistry / quantum information
Large SDPs ⇒ parallel solver
4 Standard form of SDP
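For reference, a common statement of the primal-dual standard form in SDPA-style notation (the F_i notation matches later slides; this is the usual formulation, not copied from the slide):

```latex
% Standard-form SDP in SDPA-style notation (data: c in R^m, symmetric F_0, ..., F_m).
\begin{array}{lll}
\text{(P)} & \min\limits_{x \in \mathbb{R}^m} \ \sum_{i=1}^{m} c_i x_i
           & \text{s.t.}\ \ X = \sum_{i=1}^{m} F_i x_i - F_0,\ \ X \succeq 0,\\[4pt]
\text{(D)} & \max\limits_{Y} \ F_0 \bullet Y
           & \text{s.t.}\ \ F_i \bullet Y = c_i \ (i = 1,\dots,m),\ \ Y \succeq 0,
\end{array}
% where U \bullet V = \mathrm{Tr}(U V) for symmetric U, V, and "\succeq 0" means
% positive semidefinite.
```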
5 Stability condition for differential inclusions to standard SDP. Does the solution remain in a bounded region? Yes, if a suitable linear matrix inequality has a solution (Boyd et al.).
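A standard statement of this condition (following Boyd et al.; the matrices A_1, ..., A_L and the Lyapunov matrix P are the usual notation, assumed rather than taken verbatim from the slide):

```latex
% Linear differential inclusion and a sufficient condition for boundedness.
\dot{x}(t) = A(t)\, x(t), \qquad A(t) \in \mathrm{conv}\{A_1, \dots, A_L\}.
% The trajectory x(t) remains in a bounded region if
\exists\, P \succ 0 \ \text{such that}\ \ A_i^{\top} P + P A_i \preceq 0,
\qquad i = 1, \dots, L.
```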
6 Conversion to SDP. To make this inequality hold, bounding the condition number leads to an SDP formulation.
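One common way to cast this as an SDP, minimizing a bound κ on the condition number of P (a sketch consistent with the condition above, not copied from the slide):

```latex
\begin{array}{ll}
\min\limits_{\kappa,\, P} & \kappa \\
\text{s.t.} & I \preceq P \preceq \kappa I, \\
            & A_i^{\top} P + P A_i \preceq 0, \qquad i = 1, \dots, L.
\end{array}
```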
7 SDP from SCDI. A feasible solution ⇒ boundedness of the solution. Some translation into the standard SDP form is required, e.g., with YALMIP [J. Löfberg].
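YALMIP is a MATLAB toolbox; purely as an illustration of this translation step, here is a minimal Python sketch using CVXPY instead (the function name scdi_sdp and the random test matrices are hypothetical, and the LMI form assumed is the condition-number formulation above):

```python
# Minimal sketch (not the talk's code): translate the SCDI condition into an SDP
# with CVXPY, assuming the LMIs  I <= P <= kappa*I  and  A_i^T P + P A_i <= 0.
import numpy as np
import cvxpy as cp

def scdi_sdp(A_list):
    n = A_list[0].shape[0]
    P = cp.Variable((n, n), symmetric=True)   # Lyapunov matrix
    kappa = cp.Variable()                     # bound on the condition number of P
    constraints = [P >> np.eye(n), kappa * np.eye(n) >> P]
    constraints += [A.T @ P + P @ A << 0 for A in A_list]  # stability LMIs
    prob = cp.Problem(cp.Minimize(kappa), constraints)
    prob.solve()                              # any installed SDP solver (e.g. SCS)
    return kappa.value, P.value

# Toy example with two (illustrative) stable matrices.
rng = np.random.default_rng(0)
A1 = -np.eye(3) + 0.1 * rng.standard_normal((3, 3))
A2 = -np.eye(3) + 0.1 * rng.standard_normal((3, 3))
kappa_opt, _ = scdi_sdp([A1, A2])
print("condition-number bound:", kappa_opt)
```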
8 Discrete-Time Optimal Control Problems. This problem [Coleman et al.] can be formulated as an SDP via SparsePOP [Kim et al.].
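The specific problem data of the Coleman et al. instance are not reproduced here; generically, a discrete-time optimal control problem with polynomial objective and dynamics, to which SparsePOP's sparse SDP relaxation applies, has the form:

```latex
% Generic discrete-time optimal control problem (illustrative form only):
\begin{array}{ll}
\min\limits_{x_t,\, u_t} & \sum_{t=0}^{T-1} f_t(x_t, u_t) + f_T(x_T) \\
\text{s.t.} & x_{t+1} = g_t(x_t, u_t), \qquad t = 0, \dots, T-1, \qquad x_0 \ \text{given},
\end{array}
% where f_t and g_t are polynomials, so the problem is a polynomial optimization
% problem (POP) that SparsePOP relaxes into a sparse SDP.
```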
9 Primal-Dual Interior-Point Methods. They solve both the primal and the dual simultaneously in polynomial time. Many software packages have been developed: SDPA [Yamashita et al.], SDPT3 [Toh et al.], SeDuMi [Sturm et al.], CSDP [Borchers et al.].
10 Algorithmic Framework of Primal-Dual Interior-Point Methods. (Diagram: feasible region, optimal solution, initial point, target point on the central path, search direction, step length to keep the interior property.) Most of the computational time is consumed by computing the search direction.
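One standard way to write the central path and the iteration, in the notation of the standard form above (a common description, not copied from the slide):

```latex
% Central path: solutions of the perturbed optimality conditions, parameterized by mu > 0.
X = \sum_{i=1}^{m} F_i x_i - F_0, \qquad
F_i \bullet Y = c_i \ (i = 1, \dots, m), \qquad
X Y = \mu I, \qquad X, Y \succ 0.
% One iteration: compute a Newton-type search direction (dx, dX, dY) toward a target
% point with smaller mu, then choose step lengths alpha_p, alpha_d in (0, 1] so that
% X + alpha_p dX and Y + alpha_d dY stay positive definite (the interior property).
```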
11 Bottlenecks in PDIPM and SDPARA. To obtain the search direction, two steps dominate: 1. ELEMENTS (forming the Schur complement matrix) and 2. CHOLESKY (its Cholesky factorization). In SDPARA, parallel computation is applied to these two bottlenecks. (Table: time for ELEMENTS, CHOLESKY, and total on the SCDI and DTOC problems; Xeon 5460, 3.16 GHz.)
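For context, the search direction is obtained through the Schur complement equation; the element formula below is the standard one for the HRVW/KSH/M direction used by SDPA-family codes (notation follows the standard form above, and the cost estimates are the usual dense-case figures):

```latex
% Schur complement equation solved for the component dx of the search direction:
B\, dx = r, \qquad
B_{ij} = \bigl( X^{-1} F_i\, Y \bigr) \bullet F_j
       = \mathrm{Tr}\bigl( F_i X^{-1} F_j Y \bigr), \qquad i, j = 1, \dots, m.
% ELEMENTS: forming the m x m matrix B (dense cost O(m n^3 + m^2 n^2)).
% CHOLESKY: the Cholesky factorization of B (dense cost O(m^3)).
```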
12 Nonzero pattern of the Schur complement matrix B: fully dense for SCDI, sparse for DTOC.
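Where the DTOC sparsity comes from: since X^{-1} and Y are dense within each diagonal block, B_{ij} can be structurally nonzero only if F_i and F_j have nonzeros in a common block. A minimal sketch of computing that pattern (the helper schur_pattern and the toy data are illustrative, not SDPARA code):

```python
# Structural nonzero pattern of the Schur complement matrix B for block-diagonal data.
from itertools import combinations_with_replacement

def schur_pattern(touched_blocks, m):
    """touched_blocks[i] = set of diagonal blocks in which F_{i+1} has nonzeros."""
    pattern = set()
    for i, j in combinations_with_replacement(range(m), 2):
        if touched_blocks[i] & touched_blocks[j]:   # a common block => B_ij may be nonzero
            pattern.add((i, j))
            pattern.add((j, i))
    return pattern

# Toy example: 4 constraint matrices over 3 diagonal blocks.
touched = [{0}, {0, 1}, {1}, {2}]
print(sorted(schur_pattern(touched, 4)))
```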
13 Exploitation of Sparsity in SDPA. The computational formula (F1, F2, or F3) is chosen row-wise, and we keep this row-wise scheme in the parallel computation.
14 Row-wise distribution for the dense Schur complement matrix. With 4 CPUs available, each CPU computes only its assigned rows: no communication between CPUs and efficient memory management. A sketch of this distribution is given below.
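A minimal sketch of such a row-wise distribution (assuming mpi4py is available and a simple cyclic assignment of rows to processes; element_formula is a hypothetical stand-in for SDPA's F1/F2/F3 formulas):

```python
# Minimal sketch of row-wise (cyclic) distribution of the Schur complement matrix.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

m = 8  # number of constraint matrices F_1, ..., F_m (toy size)

def element_formula(i, j):
    # Hypothetical stand-in for B_ij = Tr(F_i X^{-1} F_j Y).
    return float(i + j)

# Rank r owns rows r, r + size, r + 2*size, ...; every process computes only its rows,
# so forming B needs no inter-process communication and no process stores all of B.
my_rows = list(range(rank, m, size))
local_B = {i: np.array([element_formula(i, j) for j in range(m)]) for i in my_rows}
print(f"rank {rank} computed rows {my_rows}")
```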
15 Formula-cost-based distribution for the sparse Schur complement matrix. Load on each CPU: CPU1: 195, CPU2: 187, CPU3: 189, CPU4: 192 (average: 190.75); see the sketch below.
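A sketch of the balancing idea (the actual per-row cost model and assignment rule in SDPARA are not given on the slide; row_costs below is hypothetical): assign each row, heaviest first, to the currently least-loaded CPU.

```python
# Greedy cost-based distribution of Schur complement rows across CPUs.
import heapq

def distribute_rows(row_costs, num_cpus):
    """Assign each row (heaviest first) to the CPU with the smallest current load."""
    heap = [(0.0, cpu, []) for cpu in range(num_cpus)]  # (load, cpu_id, assigned rows)
    heapq.heapify(heap)
    for row, cost in sorted(enumerate(row_costs), key=lambda rc: -rc[1]):
        load, cpu, rows = heapq.heappop(heap)
        rows.append(row)
        heapq.heappush(heap, (load + cost, cpu, rows))
    return sorted(heap, key=lambda t: t[1])

# Hypothetical per-row costs for 8 rows distributed over 4 CPUs.
for load, cpu, rows in distribute_rows([30, 12, 55, 7, 41, 23, 18, 9], num_cpus=4):
    print(f"CPU{cpu + 1}: load={load}, rows={rows}")
```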
16 Parallel Computation for CHOLESKY. We employ ScaLAPACK [Blackford et al.] for the dense case and MUMPS [Amestoy et al.] for the sparse case. Data storage tailored to each case enhances the parallel Cholesky factorization.
17 Problems for Numerical Results. Environment: 16 nodes, Xeon X5460 (3.16 GHz), 48 GB memory per node.
18 Computation time on SDP [SCDI1] (Xeon X5460, 3.16 GHz, 48 GB memory/node). (Chart: speedups of the total, ELEMENTS, and CHOLESKY times over the number of nodes.) ELEMENTS attains high scalability.
19 Computation time on SDP [DTOC1] (Xeon X5460, 3.16 GHz, 48 GB memory/node). Speedups: total 4.85×, CHOLESKY 4.34×. Parallel sparse Cholesky factorization is difficult to scale, but ELEMENTS is still accelerated.
20 Comparison with PCSDP [Ivanov et al.]
1. SDPARA is faster than PCSDP.
2. The scalability of SDPARA is higher.
3. Only SDPARA can solve DTOC.
Times are in seconds; O.M. = out of memory.
21 Concluding Remarks & Future Work
1. SDP has many applications, including control theory.
2. SDPARA solves large-scale SDPs effectively by parallel computation.
3. Appropriate parallel computation schemes are the key to the SDPARA implementation.
Future work: improvement of multi-threading for the sparse Schur complement matrix.