High Performance Solvers for Semidefinite Programs
  Makoto Yamashita @ Tokyo Tech, Katsuki Fujisawa @ Chuo Univ, Mituhiro Fukuda @ Tokyo Tech, Kazuhiro Kobayashi @ NMRI, Kazuhide Nakata @ Tokyo Tech, Maho Nakata @ RIKEN
  KSIAM Annual Meeting @ Jeju, 2011/11/25 (2011/11/25-2011/11/26)
  This talk is supported by Ewha University.
Our interests & SDPA Family
  How fast can we solve SDPs?
  How large an SDP can we solve?
  How accurately can we solve SDPs?
  SDPA family: SDPA (base solver), SDPARA (parallel), SDPA-M (Matlab), SDPA-C (structural sparsity), SDPARA-C (parallel, structural sparsity), SDPA-GMP (multiple precision)
  SDPA Homepage: http://sdpa.sf.net/
  KSIAM 2011 @ Jeju
SDPA Online Solver
  http://sdpa.sf.net/ ⇒ Online Solver
  1. Log in to the online solver
  2. Upload your problem
  3. Push the 'Execute' button
  4. Receive the result via Web/Mail
Outline
  1. SDP Applications
  2. Primal-Dual Interior-Point Methods
  3. Inside of SDPARA (Large & Fast)
  4. Inside of SDPA-GMP (Accurate)
  5. Conclusion
SDP Applications
  1. Control Theory
  2. Quantum Chemistry
  3. Sensor Network Localization Problem
  4. Polynomial Optimization
SDP Applications 1. Control Theory
  We want to keep the system stable against swinging.
  Stability condition ⇒ Lyapunov condition ⇒ SDP
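The chain from stability to a matrix condition can be illustrated numerically. The sketch below assumes a linear system dx/dt = A x, where the matrix A_example is a made-up example and not taken from the talk. It solves the Lyapunov equation A^T P + P A = -I with NumPy; a positive definite solution P certifies stability. An SDP solver would instead search for any P ≻ 0 with A^T P + P A ≺ 0, which is what turns the Lyapunov condition into an SDP feasibility problem. The function names are hypothetical.

```python
import numpy as np

def lyapunov_certificate(A):
    """Solve A^T P + P A = -I for P via a Kronecker-product linear system."""
    n = A.shape[0]
    I = np.eye(n)
    # With row-major vec: vec(A^T P) = (A^T kron I) vec(P), vec(P A) = (I kron A^T) vec(P)
    K = np.kron(A.T, I) + np.kron(I, A.T)
    P = np.linalg.solve(K, -I.flatten()).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def is_positive_definite(P):
    """True if all eigenvalues of the symmetric matrix P are positive."""
    return bool(np.all(np.linalg.eigvalsh(P) > 0))

# Hypothetical damped system with eigenvalues -1 and -2.
A_example = np.array([[0.0, 1.0], [-2.0, -3.0]])
P_example = lyapunov_certificate(A_example)
print(P_example)
print("stable:", is_positive_definite(P_example))
```

The Kronecker formulation is only practical for small n; it is used here to keep the sketch dependent on NumPy alone.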
SDP Applications 2. Quantum Chemistry
  Ground state energy
  Locate electrons
  Schrödinger equation ⇒ Reduced density matrix ⇒ SDP
SDP Applications 3. Sensor Network Localization
  Distance information ⇒ Sensor locations
  Protein structure
SDP Applications 4. Polynomial Optimization
  For example,
  NP-hard in general
  Very good lower bound by the SDP relaxation method
How Large & How Fast & How Accurate
  SDP Applications: Control Theory, Quantum Chemistry, Polynomial Optimization, Sensor Network Localization Problem
  Many Applications
Standard form
  The variables are
  Inner Product is
  The size is roughly determined by
  Ordinary solver / Our target
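For reference, one common way to write a primal-dual SDP pair is sketched below. The symbols C, A_k, b_k, X, y, Z, m and n are assumptions chosen for illustration and may differ from the notation used in the talk. Under this convention the variables are X and Z (n x n symmetric matrices) and y (an m-vector), and the problem size is governed by m and n.

```latex
\begin{align*}
\text{(P)}\quad & \min_{X}\; C \bullet X
  \quad \text{s.t.}\quad A_k \bullet X = b_k \;(k = 1,\dots,m),\quad X \succeq O,\\
\text{(D)}\quad & \max_{y,\,Z}\; \sum_{k=1}^{m} b_k y_k
  \quad \text{s.t.}\quad \sum_{k=1}^{m} A_k y_k + Z = C,\quad Z \succeq O,\\
& A \bullet B = \sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij} B_{ij}.
\end{align*}
```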
Primal-Dual Interior-Point Methods
  (Figure labels: Feasible region, Central Path, Target, Optimal)
Schur Complement Matrix
  Schur Complement Equation and Schur Complement Matrix (SCM)
  1. ELEMENTS (Evaluation of SCM)
  2. CHOLESKY (Cholesky factorization of SCM)
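A minimal NumPy sketch of the two steps named above. It assumes data matrices A_1, ..., A_m, a positive definite primal matrix X and dual slack Z, and the commonly used HKM-type formula B_ij = (X A_i Z^{-1}) • A_j for the SCM. The formula, the names, and the random data are assumptions for illustration; SDPA's internal notation and implementation may differ.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive definite matrix (illustrative data only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

def schur_complement_matrix(A_list, X, Z):
    """ELEMENTS step: B[i, j] = (X A_i Z^{-1}) . A_j (assumed HKM-type formula)."""
    m = len(A_list)
    Z_inv = np.linalg.inv(Z)
    B = np.empty((m, m))
    for i, A_i in enumerate(A_list):
        T = X @ A_i @ Z_inv            # shared by every element of row i
        for j, A_j in enumerate(A_list):
            B[i, j] = np.sum(T * A_j)  # matrix inner product T . A_j
    return B

rng = np.random.default_rng(0)
n, m = 4, 3
A_list = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    A_list.append(M + M.T)             # symmetric data matrices
X, Z = random_spd(n, rng), random_spd(n, rng)

B = schur_complement_matrix(A_list, X, Z)   # 1. ELEMENTS
B_sym = 0.5 * (B + B.T)                     # remove round-off asymmetry
L = np.linalg.cholesky(B_sym)               # 2. CHOLESKY (B_sym = L L^T)
rhs = rng.standard_normal(m)                # hypothetical right-hand side
dy = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
```

Each row i reuses the product X A_i Z^{-1}, so the rows can be evaluated independently of one another.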
Computation time on single processor
  Time unit is second, SDPA 7, Xeon 5460 (3.16GHz)
              Control   POP
  ELEMENTS    22228     668
  CHOLESKY    1593      1992
  Total       23986     2713
  ELEMENTS ⇒ Row-wise distribution
  CHOLESKY ⇒ Two-dimensional block-cyclic distribution
  SDPARA replaces these bottlenecks by parallel computation
Row-wise distribution
  All rows are independent
  Assign processors in a cyclic manner
  Simple idea ⇒ Very EFFICIENT
  High scalability
  (Figure: example with rows assigned in turn to Processor1, Processor2, Processor3, Processor4)
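The cyclic assignment can be sketched in a few lines of Python. The function name and the 0-based numbering of rows and processors are assumptions for illustration; in an MPI program each rank would evaluate only the rows it owns, with no communication needed because the rows are independent.

```python
def rows_for_processor(rank, num_procs, m):
    """0-based rows of an m x m SCM evaluated by processor `rank` (cyclic assignment)."""
    return list(range(rank, m, num_procs))

# Hypothetical example: 10 rows shared by 4 processors.
assignment = {p: rows_for_processor(p, 4, 10) for p in range(4)}
for p, rows in assignment.items():
    print(f"Processor{p + 1}: rows {rows}")
```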
Block Algorithm for Cholesky factorization
  Triangular factorization (U: upper triangular matrix)
  1. Small Cholesky factorization
  2. Block updates ⇒ Parallel computing
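A blocked (right-looking) Cholesky factorization can be sketched with NumPy as follows. This is a serial illustration of the block structure under the assumption A = U^T U, not the ScaLAPACK routine itself; the function name and block size are hypothetical. The trailing update in the last step is the part that parallel implementations distribute.

```python
import numpy as np

def block_cholesky_upper(A, block_size):
    """Return upper triangular U with U^T U = A, computed block by block."""
    A = np.array(A, dtype=float)       # working copy, overwritten by the updates
    n = A.shape[0]
    U = np.zeros_like(A)
    for k in range(0, n, block_size):
        e = min(k + block_size, n)
        # Small Cholesky factorization of the diagonal block.
        U11 = np.linalg.cholesky(A[k:e, k:e]).T
        U[k:e, k:e] = U11
        if e < n:
            # Block row: solve U11^T U12 = A12 (general solver, for brevity).
            U12 = np.linalg.solve(U11.T, A[k:e, e:])
            U[k:e, e:] = U12
            # Block update of the trailing submatrix.
            A[e:, e:] -= U12.T @ U12
    return U

rng = np.random.default_rng(1)
M = rng.standard_normal((7, 7))
A_example = M @ M.T + 7 * np.eye(7)    # illustrative positive definite matrix
U_example = block_cholesky_upper(A_example, block_size=3)
```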
Two-dimensional block-cyclic distribution (TDBCD)
  ScaLAPACK library
  Converting from the row-wise distribution to TDBCD requires network communication
  Cholesky on TDBCD is much faster than on the row-wise distribution
  (Figure: example with blocks assigned to Processor1, Processor2, Processor3, Processor4)
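The ownership rule behind a two-dimensional block-cyclic layout can be sketched in plain Python. The function names, the row-by-row numbering of the processor grid, and the 2 x 2 grid are assumptions for illustration; an actual ScaLAPACK program sets up its process grid and block size through its own initialization calls.

```python
def owner_of_entry(i, j, block_size, grid_rows, grid_cols):
    """Grid coordinates (pr, pc) of the processor owning matrix entry (i, j)."""
    return (i // block_size) % grid_rows, (j // block_size) % grid_cols

def layout(n, block_size, grid_rows, grid_cols):
    """n x n table of 1-based processor numbers, numbering the grid row by row."""
    table = []
    for i in range(n):
        row = []
        for j in range(n):
            pr, pc = owner_of_entry(i, j, block_size, grid_rows, grid_cols)
            row.append(1 + pr * grid_cols + pc)
        table.append(row)
    return table

# Hypothetical example: 4 x 4 matrix, 1 x 1 blocks, 2 x 2 processor grid.
for row in layout(4, 1, 2, 2):
    print(row)
```

Because both rows and columns wrap around the grid, every processor keeps a share of the trailing submatrix as the factorization proceeds, which is why this layout suits Cholesky better than a row-wise one.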
Numerical Results of SDPARA
  Quantum Chemistry (m=7230, SCM=100%), middle size
  SDPARA 7.3.1, Xeon X5460, 3.16GHz x2, 48GB memory
  ELEMENTS: 15x speedup
  CHOLESKY: 12x speedup
  Total: 13x speedup
  Very FAST!!
Acceleration by Multiple Threading
  Modern processors have multiple cores
  Multiple threading is becoming common
  (Figure: Processor1:Thread1, Processor1:Thread2, Processor2:Thread1, Processor2:Thread2)
  2 processors x 2 threads on each processor
  Two-level Parallel Computing
Comparison with PCSDP (developed by Ivanov & de Klerk)
  SDP: B.2P Quantum Chemistry (m = 7230, SCM = 100%)
  Xeon X5460, 3.16GHz x2 (8 cores), 48GB memory
  Time unit is second
  Servers   1      2      4      8     16
  PCSDP     53768  27854  14273  7995  4050
  SDPARA    5983   3002   1680   901   565
  SDPARA is 8x faster by MPI & Multi-Threading (Two-level parallelization)
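The "8x" figure can be checked directly from the timings in the table. The short Python calculation below uses only those numbers; the variable names are illustrative.

```python
servers = [1, 2, 4, 8, 16]
pcsdp = [53768, 27854, 14273, 7995, 4050]    # seconds, from the table
sdpara = [5983, 3002, 1680, 901, 565]        # seconds, from the table

# How many times faster SDPARA is than PCSDP on the same number of servers.
ratio = [p / s for p, s in zip(pcsdp, sdpara)]

# Speedup of SDPARA relative to its own single-server time.
sdpara_speedup = [sdpara[0] / t for t in sdpara]

for n, r, s in zip(servers, ratio, sdpara_speedup):
    print(f"{n:2d} servers: SDPARA is {r:.2f}x faster than PCSDP, self-speedup {s:.2f}x")
```

The ratio stays between roughly 7x and 9.3x across all server counts, consistent with the "8x faster" summary.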
Extremely Large-Scale SDPs
  Other solvers can handle only
  Problem         m        SCM    time
  Esc32_b (QAP)   198,432  100%   129,186 seconds (1.5 days)
  16 Servers [Xeon X5670 (2.93GHz), 128GB Memory]
  The LARGEST solved SDP in the world
Numerical Accuracy
  One weak point of PDIPM
  PDIPM requires
  Eventually, numerical trouble occurs (often, Cholesky fails)
  for example,
Numerical Precision
  Ordinary double precision in C or C++
    64 bits = 1 bit (sign) + 11 bits (exponent) + 52 bits (fraction); accuracy is limited by the 53-bit significand (52 stored bits plus the implicit leading bit)
  Arbitrary precision with the GMP library
    We can set the number of bits of the fraction part arbitrarily (for example, 200 bits)
  SDPA-GMP
    Replace BLAS (Basic Linear Algebra Subprograms) by MPLAPACK (Multiple precision LAPACK)
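The effect of the fraction length can be seen with Python's standard library. The sketch below uses the `decimal` module as a stand-in for arbitrary-precision arithmetic; it is not GMP and not what SDPA-GMP uses internally, and the 61-digit setting is an assumption chosen to roughly match a 200-bit fraction.

```python
import math
import sys
from decimal import Decimal, getcontext

# Double precision: 53 significant bits, so 1e-30 is lost when added to 1.
print("significand bits of a double:", sys.float_info.mant_dig)
double_sum = 1.0 + 1e-30
print("1.0 + 1e-30 == 1.0 in double:", double_sum == 1.0)

# A 200-bit fraction corresponds to about 200 * log10(2) decimal digits.
digits_200 = 200 * math.log10(2)
print("decimal digits in 200 bits: about", round(digits_200, 1))

# With about that many digits, the small term survives.
getcontext().prec = 61
decimal_sum = Decimal(1) + Decimal("1e-30")
print("high-precision sum:", decimal_sum)
```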
Numerically Hard Problem
  Test problem
  PDIPM is stable if Slater's condition holds
  Graph Partition Problem has no interior
  Small ⇒ Numerically Hard
Numerical Results of SDPA-GMP
  Small ⇒ Numerically Hard
  Parameter       Solver     Accuracy   Time (second)
  1.0e-1          SDPA       1.08e-8    2.03
                  SDPA-GMP   4.80e-48   77760.19
  1.0e-15         SDPA       1.63e-7    2.26
                  SDPA-GMP   2.97e-48   82115.52
  (no interior)   SDPA       5.26e-9    2.36
                  SDPA-GMP   7.29e-24   105325.74
  24 digits even for the no-interior case
  SDPA-GMP uses 300 digits
Conclusion
  SDPARA ⇒ How Fast & How Large (100 times)
  SDPA-GMP ⇒ How Accurate
  http://sdpa.sf.net/ & Online solver
  Thank you very much for your attention.