1 Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix
Makoto Yamashita @ Tokyo-Tech
Katsuki Fujisawa @ Chuo University
Mituhiro Fukuda @ Tokyo-Tech
Yoshiaki Futakata @ University of Virginia
Kazuhiro Kobayashi @ National Maritime Research Institute
Masakazu Kojima @ Tokyo-Tech
Kazuhide Nakata @ Tokyo-Tech
Maho Nakata @ RIKEN
ISMP 2009 @ Chicago [2009/08/26]
2 Extremely Large SDPs
Arising from various fields:
Quantum Chemistry
Sensor Network Problems
Polynomial Optimization Problems
Most of the computation time is related to the Schur complement matrix (SCM).
[SDPARA] Parallel computation for the SCM, in particular a sparse SCM.
3 Outline
1. SemiDefinite Programming and the Schur complement matrix
2. Parallel Implementation
3. Parallelization for a sparse Schur complement
4. Numerical Results
5. Future works
4 Standard form of SDP
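For reference, a standard primal-dual pair as used by the SDPA family (the notation here is assumed, not taken from the slide):

\begin{aligned}
\text{(P)}\quad &\min_{x \in \mathbb{R}^m}\ \sum_{i=1}^{m} c_i x_i
  && \text{s.t.}\quad X = \sum_{i=1}^{m} F_i x_i - F_0,\ \ X \succeq 0,\\
\text{(D)}\quad &\max_{Y}\ F_0 \bullet Y
  && \text{s.t.}\quad F_i \bullet Y = c_i\ \ (i = 1, \dots, m),\ \ Y \succeq 0,
\end{aligned}

where the F_i are given symmetric data matrices and A \bullet B = \mathrm{Tr}(AB) denotes the matrix inner product.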
5 Primal-Dual Interior-Point Methods
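As a hedged sketch of the standard framework (textbook form, not reproduced from the slide): the method follows the central path defined by the perturbed optimality conditions

\begin{aligned}
& X = \sum_{i=1}^{m} F_i x_i - F_0, \qquad F_i \bullet Y = c_i \ \ (i = 1, \dots, m),\\
& XY = \mu I, \qquad X \succ 0,\ Y \succ 0,
\end{aligned}

linearizes them at each iteration to obtain a search direction (dx, dX, dY), takes a damped step, and drives \mu \to 0.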
6 Computation for Search Direction
Schur complement matrix ⇒ Cholesky factorization
Exploitation of sparsity in:
1. ELEMENTS
2. CHOLESKY
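Eliminating dX and dY from the linearized system leaves an m x m symmetric positive definite system in dx whose coefficient matrix B is the Schur complement matrix. Under the notation assumed above (for an HRVW/KSH/M-type direction; the exact arrangement depends on the chosen symmetrization and is an assumption here), the two bottleneck steps are

\text{ELEMENTS:}\quad B_{pq} = F_p \bullet \bigl( X^{-1} F_q Y \bigr)\ \ (p, q = 1, \dots, m),
\qquad
\text{CHOLESKY:}\quad B\, dx = r

for a suitable right-hand side r.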
7 Bottlenecks on Single Processor
Apply parallel computation to the bottlenecks.
Time in seconds on an Opteron 246 (2.0GHz):
              LiOH            HF
m             10592           15018
ELEMENTS      6150 ( 43%)     16719 ( 35%)
CHOLESKY      7744 ( 54%)     20995 ( 44%)
TOTAL         14250 (100%)    47483 (100%)
8 SDPARA
SDPA parallel version (a generic SDP solver), built on MPI & ScaLAPACK
Row-wise distribution for ELEMENTS
Parallel Cholesky factorization for CHOLESKY
http://sdpa.indsys.chuo-u.ac.jp/sdpa/
9 Row-wise distribution for evaluation of the Schur complement matrix
Suppose 4 CPUs are available: each CPU computes only its assigned rows.
No communication between CPUs
Efficient memory management
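A minimal sketch of the row-wise idea in C++, assuming a simple cyclic assignment of rows to CPUs (the slide does not specify SDPARA's actual assignment rule); each CPU fills only its own rows of the Schur complement matrix, so no communication is needed while forming it:

#include <cstdio>
#include <vector>

int main() {
    const int m = 10;        // number of rows of the Schur complement matrix
    const int num_cpus = 4;  // 4 CPUs are available, as on the slide

    // Cyclic row-wise assignment: CPU p owns rows p, p + num_cpus, p + 2*num_cpus, ...
    std::vector<std::vector<int>> rows_of(num_cpus);
    for (int row = 0; row < m; ++row)
        rows_of[row % num_cpus].push_back(row);

    // Each CPU evaluates only the elements of its own rows.
    for (int p = 0; p < num_cpus; ++p) {
        std::printf("CPU%d: rows", p + 1);
        for (int row : rows_of[p]) std::printf(" %d", row);
        std::printf("\n");
    }
    return 0;
}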
10 Parallel Cholesky factorization
We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix.
We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution.
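For illustration, ScaLAPACK's two-dimensional block-cyclic distribution maps the block containing entry (i, j) to a process grid coordinate as sketched below (the block size and grid shape are assumptions, not values from the slide):

#include <cstdio>

// Process grid coordinates (prow, pcol) owning entry (i, j) under a
// 2D block-cyclic distribution with block size nb on a P x Q process grid.
void owner_2d_block_cyclic(int i, int j, int nb, int P, int Q,
                           int* prow, int* pcol) {
    *prow = (i / nb) % P;  // block row index, wrapped cyclically over grid rows
    *pcol = (j / nb) % Q;  // block column index, wrapped over grid columns
}

int main() {
    const int nb = 64, P = 2, Q = 2;  // assumed block size and 2x2 process grid
    int pr, pc;
    owner_2d_block_cyclic(200, 70, nb, P, Q, &pr, &pc);
    std::printf("entry (200, 70) lives on process (%d, %d)\n", pr, pc);
    return 0;
}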
11 Computation time on SDP from Quantum Chemistry [LiOH]
AIST super cluster: Opteron 246 (2.0GHz), 6GB memory/node
12 Scalability on SDP from Quantum Chemistry [NF]
Speed-up: Total 29x, ELEMENTS 63x, CHOLESKY 39x
ELEMENTS is very effective.
13 Sparse Schur complement matrix
The Schur complement matrix becomes very sparse for some applications ⇒ the simple row-wise distribution loses its efficiency.
SCM density: from Control Theory (100%), from Sensor Networks (2.12%)
14 Sparseness of Schur complement matrix
Many applications have a diagonal block structure.
15 Exploitation of Sparsity in SDPA
We change the formula row by row: F1, F2, F3.
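As an illustration of why a per-row choice of formula helps (the exact definitions of F1, F2, F3 in SDPA are not reproduced here; this form follows the notation assumed earlier): when F_p is sparse, the element

B_{pq} = \sum_{(\alpha,\beta):\,[F_p]_{\alpha\beta} \neq 0} [F_p]_{\alpha\beta}\,\bigl[ X^{-1} F_q Y \bigr]_{\alpha\beta}

can be evaluated by touching only the nonzero pattern of F_p, so that only the required entries of X^{-1} F_q Y are formed, whereas the fully dense formula builds the whole product; the cheaper variant is selected row by row.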
16 ELEMENTS for Sparse Schur complement
[Figure: an example sparse Schur complement matrix with per-row computation costs 150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3]
Load on each CPU: CPU1: 190, CPU2: 185, CPU3: 188
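A sketch of one standard way to obtain such a balance: a greedy rule that gives the most expensive remaining row to the currently least-loaded CPU (whether SDPARA uses exactly this rule is an assumption), applied to the per-row costs of the slide's example:

#include <algorithm>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

int main() {
    // Estimated cost of each nonzero row of the sparse Schur complement matrix
    // (the example costs shown on the slide).
    std::vector<long long> cost = {150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3};
    const int num_cpus = 3;

    // Process rows in decreasing cost order.
    std::vector<int> order(cost.size());
    for (std::size_t r = 0; r < cost.size(); ++r) order[r] = static_cast<int>(r);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return cost[a] > cost[b]; });

    // Min-heap keyed by the accumulated load of each CPU.
    using Load = std::pair<long long, int>;  // (accumulated load, cpu index)
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
    for (int p = 0; p < num_cpus; ++p) heap.push({0, p});

    std::vector<long long> load(num_cpus, 0);
    for (int r : order) {
        Load top = heap.top();
        heap.pop();
        load[top.second] = top.first + cost[r];
        heap.push({load[top.second], top.second});
    }

    // For this example the loads come out to roughly 185-190 per CPU,
    // i.e. the balance shown on the slide.
    for (int p = 0; p < num_cpus; ++p)
        std::printf("CPU%d load: %lld\n", p + 1, load[p]);
    return 0;
}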
17 CHOLESKY for Sparse Schur complement
Parallel sparse Cholesky factorization implemented in MUMPS; MUMPS adopts the multifrontal method.
[Figure: the same example sparse Schur complement matrix as on the previous slide]
Memory storage on each processor should be consecutive; the distribution used for ELEMENTS matches this method.
18 Computation time for SDPs from Polynomial Optimization Problems
tsubasa: Xeon E5440 (2.83GHz), 8GB memory/node
Parallel sparse Cholesky achieves mild scalability; ELEMENTS attains a 24x speed-up on 32 CPUs.
19 ELEMENTS Load-balance on 32 CPUs
Only the first processor has a slightly heavier computation.
20 Automatic selection of sparse / dense SCM
Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky, so dense becomes better on many processors.
We estimate both computation times from the computational cost and the scalability.
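One plausible way to implement such a rule (a hedged sketch; the cost model, efficiency constants, and example numbers are assumptions, not taken from the slide):

#include <cstdint>
#include <cstdio>

// Choose between dense and sparse parallel Cholesky for the m x m Schur
// complement matrix by comparing rough time estimates:
//   dense  : ~ m^3 / 3 flops with high parallel efficiency,
//   sparse : flops predicted by the symbolic factorization, lower efficiency.
bool use_sparse_cholesky(std::int64_t m, std::int64_t sparse_factor_flops,
                         int num_procs) {
    const double dense_eff = 0.9;   // assumed scalability of dense Cholesky
    const double sparse_eff = 0.5;  // assumed (weaker) scalability of sparse Cholesky
    double dense_flops = double(m) * double(m) * double(m) / 3.0;
    double t_dense = dense_flops / (dense_eff * num_procs);
    double t_sparse = double(sparse_factor_flops) / (sparse_eff * num_procs);
    return t_sparse < t_dense;  // pick whichever estimate is cheaper
}

int main() {
    // Hypothetical example: m = 16,450 with a very sparse Cholesky factor.
    std::printf("use sparse Cholesky on 16 CPUs? %s\n",
                use_sparse_cholesky(16450, 500000000LL, 16) ? "yes" : "no");
    return 0;
}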
21 Sparse/Dense CHOLESKY for a small SDP from POP
tsubasa: Xeon E5440 (2.83GHz), 8GB memory/node
Only on 4 CPUs did the automatic selection fail (the scalability of sparse Cholesky is unstable on 4 CPUs).
22 Numerical Results
Comparison with PCSDP on Sensor Network Problems generated by SFSDP
Multi-Threading on Quantum Chemistry
23 SDPs from Sensor Network (time unit: second)
#sensors = 1,000 (m = 16,450, density 1.23%)
#CPU        1       2       4       8       16
SDPARA      28.2    22.1    16.7    13.8    27.3
PCSDP       M.O.    1527    887     591     368
#sensors = 35,000 (m = 527,096)
#CPU        1       2       4       8       16
SDPARA      1080    845     614     540     506
PCSDP       Memory Over if #sensors >= 4,000
(M.O. = Memory Over)
24 MPI + Multi-Threading for Quantum Chemistry
N.4P.DZ.pqgt11t2p (m = 7230)
[Chart: computation time in seconds]
64x speed-up on 16 nodes x 8 threads
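A minimal sketch of the MPI + multi-threading pattern (hybrid MPI/OpenMP; the loop structure inside SDPARA is an assumption): each MPI process keeps its row-wise share of the Schur complement matrix and uses threads over its own rows. Compile with, e.g., mpicxx -fopenmp.

#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int m = 7230;  // size of the SDP on the slide (N.4P.DZ.pqgt11t2p)
    std::vector<int> my_rows;
    for (int row = rank; row < m; row += nprocs)  // cyclic row-wise share
        my_rows.push_back(row);

    // Threads inside one MPI process work on that process's rows in parallel.
    double local_work = 0.0;
#pragma omp parallel for reduction(+ : local_work)
    for (int k = 0; k < static_cast<int>(my_rows.size()); ++k) {
        // Placeholder for evaluating row my_rows[k] of the Schur complement.
        local_work += 1.0;
    }

    std::printf("rank %d: %zu rows (work %.0f), up to %d threads\n",
                rank, my_rows.size(), local_work, omp_get_max_threads());
    MPI_Finalize();
    return 0;
}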
25 Concluding Remarks & Future works
1. New parallel schemes for the sparse Schur complement matrix
2. Reasonable scalability
3. Extremely large-scale SDPs with a sparse Schur complement matrix
Future work: improvement of multi-threading for the sparse Schur complement matrix