INFORMS Charlotte 1  Parallel Computation for SDPs Focusing on the Sparsity of Schur Complement Matrices. Makoto Yamashita (Tokyo Tech) and Katsuki Fujisawa.


INFORMS Charlotte 1  Parallel Computation for SDPs Focusing on the Sparsity of Schur Complement Matrices
Makoto Yamashita (Tokyo Tech), Katsuki Fujisawa (Chuo Univ), Mituhiro Fukuda (Tokyo Tech), Kazuhide Nakata (Tokyo Tech), Maho Nakata (RIKEN)
INFORMS Annual Meeting, Charlotte, presented 2011/11/15 (meeting through 2011/11/16)

INFORMS Charlotte 2  Key phrase
- SDPARA: the fastest solver for large SDPs
- SDPARA = SemiDefinite Programming Algorithm paRAllel version

INFORMS Charlotte 3  SDPA Online Solver
1. Log in to the online solver
2. Upload your problem
3. Push the 'Execute' button
4. Receive the result via Web/Mail

INFORMS Charlotte 4  Outline
1. SDP applications
2. Standard form and Primal-Dual Interior-Point Methods
3. Inside SDPARA
4. Numerical Results
5. Conclusion

INFORMS Charlotte 5  SDP Applications 1. Control theory
- Against swing (disturbance), we want to keep stability.
- Stability condition ⇒ Lyapunov condition ⇒ SDP

INFORMS Charlotte 6  SDP Applications 2. Quantum Chemistry
- Ground state energy
- Locate electrons
- Schrödinger equation ⇒ Reduced Density Matrix ⇒ SDP

INFORMS Charlotte 7  SDP Applications 3. Sensor Network Localization
- Distance information ⇒ sensor locations
- Protein structure

INFORMS Charlotte 8  Standard form (our target)
- The variables are ...
- The inner product is ...
- The size is roughly determined by ...
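The equations on this slide were images that did not survive the transcript; the following is a reconstruction in the conventional SDPA notation (the symbols $A_k$, $x$, $X$, $Y$ are taken from the standard SDPA papers, not from the slide):

```latex
% Primal-dual pair in SDPA standard form (reconstruction)
\begin{aligned}
\text{(P)}:\quad & \min \sum_{k=1}^{m} c_k x_k
  \quad \text{s.t.}\quad X = \sum_{k=1}^{m} A_k x_k - A_0,\ X \succeq 0,\\
\text{(D)}:\quad & \max\ A_0 \bullet Y
  \quad \text{s.t.}\quad A_k \bullet Y = c_k \ (k=1,\dots,m),\ Y \succeq 0.
\end{aligned}
```

Here the inner product is $U \bullet V = \sum_{i,j} U_{ij} V_{ij}$, and the problem size is roughly determined by $m$ (the number of equality constraints) and $n$ (the dimension of the matrix variables).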

INFORMS Charlotte 9  Primal-Dual Interior-Point Methods
(Figure: the central path through the feasible region, leading to the optimal target point.)
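In the standard PDIPM formulation (a generic statement, not recovered from the slide), the central path is the set of feasible points satisfying the perturbed complementarity condition:

```latex
% Central path: perturbed complementarity (generic PDIPM statement)
X(\mu)\,Y(\mu) = \mu I, \qquad \mu > 0,
```

where $X$ and $Y$ are the primal slack and dual matrix variables; the method follows this path as $\mu \to 0$ toward the optimum.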

INFORMS Charlotte 10  Schur Complement Matrix
- Schur complement equation: ... , where ... is the Schur complement matrix (SCM)
- Two main computational steps:
  1. ELEMENTS (evaluation of the SCM)
  2. CHOLESKY (Cholesky factorization of the SCM)
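The equations here were also images; in the form used by the SDPA family of solvers they can be reconstructed as follows (the element formula below is the standard HKM-direction expression, assumed rather than recovered from the slide):

```latex
% Schur complement equation and its elements (reconstruction)
B\,dx = r, \qquad
B_{ij} = \mathrm{Tr}\!\left(A_i X^{-1} A_j Y\right), \quad i,j = 1,\dots,m,
```

where $B$ is the $m \times m$, symmetric positive definite Schur complement matrix. ELEMENTS evaluates the entries $B_{ij}$; CHOLESKY factorizes $B$ to solve for the search direction.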

INFORMS Charlotte 11  Computation time on a single processor
- SDPARA replaces these bottlenecks with parallel computation.
- (Table: rows ELEMENTS, CHOLESKY, Total; columns Control, POP; the timing values were images. Time unit: seconds; SDPA 7, Xeon 5460, 3.16 GHz.)

INFORMS Charlotte 12  Dense & Sparse SCM
- SDPARA can select dense or sparse handling automatically.
- Fully dense SCM (100% density): Quantum Chemistry
- Sparse SCM (9.26% density): POP
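The slide does not show SDPARA's actual selection rule; as a minimal sketch, a solver could switch on the measured density of the SCM (the names below and the `0.3` threshold are purely illustrative assumptions):

```python
import numpy as np

def scm_density(B):
    """Fraction of nonzero entries in the Schur complement matrix."""
    return np.count_nonzero(B) / B.size

def choose_storage(B, threshold=0.3):
    """Pick dense or sparse handling of the SCM (threshold is illustrative)."""
    return "dense" if scm_density(B) >= threshold else "sparse"
```

Under such a rule, a fully dense quantum-chemistry SCM takes the dense path, while a 9.26%-dense POP SCM takes the sparse path.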

INFORMS Charlotte 13  Different Approaches
- ELEMENTS: dense SCM ⇒ row-wise distribution; sparse SCM ⇒ formula-cost-based distribution
- CHOLESKY: dense SCM ⇒ parallel dense Cholesky (ScaLAPACK); sparse SCM ⇒ parallel sparse Cholesky (MUMPS)

INFORMS Charlotte 14  Three formulas for ELEMENTS
- Three formulas are selected according to the sparsity of the data matrices, ranging from dense to sparse (the formula images are missing from the transcript).
- All rows are independent.

INFORMS Charlotte 15  Row-wise distribution
- Assign rows to servers in a cyclic manner (Server 1, Server 2, Server 3, Server 4, Server 1, ...)
- Simple idea ⇒ very EFFICIENT
- High scalability
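The cyclic assignment above can be sketched in a few lines (a generic illustration; `rowwise_cyclic` is a hypothetical helper, not SDPARA code):

```python
def rowwise_cyclic(num_rows, num_servers):
    """Assign row i of the SCM to server i mod num_servers (cyclic)."""
    return {row: row % num_servers for row in range(num_rows)}
```

Because consecutive rows of a dense SCM cost roughly the same to evaluate, this trivial rule already balances the load well, which is why it scales.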

INFORMS Charlotte 16  Numerical Results on Dense SCM
- Quantum Chemistry (m = 7230, SCM = 100%), middle size
- SDPARA 7.3.1, Xeon X5460, 3.16 GHz x2, 48 GB memory
- ELEMENTS: 15x speedup; Total: 13x speedup. Very fast!!

INFORMS Charlotte 17  Drawback of Row-wise Distribution on Sparse SCM
- Simple row-wise distribution is ineffective for a sparse SCM.
- Instead, we estimate the cost of each element.

INFORMS Charlotte 18  Formula-cost-based distribution
- Estimated costs: Server 1: 190, Server 2: 185, Server 3: 188
- Good load balance
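One way to realize a formula-cost-based distribution is a greedy heaviest-first assignment to the currently least-loaded server (a sketch under that assumption; the slide does not specify SDPARA's exact algorithm):

```python
import heapq

def cost_based_distribution(row_costs, num_servers):
    """Assign each row (heaviest first) to the currently least-loaded server."""
    heap = [(0, s) for s in range(num_servers)]  # (accumulated cost, server id)
    heapq.heapify(heap)
    assignment = {}
    for row in sorted(range(len(row_costs)), key=lambda r: -row_costs[r]):
        load, server = heapq.heappop(heap)
        assignment[row] = server
        heapq.heappush(heap, (load + row_costs[row], server))
    return assignment
```

With per-element cost estimates, this kind of rule yields near-equal totals per server, like the 190/185/188 split shown on the slide.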

INFORMS Charlotte 19  Numerical Results on Sparse SCM
- Control Theory (m = 109,246, SCM = 4.39%), middle size
- SDPARA 7.3.1, Xeon X5460, 3.16 GHz x2, 48 GB memory
- ELEMENTS: 13x speedup; CHOLESKY: 4.7x speedup; Total: 5x speedup

INFORMS Charlotte 20  Comparison with PCSDP on an SDP with Dense SCM
- PCSDP was developed by Ivanov & de Klerk.
- SDP: B.2P Quantum Chemistry (m = 7230, SCM = 100%); Xeon X5460, 3.16 GHz x2, 48 GB memory
- (Table: PCSDP vs. SDPARA times by number of servers; the timing values were images. Time unit: seconds.)
- SDPARA is 8x faster thanks to MPI & multi-threading.

INFORMS Charlotte 21  Comparison with PCSDP on SDPs with Sparse SCM
- SDPARA handles the SCM as sparse.
- #sensors 1,000 (m = 16,450; density = 1.23%): PCSDP runs out of memory; SDPARA solves it.
- #sensors 35,000 (m = 527,096; density = 6.53x10^-3 %): PCSDP runs out of memory; only SDPARA can solve this size.

INFORMS Charlotte 22  Extremely Large-Scale SDPs
- 16 servers [Xeon X5670 (2.93 GHz), 128 GB memory]
- Esc32_b (QAP): m = 198,432, SCM 100% dense, solved in 129,186 seconds (1.5 days)
- Other solvers can handle only much smaller problems.
- The LARGEST solved SDP in the world.

INFORMS Charlotte 23  Conclusion
- Row-wise & formula-cost-based distribution
- Parallel Cholesky factorization
- SDPARA: the fastest solver for large SDPs
- Online solver available
Thank you very much for your attention.