A computational loop k k Integration Newton Iteration

Presentation transcript:

A computational loop k k Integration Newton Iteration Linear system solvers k k t

SPIKE: A Parallel Banded System Solver (an introduction). Large sparse linear systems often arise in various computational science and engineering applications. Banded systems, or low-rank perturbations of banded systems (dense or sparse within the band), are sometimes obtained after reordering (e.g., RCM). SPIKE is proposed as a parallel solver for banded systems with the potential of exhibiting multilevel parallelism.
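As an illustration of how reordering exposes a banded structure, here is a small sketch using SciPy's reverse Cuthill-McKee routine on a random symmetric sparsity pattern; the matrix, its size, and the density are assumptions chosen for the demo.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Assumed test problem: a random symmetric sparsity pattern with a nonzero diagonal
n = 200
A = sp.random(n, n, density=0.02, random_state=0, format="csr")
A = (A + A.T + sp.identity(n)).tocsr()

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm, :][:, perm]                    # apply the symmetric permutation

def half_bandwidth(M):
    """Largest |i - j| over the nonzeros of M."""
    coo = M.tocoo()
    return int(np.abs(coo.row - coo.col).max())

print("half-bandwidth before RCM:", half_bandwidth(A))
print("half-bandwidth after  RCM:", half_bandwidth(A_rcm))
```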

SPIKE design principles: reducing memory references and interprocessor communication at the cost of extra arithmetic operations compared to LAPACK; allowing multiple levels of parallelism; creating a polyalgorithm whose versions range from direct to preconditioned iterative schemes.

Next Generation Sparse Solvers: The SPIKE Algorithm. Partition the banded matrix A into diagonal blocks A1, A2, A3, A4, coupled by off-diagonal blocks B1, B2, B3 (each Bj couples partition j to partition j+1) and C2, C3, C4 (each Cj couples partition j to partition j-1). Factor A = D * S, where D = diag(A1, A2, A3, A4). The system Ax = f is then solved in two steps: solve Dy = f, then solve Sx = y.
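A minimal NumPy sketch of the D/S splitting on a small banded matrix with p = 2 partitions (the size, bandwidth, and values are assumptions chosen for readability): form D from the diagonal blocks, define S = D^-1 A, and check that solving Dy = f followed by Sx = y reproduces the direct solution. The real algorithm exploits bandedness and never forms S explicitly; this dense demo only shows the two-step structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 8, 1, 2                       # size, half-bandwidth, number of partitions
sz = n // p

# Build a banded, strictly diagonally dominant test matrix
A = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - m), min(n, i + m + 1)):
        A[i, j] = rng.uniform(-1, 1)
    A[i, i] = 2 * m + 2

f = rng.uniform(-1, 1, n)

# D = diag(A1, A2): keep only the diagonal blocks
D = np.zeros_like(A)
for k in range(p):
    s = slice(k * sz, (k + 1) * sz)
    D[s, s] = A[s, s]

S = np.linalg.solve(D, A)               # S = D^-1 A (identity blocks plus spikes)

y = np.linalg.solve(D, f)               # step 1: Dy = f (block-diagonal, embarrassingly parallel)
x = np.linalg.solve(S, y)               # step 2: Sx = y (in SPIKE, reduced to a much smaller system)

print(np.allclose(x, np.linalg.solve(A, f)))   # True
```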

The Spike Matrix "S" and the Reduced System. S = D^-1 A is the identity except for the spikes: each partition contributes a right spike V and a left spike W, each m columns wide. Only the m x m top and bottom tips of the spikes couple neighboring partitions, so Sx = y collapses to a reduced system of order 2m(p-1), where p is the number of partitions and m is the half-bandwidth.
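Continuing the toy setup from the previous sketch (p = 2 partitions, half-bandwidth m = 1; all sizes are assumptions), a hedged sketch of how the spikes and the reduced system might be formed, and how the full solution is retrieved from the spike tips. With p = 2 the reduced system has order 2m(p-1) = 2m.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 8, 1, 2
sz = n // p

A = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - m), min(n, i + m + 1)):
        A[i, j] = rng.uniform(-1, 1)
    A[i, i] = 2 * m + 2
f = rng.uniform(-1, 1, n)

A1, A2 = A[:sz, :sz], A[sz:, sz:]
B1 = A[sz - m:sz, sz:sz + m]            # couples partition 1 to partition 2
C2 = A[sz:sz + m, sz - m:sz]            # couples partition 2 to partition 1

# Spikes: V1 = A1^-1 [0; B1],  W2 = A2^-1 [C2; 0]
rhs_V = np.zeros((sz, m)); rhs_V[-m:, :] = B1
rhs_W = np.zeros((sz, m)); rhs_W[:m, :] = C2
V1 = np.linalg.solve(A1, rhs_V)
W2 = np.linalg.solve(A2, rhs_W)

# Modified right-hand side: y = D^-1 f
y1 = np.linalg.solve(A1, f[:sz])
y2 = np.linalg.solve(A2, f[sz:])

# Reduced system of order 2m(p-1) = 2m, built only from the spike tips
R = np.block([[np.eye(m), V1[-m:, :]],
              [W2[:m, :], np.eye(m)]])
g = np.concatenate([y1[-m:], y2[:m]])
tips = np.linalg.solve(R, g)
x1_bot, x2_top = tips[:m], tips[m:]

# Retrieve the full solution from the tips
x1 = y1 - V1 @ x2_top
x2 = y2 - W2 @ x1_bot
x = np.concatenate([x1, x2])

print(np.allclose(x, np.linalg.solve(A, f)))   # True
```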

SPIKE: A Polyalgorithm. Different choices are made depending on the properties of the matrix and the platform architecture (towards an adaptive library). The diagonal blocks can be solved directly (LU, Cholesky, or their sparse counterparts) or iteratively (with a preconditioning strategy). The spikes can be computed explicitly (fully or partially), approximately, or on the fly. The reduced system can be solved directly (recursive SPIKE), approximately (truncated SPIKE), or iteratively (with a preconditioning scheme).
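A hedged sketch, not the library's actual interface, of how such a polyalgorithm might dispatch among these choices; here diagonal dominance and within-band sparsity are used as the deciding features, and the variant names follow the taxonomy above.

```python
import numpy as np

def is_diagonally_dominant(A):
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d
    return bool(np.all(d >= off))

def choose_spike_variant(A, sparse_within_band=False):
    """Illustrative decision logic only (hypothetical helper)."""
    block_solver = "sparse direct (e.g. PARDISO)" if sparse_within_band else "dense LU/UL"
    if is_diagonally_dominant(A):
        # No pivoting needed; spike tips decay, so a truncated reduced system suffices.
        return {"diagonal_blocks": block_solver,
                "spikes": "truncated tips",
                "reduced_system": "direct (truncated SPIKE)"}
    # General case: keep the whole spikes and solve the reduced system robustly.
    return {"diagonal_blocks": block_solver + " with diagonal boosting or pivoting",
            "spikes": "explicit or on-the-fly",
            "reduced_system": "recursive SPIKE or preconditioned iterative"}

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
print(choose_spike_variant(A))
```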

SPIKE vs ScaLAPACK. ScaLAPACK solves AX = F via a factorization A = L*U of the whole banded matrix. SPIKE solves AX = F via A = D*S: factor only the diagonal blocks A1..A4 (coupled by B1..B3 and C2..C4), form the spike matrix S (identity plus spikes V1..V3 and W2..W4), solve the reduced system, and retrieve the solution. SPIKE algorithm design: no LU factorization of the whole matrix, no reordering, no Schur complement; new banded primitives using BLAS-3; polyalgorithm implementation.
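For context, the single-node counterpart of the ScaLAPACK approach is a LAPACK-style banded LU solve; a minimal sketch with SciPy's solve_banded on an assumed tridiagonal toy problem, mainly to show the banded storage such solvers operate on.

```python
import numpy as np
from scipy.linalg import solve_banded

# Tridiagonal example (half-bandwidths l = u = 1) in LAPACK banded storage:
# row 0 = superdiagonal, row 1 = main diagonal, row 2 = subdiagonal
n = 6
ab = np.zeros((3, n))
ab[0, 1:] = -1.0          # superdiagonal
ab[1, :] = 4.0            # main diagonal
ab[2, :-1] = -1.0         # subdiagonal
b = np.ones(n)

x = solve_banded((1, 1), ab, b)   # banded LU solve, as provided by LAPACK
print(x)
```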

Multilevel Parallelism: SPIKE calling MKL-Pardiso for banded systems that are sparse within the band. The partitions are distributed across cluster nodes (Node 1 through Node 4 in the figure); SPIKE handles the coupling between nodes, while Pardiso is used on each cluster node to solve its diagonal block.
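A hedged sketch of the same idea on a single machine: each "node" is a worker process that factors and solves its own diagonal block with a sparse direct solver (SciPy's splu standing in for MKL-Pardiso, and a process pool standing in for the cluster nodes). Only the embarrassingly parallel Dy = f step is shown.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu
from concurrent.futures import ProcessPoolExecutor

def factor_and_solve_block(args):
    """Runs on one worker ('node'): sparse direct solve of one diagonal block."""
    block, rhs = args
    return splu(block).solve(rhs)

if __name__ == "__main__":
    # Assumed toy problem: p = 4 diagonal blocks that are sparse within the band
    p, sz = 4, 100
    blocks, rhs = [], []
    for k in range(p):
        Ak = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(sz, sz), format="csc")
        blocks.append(Ak)
        rhs.append(np.ones(sz))

    # SPIKE level: distribute the independent block solves (Dy = f) across workers
    with ProcessPoolExecutor(max_workers=p) as pool:
        y_parts = list(pool.map(factor_and_solve_block, zip(blocks, rhs)))

    y = np.concatenate(y_parts)
    print(y.shape)   # (400,): the modified RHS used later for the reduced system
```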

SPIKE Options (dense within the band). 1. Solving the reduced system: R = recursive, E = explicit, F = on-the-fly, T = truncated. 2. Factorization of the diagonal blocks, without pivoting (diagonal boosting, if necessary): L = LU, U = LU & UL, A = alternate LU or UL; with pivoting: P = LU. 3. Solution improvement: 0 = direct solver only, 2 = iterative refinement, 3 = outer BiCGStab iterations.
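A hedged sketch (a hypothetical helper, not the actual SPIKE calling interface) of how these three option fields might be represented and validated in a driver.

```python
from dataclasses import dataclass

REDUCED = {"R": "recursive", "E": "explicit", "F": "on-the-fly", "T": "truncated"}
FACTOR = {"L": "LU, no pivoting", "U": "LU & UL, no pivoting",
          "A": "alternate LU or UL, no pivoting", "P": "LU with pivoting"}
IMPROVE = {0: "direct solver only", 2: "iterative refinement", 3: "outer BiCGStab iterations"}

@dataclass
class SpikeOptions:
    reduced_system: str   # one of REDUCED
    factorization: str    # one of FACTOR
    improvement: int      # one of IMPROVE

    def describe(self):
        return (f"reduced system: {REDUCED[self.reduced_system]}, "
                f"diagonal blocks: {FACTOR[self.factorization]}, "
                f"improvement: {IMPROVE[self.improvement]}")

print(SpikeOptions("T", "U", 3).describe())
```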

The SPIKE algorithm: Hierarchy of Computational Modules. Level 3: SPIKE. Level 2: LAPACK; Pardiso, SuperLU, MUMPS; iterative solvers. Level 1: primitives for banded matrices (our own: banded triangular solve, banded UL); BLAS-3 (dense matrix-matrix primitives); Sparse BLAS.
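As an example of the kind of level-1 banded primitive listed here, a minimal banded forward substitution that only touches entries within the half-bandwidth; this is a plain sketch, not the library's blocked BLAS-3 version, and the test matrix is assumed.

```python
import numpy as np

def banded_lower_solve(L, b, m):
    """Solve L x = b for lower-triangular L with half-bandwidth m,
    reading only the m subdiagonals of each row."""
    n = len(b)
    x = np.array(b, dtype=float)
    for i in range(n):
        lo = max(0, i - m)
        x[i] = (x[i] - L[i, lo:i] @ x[lo:i]) / L[i, i]
    return x

# Assumed toy data: a lower-triangular banded matrix with half-bandwidth 2
n, m = 6, 2
L = np.tril(np.triu(np.random.default_rng(0).uniform(1, 2, (n, n)), -m))
b = np.ones(n)
print(np.allclose(banded_lower_solve(L, b, m), np.linalg.solve(L, b)))   # True
```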

SPIKE algorithms. Each variant is named by two letters: the first gives the spike / reduced-system strategy (E = explicit, R = recursive, T = truncated, F = on the fly), the second the factorization of the diagonal blocks (P = LU with pivoting, L = LU without pivoting, U = LU and UL without pivoting, A = alternate LU / UL).
P (LU with pivoting): EP = explicit generation of the spikes; reduced system solved iteratively with a preconditioner. RP = explicit generation of the spikes; reduced system solved directly using recursive SPIKE. FP = implicit generation of the reduced system, which is solved on the fly using an iterative method.
L (LU without pivoting): EL = explicit variant. RL = explicit generation of the spikes; reduced system solved directly using recursive SPIKE. TL = truncated generation of the spike tips (Vb exact, Wt approximate); reduced system solved directly. FL = on-the-fly variant.
U (LU and UL without pivoting): TU = truncated generation of the spike tips (Vb and Wt exact); reduced system solved directly. FU = implicit generation of the reduced system, solved on the fly using an iterative method with preconditioning.
A (alternate LU / UL): EA = explicit generation of the spikes using the new partitioning; reduced system solved iteratively with a preconditioner. TA = truncated generation of the spikes using the new partitioning; reduced system solved directly.
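A hedged sketch of the same table as a lookup structure, one way a driver could validate a requested two-letter variant. The codes and descriptions are taken from the table above (the EL and FL cells carry no description on the slide, so only their row/column meaning is given); the helper itself is hypothetical.

```python
# First letter: spike / reduced-system strategy; second letter: block factorization.
SPIKE_VARIANTS = {
    "EP": "explicit spikes; reduced system solved iteratively with a preconditioner",
    "RP": "explicit spikes; reduced system solved directly (recursive SPIKE)",
    "FP": "implicit reduced system, solved on the fly with an iterative method",
    "EL": "explicit variant with LU (no pivoting) on the blocks",
    "RL": "explicit spikes; reduced system solved directly (recursive SPIKE)",
    "TL": "truncated spike tips (Vb exact, Wt approximate); reduced system solved directly",
    "FL": "on-the-fly variant with LU (no pivoting) on the blocks",
    "TU": "truncated spike tips (Vb and Wt exact); reduced system solved directly",
    "FU": "implicit reduced system, solved on the fly with a preconditioned iterative method",
    "EA": "explicit spikes with the new partitioning; reduced system solved iteratively with a preconditioner",
    "TA": "truncated spikes with the new partitioning; reduced system solved directly",
}

def describe_variant(code):
    if code not in SPIKE_VARIANTS:
        raise ValueError(f"unknown SPIKE variant: {code}")
    return SPIKE_VARIANTS[code]

print(describe_variant("TU"))
```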