Presentation transcript:

Scalable Conceptual Interfaces in hypre
Allison Baker
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
(joint work with Rob Falgout, Jim Jones, and Ulrike Meier Yang)

This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. UCRL-POST

Overview
- Scalability is the key issue for large-scale computing: good performance requires scalable algorithms and software.
- In parallel computing, data is held in distributed form; a conceptual interface gets that data from the application code to hypre.
- Each processor knows only its own piece of the linear system.
- Problem: solvers require "nearby" data from other processors, and the interfaces must determine who owns this data efficiently.
- Goal: scalable interfaces to solvers!
- LLNL's new BlueGene/L machine: more than 100,000 processors!
What is hypre?
A library of high-performance algorithms for solving large, sparse systems of linear equations on massively parallel computers.

IJ Conceptual Interface
- hypre's traditional linear-algebraic interface.
- The matrix and right-hand side are defined in terms of row and column indices.
- Matrices are distributed across P processors by contiguous blocks of rows.
- A matrix-vector multiply requires some knowledge of the global partition.

Old method for determining neighborhood information
- Each processor sends its row range to all other processors.
- All processors store the global partition and use it to determine which processors to receive data from (the "receive processors").
- Processors discover which processors to send data to (the "send processors") via a second communication.

Old method costs for P processors:
  Communication  O(log(P))
  Computation    O(P)
  Storage        O(P)

Goal: generate neighborhood information in a scalable manner for large numbers of processors (P). For scalability, the computation, communication, and storage costs should all depend on P logarithmically or better.

New Algorithm
How? Assume the global partition! The new algorithm is a kind of rendezvous algorithm that uses the concept of an assumed partition to answer queries about the global data distribution.

Assumed Partition algorithm:
1. Assume a global partition of the data (N rows) that may be queried by any processor, with O(1) computation and storage cost.
2. Reconcile assumed rows with actual rows: contact processors regarding rows owned in another processor's assumed partition.
3. Use the assumed partition to determine send and receive processors.

After step 2, processors know who owns the data in their assumed partitions; the assumed partition now defines the rendezvous points.

Results
- Comparison of the new assumed partition algorithm and the old algorithm for a 3D Laplacian operator with a 27-point stencil.
- Each processor owns ~64,000 rows; runs on LLNL's MCR Linux cluster.
- The new algorithm has better scaling properties, which will be important at 100,000 processors!

What's Next?
- Testing on more processors using BlueGene/L: 16K processors coming soon!
- Adapting the assumed partition to the hypre conceptual interface for structured problems (more complicated!).
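The three steps above can be sketched as a small single-process simulation in Python. This is an illustrative sketch only, not hypre's actual API: the names `assumed_owner`, `build_directory`, and `find_owner` are hypothetical, and the per-row `directory` dictionaries stand in for the messages that processors would exchange with their rendezvous points in a real MPI implementation.

```python
def assumed_owner(row, n_rows, n_procs):
    # Step 1: the assumed partition -- an O(1) answer to the query
    # "which processor is assumed to own this row?"
    return row * n_procs // n_rows

def build_directory(actual_ranges, n_rows):
    """Step 2: reconcile assumed rows with actual rows.

    actual_ranges[p] = (first_row, last_row) actually owned by processor p.
    Each processor 'contacts' the assumed owner of every row it really owns,
    so each assumed owner ends up with a directory of true ownership for
    the rows in its assumed partition (the rendezvous point).
    """
    n_procs = len(actual_ranges)
    directory = [dict() for _ in range(n_procs)]
    for p, (lo, hi) in enumerate(actual_ranges):
        for row in range(lo, hi + 1):
            directory[assumed_owner(row, n_rows, n_procs)][row] = p
    return directory

def find_owner(row, directory, n_rows):
    # Step 3: any processor resolves a row's true owner by asking the
    # rendezvous point that the assumed partition designates for that row.
    return directory[assumed_owner(row, n_rows, len(directory))][row]
```

For example, with N = 8 rows on P = 2 processors where processor 0 actually owns rows 0-2 and processor 1 owns rows 3-7, row 3 falls in processor 0's assumed partition even though processor 1 owns it; the rendezvous at processor 0 still resolves it correctly.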
Not good enough! As P increases, the old algorithm's cost increases.

New algorithm costs for P processors:
  Communication  O(log(P))
  Computation    O(log(P))
  Storage        O(log(P)) or O(1)

This assumed partition concept is applicable to all of hypre's conceptual interfaces and to a variety of situations in parallel codes.

[Figure: the matrix A distributed by rows as contiguous blocks A_1, A_2, ..., A_P; the actual partition and the assumed partition differ, e.g. rows owned by processor 1 may fall in processor 2's assumed partition.]

For a balanced partitioning, one could assume: proc = floor(row * P / N).
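The balanced formula proc = floor(row * P / N) is what makes the assumed partition an O(1) query. A minimal Python sketch, assuming this balanced formula; the helper name `assumed_range` (the inverse query: which rows a processor is assumed to own) is illustrative, not hypre's API:

```python
def assumed_owner(row, n_rows, n_procs):
    # Balanced assumed partition: proc = floor(row * P / N).
    # O(1) computation, no storage -- any processor can evaluate it.
    return row * n_procs // n_rows

def assumed_range(proc, n_rows, n_procs):
    # Inverse query: the contiguous block of rows [lo, hi] that this
    # processor is assumed to own, i.e. {row : floor(row * P / N) == proc}.
    lo = -(-proc * n_rows // n_procs)            # ceil(proc * N / P)
    hi = -(-(proc + 1) * n_rows // n_procs) - 1  # ceil((proc + 1) * N / P) - 1
    return lo, hi
```

With N = 10 rows and P = 3 processors, rows 0-3 are assumed on processor 0, rows 4-6 on processor 1, and rows 7-9 on processor 2; the assumed ranges tile [0, N) exactly, so every row has exactly one rendezvous point.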