Sparse Triangular Solve in UPC By Christian Bell and Rajesh Nishtala.

Motivation

A common but irregular mathematical operation occurring in linear algebra is the Sparse Triangular Solve (SpTS).
– Solve for x in Tx = b, where T is a lower triangular sparse matrix
– Used after sparse Cholesky or LU factorization to solve sparse linear systems

Irregularity arises from the dependence structure.
– Hard to parallelize: the dependences are only known at runtime, so the dependence tree must effectively be built in parallel

Algorithm Description

To solve for x in Tx = b (T is lower triangular):

    for r = 1:n {
        x(r) = b(r);
        for c = 1:r-1
            x(r) = x(r) - T(r,c)*x(c);
        x(r) = x(r) / T(r,r);
    }

Key takeaways:
– To solve for x(r), you depend on all values of x before it
– Rows can be partially solved by knowing which values of x(c) are already available

Dependency Graph

[Figure: a sparse lower triangular matrix alongside its dependence graph; each nonzero T(r,c) below the diagonal adds an edge from row c to row r]

Data Structure Design

– Allow more startup-time analysis of the matrix so that the solve itself is faster
– Build the dependence graph in parallel
– Support O(1) lookup during solve time
– O(1) operations are made easy by UPC

Solve Methodology

Producer / consumer relationship:
– "Consume" entries of the x vector in Tx = b to produce a new value x(j)
– "Production" generates a signal to every processor waiting on x(j)
– Difficult with the two-sided model of MPI; UPC lets you effectively leverage one-sided communication

Avoid synchronization:
– By knowing a priori what part of another thread's address space you can safely write into
– Very difficult to get right with MPI

Performance (1)

Performance (2)

Conclusions and Future Work

– A new style of programming for an old problem
– Leverages one-sided messaging that is not easily available in MPI
– Future work: integrate the solver into libraries such as SuperLU