Combinatorial Scientific Computing and Petascale Simulation (CSCAPES)
A SciDAC Institute Funded by DOE’s Office of Science

Investigators:
Alex Pothen, Florin Dobrian, and Assefaw Gebremedhin (ODU)
Erik Boman, Karen Devine, and Bruce Hendrickson (SNL)
Paul Hovland, Boyana Norris, and Jean Utke (ANL)
Umit Catalyurek (OSU)
Michelle Strout (CSU)

CSCAPES Mission

Research and development:
– Load balancing and parallelization toolkits for petascale computing
– Automatic differentiation capabilities
– Advanced sparse matrix methods

Major software outlets (open source):
– Zoltan and OpenAD/ADIC

Training and outreach:
– Train researchers in CSC skills at the pre-doctoral and post-doctoral levels
– Organize workshops, tutorials, and short courses in CSC
– Collaborate with SciDAC SAPs and CETs, academia, and industry

Accelerating the development and deployment of fundamental enabling technologies in high performance computing.

Partitioning and Load Balancing

Goal: assign data to processors to
– minimize application runtime
– maximize utilization of computing resources

Metrics:
– minimize processor idle time (balance work loads)
– keep inter-processor communication costs low

Partitioning impacts the performance of a wide range of simulations. CSCAPES plans to extend Zoltan for petascale applications.

[Figure: example application areas, including adaptive mesh refinement, contact detection, particle simulations, and linear solvers and preconditioners (Ax = b)]
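To make the two metrics concrete, below is a minimal sketch that scores a given partition of a task graph by load imbalance and edge cut. The graph representation, function name, and example data are illustrative assumptions, not Zoltan's API.

```python
# A minimal sketch (not Zoltan's API): score a partition of a task graph by
# the two metrics above. Load imbalance is the heaviest processor's load
# relative to the average; edge cut counts edges crossing processor
# boundaries, a common proxy for communication cost.

def evaluate_partition(vertex_weights, adjacency, part):
    """vertex_weights[v]: work of task v; adjacency[v]: neighbors of v;
    part[v]: processor that owns v."""
    nparts = max(part) + 1

    loads = [0.0] * nparts
    for v, w in enumerate(vertex_weights):
        loads[part[v]] += w
    imbalance = max(loads) / (sum(loads) / nparts)  # 1.0 means perfect balance

    edge_cut = sum(1 for u in adjacency
                   for v in adjacency[u]
                   if u < v and part[u] != part[v])  # count each edge once
    return imbalance, edge_cut

# Example: a 4-vertex path graph split evenly across 2 processors cuts 1 edge.
weights = [1.0, 1.0, 1.0, 1.0]
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(evaluate_partition(weights, adj, [0, 0, 1, 1]))  # (1.0, 1)
```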

Combinatorics in Automatic Differentiation

Automatic differentiation (AD) computes analytic derivatives of functions specified by programs.

Derivative accumulation is posed as a graph problem:
– Represent a function using a directed acyclic graph (DAG)
– Vertices are intermediate variables; edge weights are partial derivatives
– Compute the sum of weights over all paths from the independent to the dependent variable(s), where the weight of a path P is the product of the weights of the edges along P

CSCAPES plans to:
– Develop algorithms to reduce flops by graph elimination
– Find equivalent DAGs with the fewest edges
– Detect sparsity of derivative matrices, then use coloring to reduce cost
– Differentiate parallel reduction operations by enumerating subsets from a distributed collection in parallel

[Figure: computational DAG of a small function built from sin, exp, and multiplication, with vertices for intermediate variables and edges labeled by local partial derivatives]
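The path-weight rule above can be implemented in a few lines once vertices are processed in topological order, which sums over all paths without enumerating them. This is a minimal sketch under assumed data structures (a dict of incoming weighted edges), not the OpenAD/ADIC machinery; the example mirrors the sin/exp/multiply function suggested by the slide's figure.

```python
# Minimal sketch of derivative accumulation on a computational DAG:
# dy/dx is the sum over all x-to-y paths of the product of edge weights
# (local partial derivatives) along each path. Processing vertices in
# topological order accumulates all paths without listing them.

import math

def accumulate(order, in_edges, source):
    """order: vertices in topological order;
    in_edges[v]: list of (u, w) pairs, w being the local partial dv/du."""
    deriv = {v: 0.0 for v in order}      # deriv[v] = dv/d(source)
    deriv[source] = 1.0
    for v in order:
        for u, w in in_edges.get(v, []):
            deriv[v] += deriv[u] * w     # extend every source-to-u path by (u, v)
    return deriv

# Example: y = sin(x) * exp(x), with intermediates a = sin(x), b = exp(x).
x = 0.5
a, b = math.sin(x), math.exp(x)
in_edges = {
    'a': [('x', math.cos(x))],   # da/dx = cos(x)
    'b': [('x', b)],             # db/dx = exp(x)
    'y': [('a', b), ('b', a)],   # dy/da = b, dy/db = a
}
dy_dx = accumulate(['x', 'a', 'b', 'y'], in_edges, 'x')['y']
print(dy_dx, math.cos(x) * b + a * b)   # the two values agree
```

Because deriv[u] already holds the sum over all source-to-u paths by the time v is reached, each edge is visited exactly once and the work is linear in the size of the DAG.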

Graph Coloring for Computing Derivatives

– Sparsity exploitation leads to a variety of graph coloring problems
– Coloring also discovers concurrency in parallel computing
– Novel algorithms developed for several coloring problems
– Preliminary parallel versions developed for two coloring problems

CSCAPES plans to:
– Extend coloring software and integrate it with AD tools
– Design petascale parallel coloring algorithms
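As a baseline for these problems, the sketch below shows the classic sequential greedy heuristic for distance-1 coloring: each vertex receives the smallest color unused by its neighbors. The adjacency format and example are assumptions for illustration; the CSCAPES algorithms address specialized coloring variants and parallel execution well beyond this simple loop.

```python
# Minimal sketch: greedy distance-1 graph coloring. Each vertex gets the
# smallest color not already used by a neighbor, so adjacent vertices
# (e.g., structurally dependent columns of a derivative matrix) never
# share a color.

def greedy_coloring(adjacency):
    """adjacency[v]: iterable of neighbors of vertex v (vertices are 0..n-1)."""
    color = {}
    for v in range(len(adjacency)):
        taken = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in taken:          # first color not taken by any neighbor
            c += 1
        color[v] = c
    return color

# Example: a 4-cycle needs only 2 colors.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(greedy_coloring(adj))  # {0: 0, 1: 1, 2: 0, 3: 1}
```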

Matching for Sparse Matrix Computations

– Graph matching has many applications in sparse matrix computations and graph partitioning
– Traditional matching algorithms compute optimal solutions in superlinear time and are difficult to parallelize
– Current research trends are toward linear-time approximation algorithms and parallelization

CSCAPES plans to develop petascale parallel matching algorithms based on approximation techniques.
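One concrete instance of the approximation trend: the textbook greedy heuristic below sorts edges by weight and matches any edge whose endpoints are both free. It runs in O(m log m) time and is guaranteed to achieve at least half the optimal weight. This is an illustrative sketch, not one of the parallel algorithms under development.

```python
# Minimal sketch: greedy 1/2-approximation for maximum-weight matching.
# Scan edges in decreasing weight order and keep any edge whose endpoints
# are both still free; the result weighs at least half of an optimal
# matching, without the superlinear cost of an exact algorithm.

def greedy_matching(edges):
    """edges: list of (weight, u, v) tuples."""
    matched = set()
    matching = []
    for w, u, v in sorted(edges, reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Example on a 4-vertex path: greedy takes the heavy middle edge (weight 3),
# while the optimal matching takes the two outer edges (total weight 4).
edges = [(3.0, 1, 2), (2.0, 0, 1), (2.0, 2, 3)]
print(greedy_matching(edges))  # [(1, 2)]
```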

Performance Improvement via Data Reordering

– Irregular memory access patterns make performance sensitive to data and iteration orders
– Run-time reordering transformations schedule data accesses and iterations to maximize performance
– Preliminary work on reordering heuristics [Strout & Hovland, 2004] shows that hypergraph models outperform graph models

CSCAPES plans to develop hypergraph-based run-time reordering transformations.
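As one example of a run-time reordering transformation, the sketch below applies first-touch consecutive packing: data elements are renumbered in the order the iteration sequence first accesses them, so indirect accesses such as x[index[i]] tend to walk memory sequentially. The function names and data layout are illustrative assumptions; the hypergraph-based transformations planned here generalize this kind of heuristic.

```python
# Minimal sketch of a run-time data reordering: first-touch consecutive
# packing. Data elements are renumbered in the order the iterations first
# touch them, improving spatial locality for indirect accesses x[index[i]].

def consecutive_packing(index, ndata):
    """index: data indices in iteration order; returns an old->new map."""
    new_id = {}
    for i in index:                 # first touch assigns the next slot
        if i not in new_id:
            new_id[i] = len(new_id)
    for i in range(ndata):          # data never touched goes to the end
        if i not in new_id:
            new_id[i] = len(new_id)
    return new_id

def apply_reordering(data, index, new_id):
    reordered = [None] * len(data)
    for old, new in new_id.items():
        reordered[new] = data[old]
    return reordered, [new_id[i] for i in index]

# Example: scattered first touches 5, 2, 7, 0 become slots 0, 1, 2, 3.
index = [5, 2, 5, 7, 2, 0]
data = [10 * i for i in range(8)]
new_id = consecutive_packing(index, len(data))
reordered, remapped = apply_reordering(data, index, new_id)
print(remapped)       # [0, 1, 0, 2, 1, 3]
print(reordered[:4])  # [50, 20, 70, 0]
```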