1 1 Zoltan: Toolkit of parallel combinatorial algorithms for unstructured, dynamic and/or adaptive computations Unstructured Communication Tools -Communication.

Slides:



Advertisements
Similar presentations
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Advertisements

A system Performance Model Instructor: Dr. Yanqing Zhang Presented by: Rajapaksage Jayampthi S.
CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware
CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.
Revisiting a slide from the syllabus: CS 525 will cover Parallel and distributed computing architectures – Shared memory processors – Distributed memory.
Reference: Message Passing Fundamentals.
An Evaluation of a Framework for the Dynamic Load Balancing of Highly Adaptive and Irregular Parallel Applications Kevin J. Barker, Nikos P. Chrisochoides.
Graph Analysis with High Performance Computing by Bruce Hendrickson and Jonathan W. Berry Sandria National Laboratories Published in the March/April 2008.
Software Version Control SubVersion software version control system WebSVN graphical interface o View version history logs o Browse directory structure.
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Scientific Computing on Heterogeneous Clusters using DRUM (Dynamic Resource Utilization Model) Jamal Faik 1, J. D. Teresco 2, J. E. Flaherty 1, K. Devine.
CS 584. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
A Load Balancing Framework for Adaptive and Asynchronous Applications Kevin Barker, Andrey Chernikov, Nikos Chrisochoides,Keshav Pingali ; IEEE TRANSACTIONS.
Topology Aware Mapping for Performance Optimization of Science Applications Abhinav S Bhatele Parallel Programming Lab, UIUC.
Mapping Techniques for Load Balancing
WORKFLOWS IN CLOUD COMPUTING. CLOUD COMPUTING  Delivering applications or services in on-demand environment  Hundreds of thousands of users / applications.
1 Parallel Simulations of Underground Flow in Porous and Fractured Media H. Mustapha 1,2, A. Beaudoin 1, J. Erhel 1 and J.R. De Dreuzy IRISA – INRIA.
Department of Biomedical Informatics Dynamic Load Balancing (Repartitioning) & Matrix Partitioning Ümit V. Çatalyürek Associate Professor Department of.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear.
Applying Twister to Scientific Applications CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear.
The sequence of graph transformation (P1)-(P2)-(P4) generating an initial mesh with two finite elements GENERATION OF THE TOPOLOGY OF INITIAL MESH Graph.
Combinatorial Scientific Computing is concerned with the development, analysis and utilization of discrete algorithms in scientific and engineering applications.
OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston.
Sandia National Laboratories Graph Partitioning Workshop Oct. 15, Load Balancing Myths, Fictions & Legends Bruce Hendrickson Sandia National Laboratories.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear.
1 Presenters: Cameron W. Smith and Glen Hansen Workflow demonstration using Simmetrix/PUMI/PAALS for parallel adaptive simulations FASTMath SciDAC Institute.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear.
Active Monitoring in GRID environments using Mobile Agent technology Orazio Tomarchio Andrea Calvagna Dipartimento di Ingegneria Informatica e delle Telecomunicazioni.
Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.
1 Distributed Process Scheduling: A System Performance Model Vijay Jain CSc 8320, Spring 2007.
High Performance Computing 1 Load-Balancing. High Performance Computing 1 Load-Balancing What is load-balancing? –Dividing up the total work between processes.
Automatic Differentiation: Introduction Automatic differentiation (AD) is a technology for transforming a subprogram that computes some function into a.
PIMA-motivation PIMA: Partition Improvement using Mesh Adjacencies  Parallel simulation requires that the mesh be distributed with equal work-load and.
TEMPLATE DESIGN © Hardware Design, Synthesis, and Verification of a Multicore Communication API Ben Meakin, Ganesh Gopalakrishnan.
Issues in (Financial) High Performance Computing John Darlington Director Imperial College Internet Centre Fast Financial Algorithms and Computing 4th.
Embedding Constraint Satisfaction using Parallel Soft-Core Processors on FPGAs Prasad Subramanian, Brandon Eames, Department of Electrical Engineering,
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC.
Supercomputing ‘99 Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms Leonid Oliker NERSC Lawrence Berkeley National Laboratory.
Combinatorial Scientific Computing and Petascale Simulation (CSCAPES) A SciDAC Institute Funded by DOE’s Office of Science Investigators Alex Pothen, Florin.
1 1  Capabilities: Dynamic load balancing and static data partitioning -Geometric, graph-based, hypergraph-based -Interfaces to ParMETIS, PT-Scotch, PaToH.
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
New Features in ML 2004 Trilinos Users Group Meeting November 2-4, 2004 Jonathan Hu, Ray Tuminaro, Marzio Sala, Michael Gee, Haim Waisman Sandia is a multiprogram.
CS 484 Designing Parallel Algorithms Designing a parallel algorithm is not easy. There is no recipe or magical ingredient Except creativity We can benefit.
ICS362 – Distributed Systems Dr. Ken Cosh Week 2.
CS- 492 : Distributed system & Parallel Processing Lecture 7: Sun: 15/5/1435 Foundations of designing parallel algorithms and shared memory models Lecturer/
CS 484 Load Balancing. Goal: All processors working all the time Efficiency of 1 Distribute the load (work) to meet the goal Two types of load balancing.
Partitioning using Mesh Adjacencies  Graph-based dynamic balancing Parallel construction and balancing of standard partition graph with small cuts takes.
An Evaluation of Partitioners for Parallel SAMR Applications Sumir Chandra & Manish Parashar ECE Dept., Rutgers University Submitted to: Euro-Par 2001.
CS 584. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
Data Structures and Algorithms in Parallel Computing Lecture 7.
Static Process Scheduling
CS 351/ IT 351 Modeling and Simulation Technologies HPC Architectures Dr. Jim Holten.
Scalable and Topology-Aware Load Balancers in Charm++ Amit Sharma Parallel Programming Lab, UIUC.
Parallel Applications And Tools For Cloud Computing Environments CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.
1 1  Capabilities: PCU: Communication, threading, and File IO built on MPI APF: Abstract definition of meshes, fields, and their algorithms GMI: Interface.
Predictive Load Balancing Using Mesh Adjacencies for Mesh Adaptation  Cameron Smith, Onkar Sahni, Mark S. Shephard  Scientific Computation Research Center.
Network Systems Lab. Korea Advanced Institute of Science and Technology No.1 Ch. 1 Introduction EE692 Parallel and Distribution Computation | Prof. Song.
Dynamic Load Balancing Tree and Structured Computations.
Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui.
CSCAPES Mission Research and development Provide load balancing and parallelization toolkits for petascale computation Develop advanced automatic differentiation.
INTRODUCTION TO HIGH PERFORMANCE COMPUTING AND TERMINOLOGY.
Application of Design Patterns to Geometric Decompositions V. Balaji, Thomas L. Clune, Robert W. Numrich and Brice T. Womack.
Parallel Hypergraph Partitioning for Scientific Computing
Ana Gainaru Aparna Sasidharan Babak Behzad Jon Calhoun
Parallel Algorithm Design
Integrating Efficient Partitioning Techniques for Graph Oriented Applications My dissertation work represents a study of load balancing and data locality.
Dynamic Load Balancing of Unstructured Meshes
Presentation transcript:

1 1 Zoltan: Toolkit of parallel combinatorial algorithms for unstructured, dynamic and/or adaptive computations Unstructured Communication Tools -Communication “plans” describe complex interprocessor patterns -Data migration -Mapping between different partitions Dynamic Load Balancing: -Geometric (coordinate-based): fast, maintains geometric locality -Topological (graph/hypergraph-based): explicitly model application communication costs -Interfaces to ParMETIS, Scotch, PaToH Distributed Data Directories -Parallel look-ups of off-processor data -Scalable (O(N)) total memory usage ABC 010 DEF 210 GHI 121 Parallel Graph Coloring -Finds disjoint sets of vertices, identifying independence -Distance 1 -Distance 2

2 2 Zoltan2: Next generation toolkit targeting needs of applications on emerging architectures Multijagged (MJ) Geometric (Coordinate) Partitioning  MPI+OpenMP implementation  Multisection results in less data movement, greater scalability during partitioning than RCB  Ex: Used in Trilinos’ MueLu multigrid solver on 524K cores On-node Balanced Graph Coloring  Finds disjoint sets of vertices for parallelism in multicore execution  Each label has roughly equal number of vertices  Balanced coloring reduces idle cores in GPUs 16-part 4x4 MJ partition Tasks Allocated Nodes in Torus Network Architecture-aware Geometric Task Mapping  Map MPI tasks to cores to keep congestion and communication costs low  Uses MJ to assign interdependent tasks to “nearby” cores  Reduced MiniGhost execution time on 64K cores of Cielo by 34% relative to default; by 24% relative to custom 2x2x4 grouping XX 2XXXXXX 3XXX 4XXXX 5XX 6XXX

3 3 Further information: Download via the Trilinos toolkit: ZoltanZoltan2 Parallelism:MPI-onlyMPI+X API:Application builds model (e.g., graph, hypergraph) for algorithm Application describes its data (matrix, mesh, coordinates); algorithm builds model Capabilities:Parallel partitioning Parallel coloring Global and local ordering Parallel partitioning Architecture-aware task placement On-node balanced coloring On-node ordering Optional TPLs:Scotch (INRIA/Bordeaux) ParMETIS (U. Minnesota) PaToH (Ohio St. U.) Scotch (INRIA/U.Bordeaux) ParMETIS (U.Minnesota) ParMA Partition Improvement (RPI) AMD (U.Florida) LDMS: Lightweight Distributed Metric Service (Sandia) Maturity:Highly mature; maintenance onlyResearch platform for emerging archs Integration:No dependence on TrilinosIntegrated with Trilinos next-generation software stack Language:C (with F90 and C++ APIs)Templated C++11 Distribution:Stand-alone or in TrilinosIn Trilinos