
CS 584

Review
- Partition
  - Domain decomposition
  - Functional decomposition
- Communication
  - Avoid centralized patterns
  - Avoid sequential patterns
  - Divide and conquer

Agglomeration
- The Partition and Communication steps were abstract; Agglomeration moves to the concrete.
- Combine tasks so that they execute efficiently on some parallel computer.
- Consider replication.

Agglomeration Goals
- Reduce communication costs by increasing computation per task, i.e., increasing granularity.
- Retain flexibility for mapping and scaling.
- Reduce software engineering costs.

Changing Granularity
- A large number of tasks does not necessarily produce an efficient algorithm; we must consider the communication costs.
- Reduce communication by having fewer tasks send fewer, larger messages (batching), as in the sketch below.
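A minimal sketch of batching with MPI, assuming the n values already sit in one contiguous buffer; the function names and `dest` are illustrative, not from the slides. The batched version pays the per-message latency once instead of n times.

```c
#include <mpi.h>

/* n separate sends: pays the per-message latency n times */
void send_unbatched(const double *v, int n, int dest) {
    for (int i = 0; i < n; i++)
        MPI_Send(&v[i], 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
}

/* one batched send: same data, one latency */
void send_batched(const double *v, int n, int dest) {
    MPI_Send(v, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
}
```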

Surface-to-Volume Effects
- Communication is proportional to the surface of the subdomain.
- Computation is proportional to the volume of the subdomain.
- Increasing computation per task will often decrease communication relative to it.
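A sketch of the scaling: for a d-dimensional subdomain of side n (d = 2 for a square block, d = 3 for a cube),

```latex
\text{computation} \propto n^{d} \ (\text{volume}), \qquad
\text{communication} \propto n^{d-1} \ (\text{surface}), \qquad
\frac{\text{communication}}{\text{computation}} \propto \frac{1}{n}
```

So doubling the side of each subdomain (coarser granularity) halves the communication-to-computation ratio.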

How many messages total? How much data is sent?
[Two figure slides posing this question; figures not reproduced in the transcript.]
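The figures are missing, but one common version of this exercise (an assumption here, not taken from the slides) compares a 1-D row decomposition with a 2-D block decomposition of an N x N grid under a 5-point stencil, with p tasks:

```latex
\text{1-D rows:} \quad \approx 2p \ \text{messages}, \quad \approx 2pN \ \text{values per iteration} \\
\text{2-D blocks:} \quad \approx 4p \ \text{messages}, \quad \approx 4p \cdot \tfrac{N}{\sqrt{p}} = 4N\sqrt{p} \ \text{values per iteration}
```

For p > 4 the blocks send more messages but less total data: the surface-to-volume effect from the previous slide.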

Replicating Computation
- Trade off replicated computation for reduced communication.
- Replication will often reduce execution time as well.

Summation of N Integers
Figure legend: s = sum, b = broadcast.
How many steps?
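Counting communication steps, assuming the N integers have already been reduced to one partial sum per task on p tasks: a tree sum followed by a tree broadcast takes roughly twice as many steps as the butterfly on the next slides,

```latex
T_{\text{sum}+\text{bcast}} \approx 2\lceil \log_2 p \rceil \ \text{steps}, \qquad
T_{\text{butterfly}} \approx \lceil \log_2 p \rceil \ \text{steps}
```

The butterfly halves the step count by replicating the combining work, so every task ends up holding the total.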

Using Replication (Butterfly)

Using Replication: Butterfly to Hypercube
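A minimal sketch of the butterfly/hypercube summation in C with MPI, assuming the number of tasks is a power of two (the local value is just illustrative). At stage k each task exchanges its partial sum with the partner whose rank differs in bit k; after log2(p) stages every task holds the full sum.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);   /* assumed to be a power of two */

    int sum = rank + 1;                  /* illustrative local value */

    /* log2(p) exchange stages; after stage k, each task holds the sum
       over its 2^(k+1)-task sub-hypercube */
    for (int mask = 1; mask < p; mask <<= 1) {
        int partner = rank ^ mask, recvd;
        MPI_Sendrecv(&sum, 1, MPI_INT, partner, 0,
                     &recvd, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sum += recvd;
    }

    printf("rank %d: sum = %d\n", rank, sum);  /* every rank has the total */
    MPI_Finalize();
    return 0;
}
```

This is exactly MPI_Allreduce's job; the hand-rolled loop is shown only to make the butterfly structure visible.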

Avoid Communication
- Look for tasks that cannot execute concurrently because of communication requirements.
- Replication can help accomplish two tasks at the same time, e.g., summation and broadcast.

Preserve Flexibility
- Create more tasks than processors.
- Overlap communication and computation (see the sketch below).
- Don't incorporate unnecessary limits on the number of tasks.
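A sketch of overlapping communication with computation for a 1-D row decomposition with ghost rows; `nx`, `rows`, the neighbor ranks, and the two compute routines are hypothetical names, not from the slides.

```c
#include <mpi.h>

void compute_interior(double *grid, int nx, int rows);  /* assumed app kernels */
void compute_boundary(double *grid, int nx, int rows);

/* grid holds rows+2 rows of nx values: ghost row 0, local rows 1..rows,
   ghost row rows+1. up/down are neighbor ranks (or MPI_PROC_NULL). */
void exchange_and_compute(double *grid, int nx, int rows, int up, int down) {
    MPI_Request req[4];
    MPI_Irecv(&grid[0],           nx, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&grid[(rows+1)*nx], nx, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&grid[1*nx],        nx, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&grid[rows*nx],     nx, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[3]);

    compute_interior(grid, nx, rows);   /* needs no ghost data: overlaps the transfer */

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    compute_boundary(grid, nx, rows);   /* ghost rows are now valid */
}
```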

Agglomeration Checklist
- Have you reduced communication costs by increasing locality?
- Do the benefits of replication outweigh its costs?
- Does replication compromise scalability?
- Does the number of tasks still scale with problem size?
- Is there still sufficient concurrency?

Mapping
- Specify where each task is to operate.
- Mapping may need to change depending on the target architecture.
- The general mapping problem is NP-complete.

Mapping
Goal: reduce execution time. Mapping is a game of trade-offs:
- Concurrent tasks ---> different processors
- High communication ---> same processor

Mapping
Many domain-decomposition problems make mapping easy: grids, arrays, etc.

Mapping
Algorithms based on unstructured or complex domain decompositions are difficult to map.

Other Mapping Problems
- Variable amounts of work per task
- Unstructured communication
- Heterogeneous processors (different speeds, different architectures)
Solution: LOAD BALANCING

Load Balancing
- Static: determined a priori, based on work, processor speed, etc.
- Probabilistic: random assignment
- Dynamic: restructure the load during execution; task scheduling (functional decomposition)

Static Load Balancing
- Based on a priori knowledge.
- Goal: equal WORK on all processors.
- Algorithms: basic, recursive bisection.

Basic
Divide up the work based on the work required and processor speed:

```latex
r_i = R \left( \frac{p_i}{\sum_j p_j} \right)
```

where $R$ is the total work, $p_i$ is the speed of processor $i$, and $r_i$ is the work assigned to processor $i$.
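A small helper corresponding to the formula, with illustrative names (the caller supplies the speeds and the output array):

```c
/* r[i] = R * speed[i] / sum_j speed[j] : work share for processor i */
void static_shares(double R, const double *speed, double *r, int n) {
    double total = 0.0;
    for (int j = 0; j < n; j++)
        total += speed[j];
    for (int i = 0; i < n; i++)
        r[i] = R * speed[i] / total;
}
```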

Recursive Bisection
- Divide the work in half recursively (sketch below).
- Based on physical coordinates.
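A minimal sketch of recursive bisection along one coordinate; real implementations usually alternate axes or cut the longest one, and the `Point` type and field names here are hypothetical.

```c
#include <stdlib.h>

typedef struct { double x; int part; } Point;

static int cmp_x(const void *a, const void *b) {
    double ax = ((const Point *)a)->x, bx = ((const Point *)b)->x;
    return (ax > bx) - (ax < bx);
}

/* Assign pts[0..n) to partitions first..first+nparts-1 by recursively
   splitting the points at the median coordinate. */
void rcb(Point *pts, int n, int first, int nparts) {
    if (nparts <= 1) {
        for (int i = 0; i < n; i++) pts[i].part = first;
        return;
    }
    qsort(pts, n, sizeof(Point), cmp_x);            /* order by coordinate */
    int half = nparts / 2;
    int mid = (int)((long long)n * half / nparts);  /* proportional split */
    rcb(pts, mid, first, half);
    rcb(pts + mid, n - mid, first + half, nparts - half);
}
```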

Dynamic Algorithms
- Adjust the load when an imbalance is detected.
- Local or global.

Task Scheduling
- Many tasks with weak locality requirements.
- Manager-worker model.

Task Scheduling
- Manager-worker (see the sketch below)
- Hierarchical manager-worker: uses submanagers
- Decentralized: no central manager; task pool on each processor; less of a bottleneck
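A bare-bones manager-worker sketch in C with MPI; the tag values, the task encoding, and `do_task` are placeholders, not from the slides. Rank 0 hands out task indices on request, and a negative index tells a worker to stop.

```c
#include <mpi.h>

#define TAG_REQ  1
#define TAG_TASK 2

double do_task(int t);  /* assumed application-specific work function */

void manager(int ntasks, int nworkers) {
    int next = 0, stopped = 0;
    while (stopped < nworkers) {
        int dummy;
        MPI_Status st;
        MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQ,
                 MPI_COMM_WORLD, &st);                /* worker asks for work */
        int task = (next < ntasks) ? next++ : -1;     /* -1 = no more work */
        if (task < 0) stopped++;
        MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
    }
}

void worker(void) {
    for (;;) {
        int dummy = 0, task;
        MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQ, MPI_COMM_WORLD);
        MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        if (task < 0) break;
        do_task(task);
    }
}
```

Because workers pull work on demand, faster processors naturally take more tasks, which is the point of the model.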

Mapping Checklist
- Is the load balanced?
- Are there communication bottlenecks?
- Is it necessary to adjust the load dynamically?
- Can you adjust the load if necessary?
- Have you evaluated the costs?

PCAM Algorithm Design
- Partition: domain or functional decomposition
- Communication: link producers and consumers
- Agglomeration: combine tasks for efficiency
- Mapping: divide up the tasks for balanced execution

Example: Atmosphere Model
- Simulates atmospheric processes: wind, clouds, etc.
- Solves a set of partial differential equations describing the fluid behavior.

Representation of Atmosphere

Data Dependencies

Partition & Communication

Agglomeration

Mapping

Mapping