Lecture 6: Design of Parallel Programs Part I Lecturer: Simon Winberg Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)



Syllabus: Seminar 3 (i.e., today’s seminar reading); Lectures 5, 6 and 7. You might also be asked something about pthreads or OpenGL to check that you know what you did in the prac. …

 Seminar this afternoon, 3pm
 Recap of memory architectures
 Steps in designing parallel programs
 Step 1: Understanding the problem
 Step 2: Partitioning
 Step 3: Decomposition & granularity

EEE4084F The more detailed slides on memory architectures in the previous lectures were assigned as a reading task last week.

 Shared memory architecture
 Uniform Memory Access (UMA) vs. Non-Uniform Memory Access (NUMA)
 Distributed memory
 Similar to shared memory, but needs a communications network to share memory (NB: not always the same as MPI)
 Distributed shared memory (hybrid)
 Best & worst of both ‘worlds’

Steps in Designing Parallel Programs …

Starting the somewhat long and arduous trek on the topic of designing parallel programs. [Cartoon: hardcore, competent parallel programmers leading the way to greater feats; sequential programmers in their comfort zone below.]

… The hardware may be done first… or later. The main steps:
1. Understand the problem
2. Partitioning (separation into main tasks)
3. Decomposition & granularity
4. Communications
5. Identify data dependencies
6. Synchronization
7. Load balancing
8. Performance analysis and tuning

 Make sure that you understand the problem, and that it is the right problem, before you attempt to formulate a solution (i.e., ‘problem validation’)
 Some problems aren’t suited to parallelization – the problem just might not be parallelizable
Example of a non-parallelizable problem: calculating the Fibonacci series, e.g. Fib(10^20), as quickly as possible.
Fib(n) = n                                if n < 2
Fib(n) = Fib(n – 1) + Fib(n – 2)          otherwise
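A minimal sketch (in Python, for illustration) of why this recurrence resists parallelization: each term depends on the two before it, so the chain of computation is inherently sequential.

```python
def fib(n):
    # Each step depends on the two previous results, so the loop
    # cannot be split across workers without recomputing the chain.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))  # 55
```

There is no point in the loop where two workers could proceed independently; every iteration must wait for the one before it.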

 Many problems have obvious parallel solutions.
 Others may need some concerted effort to discover the parallel nature.
Easily parallelized example: given two tables, clients (clientname, clientnum) and orders (clientnum, ordernum, date), find the names of clients who ordered something on 1 March.
Obvious solution: divide orders into smaller parts; each task checks the dates in its partition, and if the date matches, the clientnum is added to a temporary list; at the end all the temporary lists are combined and each clientnum is replaced by the corresponding clientname.
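A sketch of that obvious solution in Python. The sample rows and the year in the target date are hypothetical, made up to match the slide's schema; each call to scan_partition stands in for one parallel task.

```python
from datetime import date

# Hypothetical sample data following the slide's schema.
clients = {1: "Alice", 2: "Bob", 3: "Carol"}
orders = [
    (1, 101, date(2017, 3, 1)),   # (clientnum, ordernum, date)
    (2, 102, date(2017, 2, 28)),
    (3, 103, date(2017, 3, 1)),
]

TARGET = date(2017, 3, 1)  # "1 March" (year assumed for the example)

def scan_partition(part):
    # Each task scans only its own slice of the orders table and
    # builds a temporary list of matching client numbers.
    return [clientnum for (clientnum, _, d) in part if d == TARGET]

# Divide orders into partitions; in a real system each partition
# would be handed to a separate task or process.
parts = [orders[:2], orders[2:]]
matches = [c for part in parts for c in scan_partition(part)]

# Combine the temporary lists and map clientnum -> clientname.
names = sorted(clients[c] for c in matches)
print(names)  # ['Alice', 'Carol']
```

The partitions share no data while scanning, which is what makes this problem easy to parallelize.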

 Identify critical parts or ‘hotspots’
 Determine where most of the work needs to be done. Most scientific and technical programs accomplish the most substantial portion of the work in only a few small places.
 Focus on parallelizing the hotspots. Ignore parts of the program that don’t need much CPU use.
 Consider which profiling techniques and Performance Analysis Tools (‘PATs’) to use
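One way to find such hotspots, sketched here with Python's standard cProfile module (the functions are invented for illustration): the profile report shows which few places do most of the work.

```python
import cProfile
import io
import pstats

def hotspot():
    # Deliberately heavy inner loop -- the 'small place' that
    # accounts for most of the program's work.
    return sum(i * i for i in range(200_000))

def cold_path():
    # Cheap bookkeeping; not worth parallelizing.
    return len("bookkeeping")

def main():
    for _ in range(5):
        hotspot()
    cold_path()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time: hotspot() will dominate the report.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The same idea applies with any profiler (gprof, perf, vendor PATs): measure first, then spend parallelization effort only where the time actually goes.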

 Identify bottlenecks
 Consider communication, I/O, memory and processing bottlenecks
 Determine areas of the code that execute notably slower than others
 Add buffers / lists to avoid waiting
 Attempt to avoid blocking calls (e.g., only block if the output buffer is full)
 Try to rearrange code to make loops faster
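A minimal sketch of the "only block if the output buffer is full" idea, using Python's standard bounded queue (the producer function is an invented example):

```python
import queue

buf = queue.Queue(maxsize=4)  # bounded output buffer

def produce(item):
    # Return immediately instead of blocking; only a full buffer
    # forces the caller to wait (or, here, to come back later).
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller can do other useful work and retry

for i in range(6):
    ok = produce(i)
    print(i, "queued" if ok else "buffer full")
```

Items 0–3 are queued; 4 and 5 find the buffer full. The producer never stalls, so it can overlap its waiting time with other work instead of sitting idle on a blocking call.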

 General method:
 Identify hotspots, avoid unnecessary complication, and identify potential inhibitors to the parallel design (e.g., data dependencies).
 Consider other algorithms…
 This is a most important aspect of designing parallel applications.
 Sometimes the obvious method can be greatly improved upon through some lateral thought, and testing on paper.

General Systems Thinking (GST)
 How does the problem of focus relate to more general problems?
 What other systems are being depended on, or are affected?
 Has a systems approach been considered, in addition to the usual divide-and-simplify approach?
 Let’s consider what happens, in a logical scientific / deductive manner, if we discard or replace some of the traditions or ‘conditioned choices’.
Critical Analysis (& Critical Thinking)*
 Reflecting on choices and descriptions
 Considering ‘what if…’ scenarios (e.g., what if a chair had only one leg instead of four)
 Applying rational and logical thinking while deconstructing texts or subject matter studied
* Adapted from: Browne, M & Keeley, S 2001, Asking the right questions: a guide to critical thinking, 6th edn, Prentice-Hall, Upper Saddle River, N.J. See last page for suggested sites to learn more about critical analysis.

YouTube clip: Systems thinking an introduction.mp4

 This step involves breaking the problem into separate chunks of work, which can then be implemented as multiple distributed tasks.
 Two typical methods to partition computation among parallel tasks:
1. Functional decomposition, or
2. Domain decomposition

 Functional decomposition focuses on the computation to be done, rather than the way data is separated and manipulated by tasks.  The problem is decomposed into tasks according to the work that must be done.  Each task performs a portion of the overall work.  Tends to be closer to message passing and MIMD systems

[Figure: ‘The Problem’ divided into Task #1 … Task #4.] Functional decomposition is suited to problems that can be split into different tasks.

 Example applications
 Environment modelling
 Simulating reality
 Signal processing, e.g.:
 Pipelined filters: in  P1  P2  P3  out
 Here P1 is filled first; its result is sent to P2 while, simultaneously, P1 starts working on a new block of input, and so on.
 Climate modelling (e.g., simultaneously running simulations for atmosphere, land, and sea)
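The pipelined-filters idea above can be sketched with threads and queues in Python (the three stage functions are invented placeholders for real filters); each stage is one task, and while P2 works on block k, P1 is already processing block k+1.

```python
import queue
import threading

SENTINEL = None  # marks end of the input stream

def stage(fn, inq, outq):
    # One pipeline stage: consume blocks from inq, apply the filter,
    # pass results (and finally the sentinel) downstream.
    while True:
        block = inq.get()
        if block is SENTINEL:
            outq.put(SENTINEL)
            return
        outq.put(fn(block))

q1, q2, q3, q_out = (queue.Queue() for _ in range(4))
threads = [
    threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)),    # P1
    threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3)),    # P2
    threading.Thread(target=stage, args=(lambda x: x - 3, q3, q_out)), # P3
]
for t in threads:
    t.start()

for block in [1, 2, 3]:   # 'in': stream of input blocks
    q1.put(block)
q1.put(SENTINEL)

results = []
while (r := q_out.get()) is not SENTINEL:   # 'out'
    results.append(r)
for t in threads:
    t.join()
print(results)  # [1, 3, 5]
```

Because each stage runs in its own thread with FIFO queues between them, throughput improves once the pipeline is full, even though each individual block still passes through every stage in order.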

Involves separating the data, or taking different ‘views’ of the same data. E.g.:
 View 1: looking at frequencies (e.g., FFTs)
 View 2: looking at amplitudes
Each parallel task works on its own portion of the data, or does something different with the same data.
[Figures: the data split into portions handled by Task 1 … Task 3, combined into a result; the same data seen through View 1 and View 2 by Task 1 and Task 2.]

 Good to use for problems where:  Data is static (e.g., factoring; matrix calculations)  Dynamic data structures linked to a single entity (where entity can be made into subsets) (e.g., multi-body problems)  Domain is fixed, but computation within certain regions of the domain is dynamic (e.g., fluid vortices models)

Many possible ways to divide things up. If you want to look at it visually:
 Continuous blocked (same-size partitions)
 Blocked
 Interlaced
 Interleaved or cyclic
 Other methods can be done
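Two of these partitioning schemes can be sketched in a few lines of Python (illustrative helper functions, not from the slides): blocked assignment gives each worker a contiguous chunk, while cyclic assignment deals elements out like cards.

```python
def blocked(data, nworkers):
    # Contiguous same-size chunks (last chunk absorbs any remainder).
    size = -(-len(data) // nworkers)  # ceiling division
    return [data[w * size:(w + 1) * size] for w in range(nworkers)]

def cyclic(data, nworkers):
    # Interleaved / cyclic: element i goes to worker i % nworkers.
    return [data[w::nworkers] for w in range(nworkers)]

data = list(range(8))
print(blocked(data, 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(cyclic(data, 2))   # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Blocked partitioning keeps neighbouring elements together (good locality); cyclic partitioning spreads work evenly when the cost per element varies across the domain.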

 Practice example
 Continuation of the steps:
 Step 3: Granularity
 Class activity
 Step 4: Communications …
Some resources related to critical analysis and critical thinking:
 An easy introduction to critical analysis:
 An online quiz and learning tool for understanding critical thinking:
Some resources related to systems thinking:

Image sources:
 Book clipart drawing – Wikipedia (open commons)
 Beach chairs, Himalayan climbers – (public domain)
 Picture of runner going up stairs – Wikimedia open commons
 Sierpinski pyramid (slide 13) – Wikimedia open commons
 3-legged chair (slide 13) – adapted (modification permitted) from Wikimedia
Disclaimers and copyright/licensing details: I have tried to follow the correct practices concerning copyright and licensing of material, particularly image sources that have been used in this presentation. I have put much effort into trying to make this material open access so that it can be of benefit to others in their teaching and learning practice. Any mistakes or omissions with regard to these issues I will correct when notified. To the best of my understanding, the material in these slides can be shared according to the Creative Commons “Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)” license, which is why I selected that license for this presentation (it’s not because I particularly want my slides referenced, but more to acknowledge the sources and generosity of others who have provided free material such as the images I have used).