Lecture 1 – Parallel Programming Primer CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

Outline Introduction to Parallel Computing Parallel Architectures Parallel Algorithms Common Parallel Programming Models  Message-Passing  Shared Address Space

Introduction to Parallel Computing Moore’s Law  The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months.  Self-fulfilling prophecy Computer architect goal Software developer assumption

Introduction to Parallel Computing Impediments to Moore’s Law  Theoretical Limit  What to do with all that die space?  Design complexity  How do you meet the expected performance increase?

Introduction to Parallel Computing Parallelism  Continue to increase performance via parallelism.

Introduction to Parallel Computing From a software point of view, we need to solve increasingly demanding problems  Engineering Simulations  Scientific Applications  Commercial Applications We need the performance and resource gains afforded by parallelism

Introduction to Parallel Computing Engineering Simulations  Aerodynamics  Engine efficiency

Introduction to Parallel Computing Scientific Applications  Bioinformatics  Thermonuclear processes  Weather modeling

Introduction to Parallel Computing Commercial Applications  Financial transaction processing  Data mining  Web Indexing

Introduction to Parallel Computing Unfortunately, parallelism greatly increases coding complexity  Coordinating concurrent tasks  Parallelizing algorithms  Lack of standard environments and support

Introduction to Parallel Computing The challenge  Provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.

Parallel Architectures Standard sequential architecture  [Diagram: a single CPU connected to RAM over a bus, with the bottlenecks marked]

Parallel Architectures Use multiple  Datapaths  Memory units  Processing units

Parallel Architectures SIMD  Single instruction stream, multiple data stream Processing Unit Control Unit Interconnect Processing Unit Processing Unit Processing Unit Processing Unit

Parallel Architectures SIMD  Advantages Performs vector/matrix operations well  EX: Intel’s MMX chip  Disadvantages Too dependent on type of computation  EX: Graphics Performance/resource utilization suffers if computations aren’t “embarrasingly parallel”.

Parallel Architectures MIMD  Multiple instruction stream, multiple data stream Processing/Control Unit Processing/Control Unit Processing/Control Unit Processing/Control Unit Interconnect

Parallel Architectures MIMD  Advantages Can be built with off-the-shelf components Better suited to irregular data access patterns  Disadvantages Requires more hardware (!sharing control unit) Store program/OS at each processor Ex: Typical commodity SMP machines we see today.

Parallel Architectures Task Communication  Shared address space Use common memory to exchange data Communication and replication are implicit  Message passing Use send()/receive() primitives to exchange data Communication and replication are explicit

Parallel Architectures Shared address space  Uniform memory access (UMA) Access to a memory location is independent of which processing unit makes the request.  Non-uniform memory access (NUMA) Access to a memory location depends on the location of the processing unit relative to the memory accessed.

Parallel Architectures Message passing  Each processing unit has its own private memory  Exchange of messages used to pass data  APIs Message Passing Interface (MPI) Parallel Virtual Machine (PVM)

Parallel Algorithms Algorithm  a finite sequence of instructions, often used for calculation and data processing. Parallel Algorithm  An algorithm that can be executed a piece at a time on many different processing devices, with the pieces combined at the end to produce the correct result

Parallel Algorithms Challenges  Identifying work that can be done concurrently.  Mapping work to processing units.  Distributing the work  Managing access to shared data  Synchronizing various stages of execution.

Parallel Algorithms Models  A way to structure a parallel algorithm by selecting decomposition and mapping techniques in a manner to minimize interactions.

Parallel Algorithms Models  Data-parallel  Task graph  Work pool  Master-slave  Pipeline  Hybrid

Parallel Algorithms Data-parallel  Mapping of Work Static Tasks -> Processes  Mapping of Data Independent data items assigned to processes (Data Parallelism)

Parallel Algorithms Data-parallel  Computation Tasks process data, synchronize to get new data or exchange results, continue until all data processed  Load Balancing Uniform partitioning of data  Synchronization Minimal or barrier needed at end of a phase  Ex: Ray Tracing

Parallel Algorithms Data-parallel  [Diagram: independent data items (D) assigned to processes (P), each producing its own output (O)]

Parallel Algorithms Task graph  Mapping of Work Static Tasks are mapped to nodes in a data dependency task dependency graph (Task parallelism)  Mapping of Data Data moves through graph (Source to Sink)

Parallel Algorithms Task graph  Computation Each node processes input from previous node(s) and send output to next node(s) in the graph  Load Balancing Assign more processes to a given task Eliminate graph bottlenecks  Synchronization Node data exchange  Ex: Parallel Quicksort, Divide and Conquer approaches

Parallel Algorithms Task graph  [Diagram: data items (D) flow from source to sink through a graph of processes (P), producing outputs (O)]

Parallel Algorithms Work pool  Mapping of Work/Data No desired pre-mapping Any task performed by any process  Computation Processes work as data becomes available (or requests arrive)

Parallel Algorithms Work pool  Load Balancing Dynamic mapping of tasks to processes  Synchronization Adding/removing work from input queue  Ex: Web Server

Parallel Algorithms Work pool  [Diagram: processes (P) pull tasks from a shared work pool fed by an input queue and place results on an output queue]

Parallel Algorithms Master-slave  Modification of the work pool model One or more Master processes generate and assign work to worker processes  Load Balancing A Master process can better distribute load to worker processes

Parallel Algorithms Pipeline  Mapping of work Processes are assigned tasks that correspond to stages in the pipeline Static  Mapping of Data Data processed in FIFO order  Stream parallelism

Parallel Algorithms Pipeline  Computation Data is passed through a succession of processes, each of which will perform some task on it  Load Balancing Insure all stages of the pipeline are balanced (contain the same amount of work)  Synchronization Producer/Consumer buffers between stages  Ex: Processor pipeline, graphics pipeline

Parallel Algorithms Pipeline  [Diagram: data moves from an input queue through a succession of processes (P) connected by buffers to an output queue]

Common Parallel Programming Models Message-Passing Shared Address Space

Common Parallel Programming Models Message-Passing  Most widely used for programming parallel computers (clusters of workstations)  Key attributes: Partitioned address space Explicit parallelization  Process interactions Send and receive data

Common Parallel Programming Models Message-Passing  Communications Sending and receiving messages Primitives  send(buff, size, destination)  receive(buff, size, source)  Blocking vs non-blocking  Buffered vs non-buffered Message Passing Interface (MPI)  Popular message passing library  ~125 functions

Common Parallel Programming Models Message-Passing  [Diagram: processes P1–P4, each on its own workstation with its own data; P1 calls send(buff1, 1024, p3) and P3 calls receive(buff3, 1024, p1)]

Common Parallel Programming Models Shared Address Space  Mostly used for programming SMP machines (multicore chips)  Key attributes Shared address space  Threads  Shmget/shmat UNIX operations Implicit parallelization  Process/Thread communication Memory reads/stores

Common Parallel Programming Models Shared Address Space  Communication Read/write memory  EX: x++;  Posix Thread API Popular thread API Operations  Creation/deletion of threads  Synchronization (mutexes, semaphores)  Thread management

Common Parallel Programming Models Shared Address Space  [Diagram: threads T1–T4 on one SMP workstation sharing data in common RAM]

Parallel Programming Pitfalls Synchronization  Deadlock  Livelock  Fairness Efficiency  Maximize parallelism Reliability  Correctness  Debugging
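
As a concrete illustration of the deadlock pitfall (hypothetical locks, not from the slides): two threads acquire the same two mutexes in opposite orders, so each may end up holding one lock while waiting forever for the other. A consistent global lock order avoids this.

/* Classic lock-ordering deadlock. */
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);   /* holds A ... */
    pthread_mutex_lock(&lock_b);   /* ... waits for B */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);   /* holds B ... */
    pthread_mutex_lock(&lock_a);   /* ... waits for A: possible deadlock */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}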

Prelude to MapReduce MapReduce is a paradigm designed by Google for making a subset (albeit a large one) of distributed problems easier to code Automates data distribution & result aggregation Restricts the ways data can interact to eliminate locks (no shared state = no locks!)

Prelude to MapReduce Next time…  MapReduce parallel programming paradigm

References  Introduction to Parallel Computing, Grama et al., Pearson Education, 2003  Distributed Systems  Distributed Computing: Principles and Applications, M. L. Liu, Pearson Education, 2004