Distributed Systems CS 15-440: Programming Models - Part I
Lecture 15, Nov 5, 2012
Majd F. Sakr and Mohammad Hammoud

Today…
- Last session: Fault Tolerance - Part I
- Today's session: Programming Models - Part I
- Announcements: Project 2 is due tonight; Project 3 will be out by tomorrow; midterm grades will be out by Wednesday

Objectives: Discussion on Programming Models
- Why parallelism?
- Parallel computer architectures
- Traditional models of parallel programming
- Examples of parallel processing
- Message Passing Interface (MPI)
- MapReduce

Intended Learning Outcomes: Programming Models
Attainment levels: Considered (a reasonably critical and comprehensive perspective); Thoughtful (a fluent, flexible and efficient perspective); Masterful (a powerful and illuminating perspective)
- ILO6: Explain and apply the shared memory, message passing, and MapReduce programming models, and describe the important differences between them
- ILO6.1: Describe the differences between the three models in terms of communication, synchronization, hardware support, development effort, and tuning effort
- ILO6.2: Explain the MapReduce program flow and how it communicates with the Hadoop Distributed File System (HDFS)
- ILO6.3: Apply MapReduce and MPI to real problems
- ILO6.4: Describe the relationship between the programming models and the architecture of the system

Objectives: Discussion on Programming Models - next topic: Why parallelism?

Amdahl's Law
We parallelize our programs in order to run them faster. How much faster will a parallel program run?
- Suppose the sequential execution of a program takes T1 time units and the parallel execution on p processors takes Tp time units
- Suppose that, of the entire execution of the program, a fraction s is not parallelizable while the remaining fraction 1-s is parallelizable
- Then the speedup (Amdahl's formula) is: Speedup = T1 / Tp = 1 / (s + (1 - s) / p)
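The formula on this slide was an image in the original deck; a short derivation, consistent with the definitions above (serial fraction s, p processors), is sketched below in LaTeX.

\[
T_p = s\,T_1 + \frac{(1 - s)\,T_1}{p}
\quad\Longrightarrow\quad
\text{Speedup} = \frac{T_1}{T_p} = \frac{1}{s + \frac{1 - s}{p}} \le \frac{1}{s}
\]

Note that even with an unlimited number of processors, the speedup is bounded by 1/s, the reciprocal of the serial fraction.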

Amdahl's Law: An Example
Suppose that 80% of your program can be parallelized and that you use 4 processors to run the parallel version of the program.
The speedup you can get according to Amdahl's Law is: Speedup = 1 / (0.2 + 0.8/4) = 2.5
Although you use 4 processors, you cannot get a speedup of more than 2.5 times; equivalently, the parallel running time is at best 40% of the serial running time.

Ideal vs. Actual Cases
Amdahl's argument is too simplified to apply directly to real cases: when we run a parallel program, there are in general a communication overhead and a workload imbalance among the processes.
[Figure: two bar charts of execution time across 4 processes - 1. Parallel speed-up, an ideal case: the part that cannot be parallelized runs serially and the parallelizable part is split evenly; 2. Parallel speed-up, an actual case: the same split plus communication overhead and load imbalance]

Guidelines
In order to benefit efficiently from parallelization, we ought to follow these guidelines:
- Maximize the fraction of the program that can be parallelized
- Balance the workload of parallel processes
- Minimize the time spent on communication

Objectives: Discussion on Programming Models - next topic: Parallel computer architectures

Parallel Computer Architectures
We can categorize the architecture of parallel computers in terms of two aspects:
- Whether the memory is physically centralized or distributed
- Whether or not the address space is shared

Memory       | Shared address space                 | Individual address space
Centralized  | UMA / SMP (Symmetric Multiprocessor) | N/A
Distributed  | NUMA (Non-Uniform Memory Access)     | MPP (Massively Parallel Processors)

Symmetric Multiprocessor
Symmetric Multiprocessor (SMP) architecture uses shared system resources that can be accessed equally from all processors.
Classically, a single OS controls the SMP machine and schedules processes and threads on processors so that the load is balanced.
[Figure: four processors, each with its own cache, connected by a bus or crossbar switch to shared memory and I/O]

Massively Parallel Processors
Massively Parallel Processors (MPP) architecture consists of nodes, each with its own processor, memory, and I/O subsystem.
An independent OS runs at each node.
[Figure: four nodes, each with a processor, cache, local bus, memory, and I/O, connected by an interconnection network]

Non-Uniform Memory Access
Non-Uniform Memory Access (NUMA) machines are built on a hardware model similar to MPP.
NUMA typically provides a shared address space to applications, using a hardware/software directory-based coherence protocol.
The memory latency varies according to whether you access memory directly (local) or through the interconnect (remote); hence the name non-uniform memory access.
As in an SMP machine, a single OS usually controls the whole system.

Objectives: Discussion on Programming Models - next topic: Traditional models of parallel programming

Models of Parallel Programming
What is a parallel programming model?
- A programming model is an abstraction provided by the hardware to programmers
- It determines how easily programmers can map their algorithms onto parallel units of computation (i.e., tasks) that the hardware understands
- It determines how efficiently parallel tasks can be executed on the hardware
Main goal: utilize all the processors of the underlying architecture (e.g., SMP, MPP, NUMA) and minimize the elapsed time of the program.

Traditional Parallel Programming Models: Shared Memory and Message Passing

Shared Memory Model
In the shared memory programming model, the abstraction is that parallel tasks can access any location of the memory.
Parallel tasks can communicate by reading and writing common memory locations.
This is similar to threads of a single process, which share a single address space.
Multi-threaded programs (e.g., OpenMP programs) are the best fit for the shared memory programming model.

Shared Memory Model (illustration)
[Figure: comparison of a single-threaded process and a multi-threaded process over time. In the single-threaded case, serial part S1, parallel parts P1-P4, and serial part S2 run one after another. In the multi-threaded case, S1 runs, threads are spawned to execute P1-P4 concurrently in a shared address space, the threads join, and then S2 runs. Si = serial part, Pj = parallelizable part.]

Shared Memory Example
Sequential version:
    for (i=0; i<8; i++)
        a[i] = b[i] + c[i];
    sum = 0;
    for (i=0; i<8; i++)
        if (a[i] > 0)
            sum = sum + a[i];
    Print sum;

Parallel (shared memory) version:
    begin parallel                // spawn a child thread
    private int start_iter, end_iter, i;
    shared int local_iter=4;
    shared double sum=0.0, a[], b[], c[];
    shared lock_type mylock;

    start_iter = getid() * local_iter;
    end_iter = start_iter + local_iter;
    for (i=start_iter; i<end_iter; i++)
        a[i] = b[i] + c[i];
    barrier;

    for (i=start_iter; i<end_iter; i++)
        if (a[i] > 0) {
            lock(mylock);
            sum = sum + a[i];
            unlock(mylock);
        }
    barrier;                      // necessary
    end parallel                  // kill the child thread
    Print sum;
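For comparison, here is a minimal OpenMP sketch of the same computation. It is not part of the original slides, and the array contents are made up for illustration; the reduction clause takes the place of the explicit lock/unlock in the pseudocode above.

    #include <stdio.h>

    #define N 8

    int main(void) {
        double a[N], b[N], c[N];
        double sum = 0.0;

        /* Hypothetical input values, just for illustration */
        for (int i = 0; i < N; i++) { b[i] = i - 3.0; c[i] = 1.0; }

        /* Each thread works on its own chunk of iterations (shared memory model) */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        /* The reduction clause gives each thread a private partial sum and
           combines the partial sums at the end, replacing lock/unlock */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            if (a[i] > 0)
                sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }

With GCC, for example, this would typically be built with the -fopenmp flag so the pragmas take effect; without it the code still compiles and runs sequentially.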

Traditional Parallel Programming Models: Shared Memory and Message Passing (next: Message Passing)

Message Passing Model
In the message passing programming model, parallel tasks have their own local memories.
One task cannot access another task's memory; hence, to communicate data, tasks have to rely on explicit messages sent to each other.
This is similar to the abstraction of processes, which do not share an address space.
MPI programs are the best fit for the message passing programming model.

Message Passing Model (illustration)
[Figure: comparison of a single-threaded process and message passing over time. In the single-threaded case, one process runs S1, P1-P4, and S2 in sequence. With message passing, processes 0-3 on nodes 1-4 each run S1, one parallel part (P1-P4), and S2, communicating by data transmission over the network. S = serial part, P = parallelizable part.]

Message Passing Example
Sequential version:
    for (i=0; i<8; i++)
        a[i] = b[i] + c[i];
    sum = 0;
    for (i=0; i<8; i++)
        if (a[i] > 0)
            sum = sum + a[i];
    Print sum;

Parallel (message passing) version, run by each of the two processes:
    id = getpid();
    local_iter = 4;
    start_iter = id * local_iter;
    end_iter = start_iter + local_iter;

    if (id == 0)
        send_msg (P1, b[4..7], c[4..7]);
    else
        recv_msg (P0, b[4..7], c[4..7]);

    for (i=start_iter; i<end_iter; i++)
        a[i] = b[i] + c[i];

    local_sum = 0;
    for (i=start_iter; i<end_iter; i++)
        if (a[i] > 0)
            local_sum = local_sum + a[i];

    if (id == 0) {
        recv_msg (P1, &local_sum1);
        sum = local_sum + local_sum1;
        Print sum;
    }
    else
        send_msg (P0, local_sum);
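For comparison, a minimal MPI sketch of the same computation. It is not part of the original slides; it assumes the job is launched with exactly 2 ranks and uses made-up input values. MPI_Send/MPI_Recv play the role of send_msg/recv_msg above.

    #include <stdio.h>
    #include <mpi.h>

    #define N 8

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int id;
        MPI_Comm_rank(MPI_COMM_WORLD, &id);    /* assumes exactly 2 ranks */

        double a[N], b[N], c[N];
        if (id == 0)                           /* made-up input, known to rank 0 only */
            for (int i = 0; i < N; i++) { b[i] = i - 3.0; c[i] = 1.0; }

        int local_iter = N / 2;
        int start = id * local_iter, end = start + local_iter;

        /* Rank 0 ships the second half of b and c to rank 1 */
        if (id == 0) {
            MPI_Send(&b[4], 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Send(&c[4], 4, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&b[4], 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&c[4], 4, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        /* Each rank computes and sums its own half */
        double local_sum = 0.0;
        for (int i = start; i < end; i++) {
            a[i] = b[i] + c[i];
            if (a[i] > 0) local_sum += a[i];
        }

        /* Rank 1 sends its partial sum; rank 0 combines and prints */
        if (id == 0) {
            double local_sum1;
            MPI_Recv(&local_sum1, 1, MPI_DOUBLE, 1, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("sum = %f\n", local_sum + local_sum1);
        } else {
            MPI_Send(&local_sum, 1, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Typically this would be built with mpicc and run with mpirun -np 2.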

Shared Memory vs. Message Passing
Comparison between the shared memory and message passing programming models:

Aspect             | Shared Memory                | Message Passing
Communication      | Implicit (via loads/stores)  | Explicit messages
Synchronization    | Explicit                     | Implicit (via messages)
Hardware support   | Typically required           | None
Development effort | Lower                        | Higher
Tuning effort      | Higher                       | Lower

Objectives: Discussion on Programming Models - next topic: Examples of parallel processing

SPMD and MPMD
When we run multiple processes with message passing, there are further categorizations according to how many different programs cooperate in the parallel execution.
We distinguish between two models:
- Single Program Multiple Data (SPMD) model
- Multiple Programs Multiple Data (MPMD) model

SPMD
In the SPMD model, there is only one program and each process uses the same executable, working on different sets of data.
[Figure: the same executable, a.out, running on Node 1, Node 2, and Node 3]

MPMD
The MPMD model uses different programs for different processes, but the processes collaborate to solve the same problem.
MPMD has two styles: master/worker and coupled analysis.
[Figure: 1. MPMD master/slave - a.out on Node 1 acting as master for b.out on Nodes 2 and 3; 2. MPMD coupled analysis - a.out, b.out, and c.out on Nodes 1-3. Example: a.out = structural analysis, b.out = fluid analysis, c.out = thermal analysis]

An Example: A Sequential Program
- Read array a() from the input file
- Set is=1 and ie=6   // is = index start, ie = index end
- Process from a(is) to a(ie)
- Write array a() to the output file
[Figure: array a with elements 1-6; colored shapes indicate the initial values of the elements, black shapes indicate the values after they are processed]

An Example: The Same Program Run by Processes 0, 1, and 2 (SPMD)
- Read array a() from the input file
- Get my rank
- If rank==0 then is=1, ie=2
- If rank==1 then is=3, ie=4
- If rank==2 then is=5, ie=6
- Process from a(is) to a(ie)
- Gather the results to process 0
- If rank==0 then write array a() to the output file
[Figure: each process works on its own (is, ie) slice of the 6-element array a; the processed slices are gathered back to process 0]
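A minimal MPI sketch of this SPMD example, not from the original slides: it assumes exactly 3 ranks, a 6-element array, and a made-up "process" step that simply negates each element; reading the input file is replaced by a broadcast of made-up values. MPI_Gather collects the slices back to rank 0.

    #include <stdio.h>
    #include <mpi.h>

    #define N 6          /* total array size, split as 2 elements per rank */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* assumes exactly 3 ranks */

        double a[N];
        if (rank == 0)                          /* stands in for "read array a() from the input file" */
            for (int i = 0; i < N; i++) a[i] = i + 1;

        /* Every rank gets the input; the broadcast stands in for each rank reading the file */
        MPI_Bcast(a, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Rank-dependent slice (0-based): rank 0 -> a[0..1], rank 1 -> a[2..3], rank 2 -> a[4..5] */
        int is = rank * 2, ie = is + 2;
        double local[2];
        for (int i = is; i < ie; i++)
            local[i - is] = -a[i];              /* hypothetical "process" step */

        /* Gather the processed slices back to rank 0, in rank order */
        MPI_Gather(local, 2, MPI_DOUBLE, a, 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {                        /* "write array a() to the output file" */
            for (int i = 0; i < N; i++) printf("%f ", a[i]);
            printf("\n");
        }

        MPI_Finalize();
        return 0;
    }

Every rank runs this same executable; only the rank-dependent (is, ie) slice differs, which is exactly the SPMD pattern of the slide.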

Concluding Remarks
To summarize, keep the following three points in mind:
- The purpose of parallelization is to reduce the time spent on computation
- Ideally, the parallel program is p times faster than the sequential program, where p is the number of processes involved in the parallel execution, but this is not always achievable
- Message passing is the tool that consolidates what parallelization has separated; it should not be regarded as the parallelization itself

Next Class: Programming Models - Part II
Continuing the discussion on programming models: Message Passing Interface (MPI), then MapReduce.

Thank You!