Chapter 2: Parallel Programming Models Yan Solihin Copyright notice: No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of the author. An exception is granted for academic lectures at universities and colleges, provided that the following text is included in such copy: “Source: Yan Solihin, Fundamentals of Parallel Computer Architecture, 2008”.

Programming Models

 What is a programming model? An abstraction that the hardware presents to programmers; it determines how easy or difficult it is for programmers to express their algorithms as computation tasks the hardware understands.
 Uniprocessor programming model
  - Based on program + data
  - Bundled in the Instruction Set Architecture (ISA)
  - Highly successful in hiding the hardware from programmers
 Multiprocessor programming model
  - Much debate; still searching for the right one...
  - Most popular: shared memory and message passing
 Shared Memory / Shared Address Space: each memory location is visible to all processors
 Message Passing: each memory location is visible to only one processor

Thread/Process – Uniprocessor Analogy

Process: share nothing

    if (fork() == 0)
        printf("I am the child process, my id is %d", getpid());
    else
        printf("I am the parent process, my id is %d", getpid());

- Heavyweight => high process creation overhead
- The processes share nothing => explicit communication using sockets, files, or messages

Thread: share everything

    void sayhello() {
        printf("I am the child thread, my id is %d", getpid());
    }
    ...
    clone(&sayhello, ...);  // schematic: spawn a thread that runs sayhello
    printf("I am the parent thread, my id is %d", getpid());

+ Lightweight => small thread creation overhead
+ The threads share an address space => implicit communication
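The fork() fragment above is essentially runnable C, but clone(&sayhello,,,()) is only schematic. A minimal runnable sketch of the thread case using POSIX threads (pthread_create in place of clone) might look like this:

    #include <stdio.h>
    #include <pthread.h>

    /* Child thread body: it shares the parent's entire address space. */
    void *sayhello(void *arg) {
        printf("I am the child thread\n");
        return NULL;
    }

    int main(void) {
        pthread_t child;
        /* Spawning a thread is much cheaper than fork()ing a process. */
        pthread_create(&child, NULL, sayhello, NULL);
        printf("I am the parent thread\n");
        pthread_join(child, NULL);  /* wait for the child to finish */
        return 0;
    }

Compile with gcc -pthread. On Linux, pthread_create is in fact built on the clone system call sketched above.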

Thread Communication Analogy

    int a, b, signal;
    ...
    void dosum() {
        while (signal == 0) {}   // wait until instructed to work
        printf("child thread> sum is %d", a + b);
        signal = 0;              // my work is done
    }

    void main() {
        a = 5, b = 3;
        signal = 0;
        clone(&dosum, ...);      // spawn child thread
        signal = 1;              // tell child to work
        while (signal == 1) {}   // wait until child is done
        printf("all done, exiting\n");
    }

Shared memory in a multiprocessor provides a similar memory-sharing abstraction.
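As written, the flag-spinning code has a data race: the compiler may cache signal in a register and a spin loop may never observe the other thread's update. A minimal runnable sketch of the same idea using POSIX threads and C11 atomics (signal is renamed signal_flag here to avoid clashing with the standard library's signal):

    #include <stdio.h>
    #include <pthread.h>
    #include <stdatomic.h>

    int a, b;                    /* shared between the two threads */
    atomic_int signal_flag = 0;  /* atomic so the spin loops are race-free */

    void *dosum(void *arg) {
        while (atomic_load(&signal_flag) == 0) {}  /* wait until told to work */
        printf("child thread> sum is %d\n", a + b);
        atomic_store(&signal_flag, 0);             /* my work is done */
        return NULL;
    }

    int main(void) {
        pthread_t child;
        a = 5; b = 3;
        pthread_create(&child, NULL, dosum, NULL); /* spawn child thread */
        atomic_store(&signal_flag, 1);             /* tell child to work */
        while (atomic_load(&signal_flag) == 1) {}  /* wait until child done */
        printf("all done, exiting\n");
        pthread_join(child, NULL);
        return 0;
    }

Real code would normally block on a condition variable or semaphore instead of busy-waiting; the spin loops are kept only to mirror the slide.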

Message Passing Example

    int a, b;
    ...
    void dosum() {
        recvMsg(mainID, &a, &b);
        printf("child process> sum is %d", a + b);
    }

    void main() {
        if (fork() == 0)   // I am the child process
            dosum();
        else {             // I am the parent process
            a = 5, b = 3;
            sendMsg(childID, a, b);
            wait(childID);
            printf("all done, exiting\n");
        }
    }

Differences from shared memory:
- Communication is explicit
- Message send and receive provide automatic synchronization
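The sendMsg/recvMsg calls above are pseudocode primitives. Between two fork()ed processes on one machine they could be realized with a POSIX pipe; a minimal runnable sketch, assuming the parent sends both operands in one message:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        pipe(fd);                           /* fd[0] = read end, fd[1] = write end */
        if (fork() == 0) {                  /* child process: shares no variables */
            int msg[2];
            read(fd[0], msg, sizeof msg);   /* receive a and b as one message */
            printf("child process> sum is %d\n", msg[0] + msg[1]);
        } else {                            /* parent process */
            int msg[2] = {5, 3};
            write(fd[1], msg, sizeof msg);  /* send a and b */
            wait(NULL);                     /* blocks until the child exits */
            printf("all done, exiting\n");
        }
        return 0;
    }

Note how the receive (the blocking read) doubles as synchronization: the child cannot compute the sum before the data arrives.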

Qualitative Comparison

Aspect               Shared Memory           Message Passing
-------------------  ----------------------  -----------------------
Communication        implicit (via ld/st)    explicit messages
Synchronization      explicit                implicit (via messages)
Hardware support     typically required      none
Development effort   lower                   higher
Tuning effort        higher                  lower

Development vs. Tuning Effort

 It is easier to develop shared memory programs:
  - Transparent data layout
  - Transparent communication between processors
  - Code structure is little changed
  - Parallelizing compilers and directive-driven compilers help (see the OpenMP sketch below)
 It is harder to tune shared memory programs for scalability:
  - Data layout must be tuned
  - Communication patterns must be tuned
  - Machine topology matters for performance
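To make the directive-driven point concrete, here is a minimal OpenMP sketch (compile with gcc -fopenmp): the sequential loops are unchanged except for the pragmas, which is exactly why development effort is low. The initial values of b[] and c[] are made up for the demo:

    #include <stdio.h>

    int main(void) {
        double a[8], b[8], c[8], sum = 0.0;
        int i;
        for (i = 0; i < 8; i++) { b[i] = i; c[i] = -3.0; }

        /* The directive asks the compiler to split the iterations
           across threads; the loop body is untouched. */
        #pragma omp parallel for
        for (i = 0; i < 8; i++)
            a[i] = b[i] + c[i];

        /* reduction(+:sum) gives each thread a private partial sum
           and combines them at the end, avoiding a shared-counter race. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < 8; i++)
            if (a[i] > 0)
                sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }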

Programming Model vs. Architecture

Was: each programming model (PM) was tied to matching hardware
- Shared memory PM -> shared memory (shared cache/memory) hardware
- Message passing PM -> distributed memory hardware

Now: either programming model can be supported on either architecture
- Shared cache/memory hardware supports both the shared memory PM and the message passing PM
- Distributed memory hardware supports the message passing PM directly, and the shared memory PM through an SVM layer

Notes:
- Message passing programs benefit from a shared memory architecture: sending a message can be achieved by passing a pointer to the message buffer
- Shared memory programs need a software virtual memory (SVM) layer on distributed memory computers

More Shared Memory Example

Sequential code:

    for (i=0; i<8; i++)
        a[i] = b[i] + c[i];
    sum = 0;
    for (i=0; i<8; i++)
        if (a[i] > 0)
            sum = sum + a[i];
    print sum;

Shared memory parallel version (pseudocode):

    begin parallel  // spawn a child thread
        private int start_iter, end_iter, i;
        shared int local_iter = 4;
        shared double sum = 0.0, a[], b[], c[];
        shared lock_type mylock;

        start_iter = getid() * local_iter;
        end_iter = start_iter + local_iter;
        for (i=start_iter; i<end_iter; i++)
            a[i] = b[i] + c[i];
        barrier;
        for (i=start_iter; i<end_iter; i++)
            if (a[i] > 0) {
                lock(mylock);
                sum = sum + a[i];
                unlock(mylock);
            }
        barrier;  // necessary
    end parallel  // kill the child thread
    print sum;

+ Communication happens directly through memory
+ Requires little code modification
- Requires privatization (of start_iter, end_iter, i) prior to parallel execution
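The begin parallel, lock, and barrier constructs above are pseudocode. A minimal runnable sketch of the same computation with POSIX threads for two threads might look as follows; note that pthread_barrier_t is POSIX but not available on every platform (e.g., older macOS), and the initial values of b[] and c[] are made up for the demo:

    #include <stdio.h>
    #include <pthread.h>

    #define N 8
    #define NTHREADS 2
    #define LOCAL_ITER (N / NTHREADS)

    double a[N], b[N], c[N];     /* shared arrays */
    double sum = 0.0;            /* shared; guarded by mylock */
    pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;
    pthread_barrier_t barrier;

    void *work(void *arg) {
        /* private per-thread variables live on each thread's stack */
        int id = (int)(long)arg;
        int start_iter = id * LOCAL_ITER;
        int end_iter = start_iter + LOCAL_ITER;
        int i;

        for (i = start_iter; i < end_iter; i++)
            a[i] = b[i] + c[i];
        pthread_barrier_wait(&barrier);   /* all of a[] ready before summing */
        for (i = start_iter; i < end_iter; i++)
            if (a[i] > 0) {
                pthread_mutex_lock(&mylock);
                sum = sum + a[i];         /* critical section */
                pthread_mutex_unlock(&mylock);
            }
        return NULL;
    }

    int main(void) {
        pthread_t child;
        int i;
        for (i = 0; i < N; i++) { b[i] = i; c[i] = -3.0; }
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        pthread_create(&child, NULL, work, (void *)1L); /* spawn child thread */
        work((void *)0L);                               /* parent acts as thread 0 */
        pthread_join(child, NULL);                      /* replaces the final barrier */
        printf("sum = %f\n", sum);
        return 0;
    }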

More Message Passing Example

Parallelizing the same sequential code with message passing (pseudocode):

    id = getpid();
    local_iter = 4;
    start_iter = id * local_iter;
    end_iter = start_iter + local_iter;

    if (id == 0)
        send_msg(P1, b[4..7], c[4..7]);
    else
        recv_msg(P0, b[4..7], c[4..7]);

    for (i=start_iter; i<end_iter; i++)
        a[i] = b[i] + c[i];

    local_sum = 0;
    for (i=start_iter; i<end_iter; i++)
        if (a[i] > 0)
            local_sum = local_sum + a[i];

    if (id == 0) {
        recv_msg(P1, &local_sum1);
        sum = local_sum + local_sum1;
        print sum;
    } else
        send_msg(P0, local_sum);

+ Communication happens only through messages
- Message sending and receiving overhead
- Requires algorithm and program modifications
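send_msg and recv_msg are again pseudocode. In practice, message passing programs are usually written with MPI; a minimal sketch of the same computation for exactly two ranks (run with something like mpirun -np 2 ./sum; the b[] and c[] values are made up, and MPI_Reduce replaces the explicit final send/recv of local_sum):

    #include <stdio.h>
    #include <mpi.h>

    #define N 8

    int main(int argc, char **argv) {
        double a[N], b[N], c[N], local_sum = 0.0, sum;
        int id, i, start_iter, end_iter, local_iter;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        local_iter = N / 2;
        start_iter = id * local_iter;
        end_iter = start_iter + local_iter;

        if (id == 0) {      /* rank 0 owns the input data and ships half of it */
            for (i = 0; i < N; i++) { b[i] = i; c[i] = -3.0; }
            MPI_Send(&b[4], 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Send(&c[4], 4, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&b[4], 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&c[4], 4, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        for (i = start_iter; i < end_iter; i++)
            a[i] = b[i] + c[i];
        for (i = start_iter; i < end_iter; i++)
            if (a[i] > 0)
                local_sum += a[i];

        /* combine the partial sums on rank 0 */
        MPI_Reduce(&local_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (id == 0)
            printf("sum = %f\n", sum);
        MPI_Finalize();
        return 0;
    }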

Concluding Remarks

On the shared memory programming model:
+ Parallelization can easily be automated (parallelizing compilers, OpenMP)
+ Shared variables need not be explicitly communicated...
- ...but they must be guarded by synchronization
- Providing shared memory requires complex hardware
- Synchronization overhead grows quickly with the number of processors
+/- Programs can be difficult to debug and are not always intuitive for users