(Superficial!) Review of Uniprocessor Architecture Parallel Architectures and Related concepts CS 433 Laxmikant Kale University of Illinois at Urbana-Champaign.

(Superficial!) Review of Uniprocessor Architecture Parallel Architectures and Related concepts CS 433 Laxmikant Kale University of Illinois at Urbana-Champaign Department of Computer Science

Parallel Machines: an abstract introduction
Our main focus will be on three kinds of machines:
– Bus-based shared memory machines
– Scalable shared memory machines
    Cache coherent
    Hardware support for remote memory access
– Distributed memory machines

Distributed memory machines: debate
[Figure: processing elements PE0, PE1, ..., PEp-1, each with its own cache and memory (Mem0, Mem1, ..., Memp-1), connected by an interconnection network]
Should this machine support a shared address space?
– If not: coordination by “passing messages”
– If so: how and whether to keep caches “coherent”?
This debate is also tied to the debate over programming models:

Writing parallel programs
Programming model
– How should a programmer view the parallel machine?
– Sequential programming: von Neumann model
Parallel programming models:
– Shared memory (shared address space) model
– Message passing model
– Shared objects model
Common to all these models:
– You have multiple independent entities communicating, synchronizing and coordinating with each other via specific mechanisms provided by the model
Special-purpose models:
– A common case: data-parallel (loop-parallel) models
– Other “domain-specific” models

Shared Address space model
Sometimes also called the shared memory model:
– considered a misnomer by some: shared memory is an architectural concept
Independent entities are called threads (or processes)
– All threads use the same common address space
– When thread i refers to an address A, it is the same location as when thread j refers to address A.
Advantages:
– Natural extension of the sequential programming model
    Some people disagree even about this
– Relatively easy to get a “first parallel version” of an existing sequential code

Shared Address space model: Issues
– Need hardware support for cache coherence and consistency
    But that is not the concern when we are discussing the efficacy of the programming model
– Data being read by one thread may be in the middle of being modified by another
    Need ways of synchronizing access
    E.g. producer-consumer relationship between threads
    – Producer is to store the result in shared variable X
    – When can the consumer thread read it?
– Another example: inconsistent modifications
    Suppose two processes are both trying to add 5 to x:
        x := x + 5
    In reality, it is not one instruction, but 3:
        ld r1,x; add r1,r1,5; st r1,x
    Now, the 6 instructions (3 from each thread)
    – may interleave in many possible ways
    – leading to wrong behavior
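
The interleaving problem can be checked by brute force. The following is a minimal sketch, not part of the original slides: it simulates the three-instruction sequence (ld, add, st) for two threads and tries every order in which the 6 instructions can interleave while each thread keeps its own program order. The names `ThreadState`, `step` and `count_outcomes` are made up for this illustration.

```cpp
#include <array>
#include <map>

// State of one simulated thread executing x := x + 5.
struct ThreadState {
    int reg = 0;  // the thread's private register r1
    int pc = 0;   // next instruction: 0 = ld, 1 = add, 2 = st
};

// Execute the next instruction of thread t against the shared variable x.
void step(ThreadState& t, int& x) {
    switch (t.pc) {
        case 0: t.reg = x;  break;  // ld  r1, x
        case 1: t.reg += 5; break;  // add r1, r1, 5
        case 2: x = t.reg;  break;  // st  r1, x
    }
    ++t.pc;
}

// Try every interleaving of the 6 instructions and count how often
// each final value of x occurs.
std::map<int, int> count_outcomes(int initial_x) {
    std::map<int, int> outcomes;
    for (unsigned mask = 0; mask < 64; ++mask) {
        // Bit i of mask says which thread runs in time slot i.
        // Keep only masks that give each thread exactly 3 slots.
        int ones = 0;
        for (int i = 0; i < 6; ++i) ones += (mask >> i) & 1u;
        if (ones != 3) continue;

        int x = initial_x;
        std::array<ThreadState, 2> threads{};
        for (int slot = 0; slot < 6; ++slot) {
            step(threads[(mask >> slot) & 1u], x);
        }
        ++outcomes[x];
    }
    return outcomes;
}
```

Starting from x = 0, only the 2 interleavings in which one thread finishes before the other starts produce 10; the other 18 lose one of the updates and leave x at 5.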

SAS model: Locks and Barriers
Solution: Locks
– A lock is a variable
– You can: create a lock, “lock” a lock, and “unlock” a lock
– The implementation guarantees that only one thread can “get” or “lock” a lock at a time
Using locks:
– Protect vulnerable shared data using a lock
– Associate a lock with such a variable
    Mentally (there is no construct or call to do the association)
– Before changing the variable, lock its associated lock
    Unlock it as soon as you have finished using the variable
– Remember that this is only a convention
    Nothing prevents a thread from inadvertently changing a variable that is protected by a lock in another part of the code
    Analogy: locking a room with a “post-it” on the door
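
As an illustration of the convention, here is a small sketch using standard C++ threads and `std::mutex` (the slides do not prescribe a particular lock API; the function name `add_five_with_lock` is made up for this example). Several threads repeatedly add 5 to a shared variable, and each update is bracketed by lock and unlock.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Each of numThreads threads adds 5 to a shared x, `repeats` times.
// Returns the final value of x.
int add_five_with_lock(int numThreads, int repeats) {
    int x = 0;         // the shared variable
    std::mutex xLock;  // associated with x only by convention

    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&x, &xLock, repeats] {
            for (int r = 0; r < repeats; ++r) {
                xLock.lock();    // only one thread can hold the lock
                x = x + 5;       // the load/add/store is now uninterrupted
                xLock.unlock();  // release as soon as we are done with x
            }
        });
    }
    for (auto& th : threads) th.join();
    return x;
}
```

The explicit `lock()` and `unlock()` calls mirror the wording of the slide; in production C++ a `std::lock_guard` would normally be used so that the lock is released even if the protected code throws. Note that nothing in the language ties `xLock` to `x`: a thread that writes `x` without taking the lock compiles just as well, which is the “post-it” problem.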

Matrix multiplication: Why people like the SAS model

    for (i=0; i<M; i++)
      for (j=0; j<N; j++)
        for (k=0; k<L; k++)
          C[i][j] += A[i][k]*B[k][j];

In a shared memory style, this program is trivial to parallelize
Just have each processor deal with a different range of i (or j? or both?)

SAS matrix multiply
Each thread knows its “serial number”:
– myPE()

    size = M/numPEs();
    myStart = myPE() * size;
    for (i=myStart; i<myStart+size; i++)
      for (j=0; j<N; j++)
        for (k=0; k<L; k++)
          C[i][j] += A[i][k]*B[k][j];
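
A runnable version of the same idea might look as follows. This is a sketch under assumptions, not the course's code: it uses standard C++ threads in place of `myPE()` and `numPEs()`, the name `sas_matmul` is made up, and the block size is rounded up so that M does not have to be a multiple of the number of threads.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Multiply A (M x L) by B (L x N) using numPEs threads that all share
// A, B and C. Thread `pe` computes a contiguous block of rows of C.
Matrix sas_matmul(const Matrix& A, const Matrix& B, int numPEs) {
    const int M = static_cast<int>(A.size());
    const int L = static_cast<int>(B.size());
    const int N = L > 0 ? static_cast<int>(B[0].size()) : 0;
    Matrix C(M, std::vector<double>(N, 0.0));

    // Rows per thread, rounded up; the last threads may get fewer rows.
    const int size = (M + numPEs - 1) / numPEs;

    std::vector<std::thread> workers;
    for (int pe = 0; pe < numPEs; ++pe) {
        workers.emplace_back([&, pe] {
            const int myStart = std::min(pe * size, M);
            const int myEnd = std::min(myStart + size, M);
            for (int i = myStart; i < myEnd; ++i)
                for (int j = 0; j < N; ++j)
                    for (int k = 0; k < L; ++k)
                        C[i][j] += A[i][k] * B[k][j];
        });
    }
    for (auto& w : workers) w.join();
    return C;
}
```

No locks are needed here because each thread writes a disjoint set of rows of C and only reads A and B.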

Message passing
Parallel entities are processes
– With their own address space
Assume that processors have direct access to only their own memory
Each processor typically executes the same executable, but may be running a different part of the program at any given time
Coordination:
– via sending and receiving “messages”: bytes of data

Message passing basics:
Basic calls: send and recv

    send(int proc, int tag, int size, char *buf);
    recv(int proc, int tag, int size, char *buf);

Recv may return the actual number of bytes received in some systems
tag and proc may be wildcarded in a recv:
– recv(ANY, ANY, 1000, &buf);
broadcast:
Other global operations (reductions)
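
To make the matching rules concrete, here is a toy in-process model of send and recv. It is only a sketch: real message-passing systems move bytes between separate address spaces, whereas this example keeps one mailbox per simulated process inside a single program. The names `Message`, `Mailbox` and `World`, and the use of `std::string` as the byte buffer, are assumptions made for the illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

constexpr int ANY = -1;  // wildcard for proc or tag in recv

struct Message {
    int source;
    int tag;
    std::string bytes;
};

// One incoming queue per simulated process.
class Mailbox {
public:
    void deliver(Message m) {
        {
            std::lock_guard<std::mutex> guard(mu_);
            queue_.push_back(std::move(m));
        }
        cv_.notify_all();
    }

    // Block until a message matching (source, tag) is present, then remove
    // and return the oldest such message.
    Message take(int source, int tag) {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            for (auto it = queue_.begin(); it != queue_.end(); ++it) {
                const bool srcOk = (source == ANY || it->source == source);
                const bool tagOk = (tag == ANY || it->tag == tag);
                if (srcOk && tagOk) {
                    Message m = std::move(*it);
                    queue_.erase(it);
                    return m;
                }
            }
            cv_.wait(lock);  // nothing matched yet; wait for a delivery
        }
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Message> queue_;
};

class World {
public:
    explicit World(int numProcs) : boxes_(numProcs) {}

    void send(int from, int to, int tag, const std::string& buf) {
        boxes_[to].deliver(Message{from, tag, buf});
    }

    // Copies at most `size` bytes into buf and returns the number received.
    std::size_t recv(int me, int proc, int tag, std::size_t size,
                     std::string& buf) {
        Message m = boxes_[me].take(proc, tag);
        buf = m.bytes.substr(0, size);
        return buf.size();
    }

private:
    std::vector<Mailbox> boxes_;
};
```

A receive that names a tag skips over earlier messages with other tags, while `recv(me, ANY, ANY, ...)` takes whatever arrived first. A recv with no matching message blocks until another thread sends one.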

Parallel Programming
Decomposition – what to do in parallel
– Tasks (loop iterations, functions, ...) that can be done in parallel
Mapping:
– Which processor does each task
Scheduling (sequencing)
– On each processor
Machine dependent expression
– Express the above decisions for the particular parallel machine

Spectrum of parallel Languages
[Figure: MPI/SAS, Charm++ and a parallelizing Fortran compiler placed along axes labeled “Specialization” and “Level” according to what is automated: decomposition, mapping, scheduling (sequencing), machine dependent expression]

Shared objects model:
Basic philosophy:
– Let the programmer decide what to do in parallel
– Let the system handle the rest:
    Which processor executes what, and when
    With some override control to the programmer, when needed
Basic model:
– The program is a set of communicating objects
– Objects only know about other objects (not processors)
– System maps objects to processors
    And may remap the objects dynamically, for load balancing etc.
Shared objects, not shared memory
– So, in some ways, in between the “shared nothing” of message passing and the “shared everything” of SAS
– More disciplined sharing
– Additional information sharing mechanisms

Charm++
Data Driven Objects: called chares
Asynchronous method invocation
Prioritized scheduling
Object Arrays
Object Groups:
– global object with a “representative” on each PE
Information sharing abstractions
– readonly data
– accumulators
– distributed tables

Data Driven Execution
[Figure: a scheduler, its message queue, and the objects]
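
One way to picture data-driven execution with prioritized scheduling is the toy loop below. It is a sketch written for this transcript and does not use the Charm++ API: a “message” is simply a priority plus a closure that invokes a method on some object, and the scheduler repeatedly removes the most urgent message and runs it. The names `Msg`, `Scheduler` and `Pinger`, and the rule that a smaller number means higher priority, are assumptions.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Msg {
    int priority;                  // assumption: smaller value runs first
    long seq;                      // FIFO order among equal priorities
    std::function<void()> invoke;  // the method invocation to perform
};

// Orders the queue so that top() is the most urgent, oldest message.
struct RunsLater {
    bool operator()(const Msg& a, const Msg& b) const {
        if (a.priority != b.priority) return a.priority > b.priority;
        return a.seq > b.seq;
    }
};

class Scheduler {
public:
    // Asynchronous invocation: the caller returns immediately and the
    // method runs later, when the scheduler picks the message.
    void enqueue(int priority, std::function<void()> invoke) {
        queue_.push(Msg{priority, nextSeq_++, std::move(invoke)});
    }

    // Deliver messages until the queue is empty; returns how many ran.
    int run() {
        int delivered = 0;
        while (!queue_.empty()) {
            Msg m = queue_.top();
            queue_.pop();
            m.invoke();  // may enqueue further messages
            ++delivered;
        }
        return delivered;
    }

private:
    std::priority_queue<Msg, std::vector<Msg>, RunsLater> queue_;
    long nextSeq_ = 0;
};

// A hypothetical object whose method sends a message to itself.
struct Pinger {
    Scheduler& sched;
    std::vector<std::string>& log;

    void ping(int hops) {
        log.push_back("ping " + std::to_string(hops));
        if (hops > 0) {
            sched.enqueue(0, [this, hops] { ping(hops - 1); });
        }
    }
};
```

The objects never call each other directly. Every invocation goes through the queue, so the order of execution is decided by the scheduler rather than by the caller.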

Object Arrays
A collection of chares,
– with a single global name for the collection, and
– each member addressed by an index
– Mapping of element objects to processors handled by the system
[Figure: the user's view shows the array as A[0], A[1], A[2], A[3], A[..]; the system view shows individual elements such as A[3] and A[0]]
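
The split between the user's view and the system view can be sketched with a small class. This is an illustration only, not the Charm++ interface: everything lives in one address space, the class name `ObjectArray` and its methods are made up, and the round-robin initial placement is an assumed default.

```cpp
#include <cstddef>
#include <vector>

class ObjectArray {
public:
    ObjectArray(std::size_t numElements, int numPEs)
        : values_(numElements, 0), home_(numElements, 0) {
        // Assumed default placement: element i starts on PE (i mod numPEs).
        for (std::size_t i = 0; i < numElements; ++i) {
            home_[i] = static_cast<int>(i % static_cast<std::size_t>(numPEs));
        }
    }

    // User's view: members are addressed by index only.
    void add(std::size_t index, int amount) { values_[index] += amount; }
    int value(std::size_t index) const { return values_[index]; }

    // System view: which PE currently holds an element, and remapping it
    // (for example, for load balancing).
    int peOf(std::size_t index) const { return home_[index]; }
    void remap(std::size_t index, int newPE) { home_[index] = newPE; }

private:
    std::vector<int> values_;  // stand-in for each element's state
    std::vector<int> home_;    // current PE of each element
};
```

Code written against `add` and `value` keeps working after `remap` moves an element, because it never names a processor.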

Object Groups
A group of objects (chares)
– with exactly one representative on each processor
– A single Id for the group as a whole
– invoke methods in a branch (asynchronously), all branches (broadcast), or in the local branch

Information sharing abstractions
Observation:
– Information is shared in several specific modes in parallel programs
Other models support only a limited set of modes:
– Shared memory: everything is shared: sledgehammer approach
– Message passing: messages are the only method
Charm++: identifies and supports several modes
– Readonly / writeonce
– Tables (hash tables)
– Accumulators
– Monotonic variables

Comparing Programming Models
What are the advantages and disadvantages of the models?
– even at this simple/abstract level of introduction?