Caltech CS184c, Spring 2001 -- DeHon. CS184c: Computer Architecture [Parallel and Multithreaded]. Day 1: April 3, 2001: Overview and Message Passing.



Today
– This Class
– Why/Overview
– Message Passing

CS184 Sequence
– A: structure and organization
  – raw components, building blocks
  – design space
– B: single-threaded architecture
  – emphasis on abstractions and optimizations, including quantification
– C: multithreaded architecture

“Architecture”
– “attributes of a system as seen by the programmer”
– “conceptual structure and functional behavior”
– Defines the visible interface between hardware and software
– Defines the semantics of the program (machine code)
(from CS184b)

Conventional, Single-Threaded Abstraction
– Single, large, flat memory
– sequential, control-flow execution
– instruction-by-instruction sequential execution
– atomic instructions
– single thread “owns” the entire machine
– byte addressability
– unbounded memory and call depth
(from CS184b)

This Term
Different models of computation
– different microarchitectures
Big Difference: Parallelism
– previously the model was sequential
Mostly:
– Multiple Program Counters
– threads of control

Architecture Instruction Taxonomy (figure from CS184a)

Why?
– Why do we need a different model?
– A different architecture?

Why?
Density:
– Superscalars scale super-linearly in area with increasing instructions/cycle
– cost comes from maintaining the sequential model:
  – dependence analysis
  – renaming/reordering
  – single memory/RF access
– VLIW: lack of model / scalability problem
Maybe there’s a better way?

Consider: two network data ports
– states: idle, first-datum, receiving, closing
– data arrival uncorrelated between ports
(from CS184a)

Instruction Control
If the FSMs advance orthogonally (really independent control):
– context depth => product of states for full partition
– i.e., with a single controller (one PC) we must create the product FSM, which may lead to state explosion
– N FSMs with S states each => S^N product states
– This example: 4 states, 2 FSMs => a 16-state composite FSM
(from CS184a)
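As a compact illustration of that state-explosion argument, here is a hedged C sketch (the port states come from the previous slide; the composite encoding and type names are invented for illustration):

    /* Two independent network-port FSMs, each with the four states from
       the previous slide.  With independent control (two PCs), each port
       needs only its own 4-way dispatch. */
    typedef enum { IDLE, FIRST_DATUM, RECEIVING, CLOSING } port_state_t;

    /* A single sequential controller (one PC) must track both ports at
       once, i.e. the product state: 4 * 4 = 16 composite states, and in
       general S^N states for N FSMs of S states each. */
    typedef struct {
        port_state_t port_a;
        port_state_t port_b;
    } composite_state_t;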

Why? Scalability
– compose a more capable machine from building blocks
– compose from modular building blocks: multiple chips

Why? Expose/exploit parallelism better
– saw non-local parallelism when looking at IPC
– saw the need for large memory to exploit it

Models?
– Message Passing (week 1)
– Dataflow (week 2)
– Shared Memory (week 3)
– Data Parallel (week 4)
– Multithreaded (week 5)
– Interface: special and heterogeneous functional units (week 6)

Additional Key Issues
– How to interconnect? (weeks 7-8)
– Coping with defects and faults? (week 9)

Message Passing

Message Passing
– Simple extension to our models:
  – Compute Model
  – Programming Model
  – Architecture
– Low-level

Message Passing Model
– Collection of sequential processes
– Processes may communicate with each other (messages)
  – send
  – receive
– Each process runs sequentially
  – has its own address space
– Abstraction: each process gets its own processor

Programming for MP
– Take a sequential language
  – C, C++, Fortran, Lisp…
– Add primitives (system calls)
  – send
  – receive
  – spawn
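To make those primitives concrete, here is a minimal hedged sketch in C. The slides do not name a particular message library; MPI is used purely as a stand-in for generic send/receive (spawn corresponds roughly to launching the ranks), and the two-process payload exchange is invented for illustration.

    /* Minimal send/receive sketch; MPI stands in for the generic
       primitives named on the slide. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            double payload = 3.14;
            /* blocking send of one double to process 1 */
            MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            double payload;
            /* blocking receive from process 0 */
            MPI_Recv(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %g\n", payload);
        }

        MPI_Finalize();
        return 0;
    }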

Architecture for MP
– Sequential architecture for each processing node
  – add network interfaces
  – processes have their own address spaces
– Add a network connecting the nodes
…minimally sufficient...

MP Architecture Virtualization
– Processes virtualize nodes
  – size-independent / scalable
– Virtual connections between processes
  – placement-independent communication

MP Example and Performance Issues

N-Body Problem
– Compute pairwise gravitational forces
– Integrate positions
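A hedged C sketch of the per-pair force computation (the slides give no code; the data layout, constant, and softening term are assumptions added for illustration):

    #include <math.h>

    #define G 6.674e-11          /* gravitational constant (SI), assumed */

    typedef struct { double x, y, z, mass; } body_t;

    /* Accumulate the gravitational force that 'other' exerts on 'self'
       into f[3].  The tiny softening term avoids division by zero for
       coincident bodies (a standard N-body trick, not from the slides). */
    static void accumulate_force(const body_t *self, const body_t *other,
                                 double f[3]) {
        double dx = other->x - self->x;
        double dy = other->y - self->y;
        double dz = other->z - self->z;
        double r2 = dx * dx + dy * dy + dz * dz + 1e-12;
        double r  = sqrt(r2);
        double mag = G * self->mass * other->mass / r2;
        f[0] += mag * dx / r;
        f[1] += mag * dy / r;
        f[2] += mag * dz / r;
    }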

Coding
// params: position, mass…
F = 0
For I = 1 to N
– send my params to p[body[I]]
– get params from p[body[I]]
– F += force(my params, params)
Update pos, velocity
Repeat
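A message-passing rendering of that loop might look like the sketch below. This is hedged: MPI is only an illustrative stand-in, body_t and accumulate_force come from the previous sketch, and MPI_Sendrecv is used so the paired blocking send and receive cannot deadlock each other.

    #include <mpi.h>

    /* One process per body: exchange parameters pairwise with every other
       body and accumulate force locally; the position/velocity update is
       elided, and the whole step repeats each timestep. */
    void nbody_step(body_t *me, int rank, int nbodies) {
        double f[3] = {0.0, 0.0, 0.0};
        for (int i = 0; i < nbodies; i++) {
            if (i == rank) continue;
            body_t theirs;
            /* paired send + receive of the four doubles in body_t */
            MPI_Sendrecv(me, 4, MPI_DOUBLE, i, 0,
                         &theirs, 4, MPI_DOUBLE, i, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            accumulate_force(me, &theirs, f);
        }
        /* ... update position and velocity from f, then repeat ... */
    }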

Performance
– Body work ~= cN
– Cycle work ~= cN^2
– Ideal on N_p processors: cN^2 / N_p

Performance: Sequential
Body work:
– read N values
– compute N force updates
– compute pos/vel from F and params
c = t(read value) + t(compute force)

Performance: MP
Body work:
– send N messages
– receive N messages
– compute N force updates
– compute pos/vel from F and params
c = t(send message) + t(receive message) + t(compute force)

Send/Receive
t(receive):
– wait on message delivery
– swap to kernel
– copy data
– return to process
t(send):
– similar
t(send), t(receive) >> t(read value)

Sequential vs. MP
T_seq = c_seq * N^2
T_mp = c_mp * N^2 / N_p
Speedup = T_seq / T_mp = c_seq * N_p / c_mp
Assuming no waiting:
– c_seq / c_mp ~= t(read value) / (t(send) + t(rcv))
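As a rough worked example (the cycle counts are assumptions, not from the slides, though they are consistent with the “thousands of cycles” figure later in the lecture): if t(read value) is on the order of 1 cycle while t(send) + t(rcv) is on the order of 1,000 cycles, then c_seq / c_mp ~= 1/1000, so Speedup ~= N_p / 1000 and roughly a thousand processors are needed just to break even with the sequential version.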

Waiting?
Shared-bus interconnect:
– wait O(N) time for N sends (receives) across the machine
Non-blocking interconnect:
– wait L(net) time between a message's send and its receive
– if there is insufficient parallelism, latency dominates performance

Dertouzos Latency Bound
– Speedup upper bound: processes / latency

Waiting: Data Availability
– Also must wait for the data to be sent

Coding/Waiting
For I = 1 to N
– send my params to p[body[I]]
– get params from p[body[I]]
– F += force(my params, params)
How long does processor I wait for its first datum?
– Parallelism profile?

More Parallelism
For I = 1 to N
– send my params to p[body[I]]
For I = 1 to N
– get params from p[body[I]]
– F += force(my params, params)
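A hedged sketch of that restructuring using non-blocking sends (MPI again as an illustrative stand-in; body_t and accumulate_force are from the earlier sketches). Posting every send before any receive means no process stalls waiting for a partner that has not yet reached its own send:

    #include <mpi.h>
    #include <stdlib.h>

    void nbody_step_overlapped(body_t *me, int rank, int nbodies) {
        double f[3] = {0.0, 0.0, 0.0};
        MPI_Request *reqs = malloc((nbodies - 1) * sizeof(MPI_Request));
        int r = 0;

        /* First loop: post all sends without waiting for them. */
        for (int i = 0; i < nbodies; i++) {
            if (i == rank) continue;
            MPI_Isend(me, 4, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[r++]);
        }

        /* Second loop: receive from each partner and accumulate force. */
        for (int i = 0; i < nbodies; i++) {
            if (i == rank) continue;
            body_t theirs;
            MPI_Recv(&theirs, 4, MPI_DOUBLE, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            accumulate_force(me, &theirs, f);
        }

        /* Make sure the send buffer is safe to reuse before returning. */
        MPI_Waitall(nbodies - 1, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    }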

Queuing?
For I = 1 to N
– send my params to p[body[I]]
– get params from p[body[I]]
– F += force(my params, params)
No queuing? Queuing?

Dispatching
– Multiple processes on a node
– Who to run?
  – Can a receive block waiting?

Dispatching
– Abstraction: each process gets its own processor
– If a receive blocks (holds the processor)
  – it may prevent another process from running, one upon which it depends
– Consider the 2-body problem on 1 node

Seitz Coding [see reading]

MP Issues

Expensive Communication
Process-to-process communication goes through the operating system:
– system call, process switch
– exit processor, network, enter processor
– system call, process switch
Milliseconds?
– Thousands of cycles...

Why is the OS involved?
– Protection/Isolation
  – can this process send/receive with this other process?
– Translation
  – where does this message need to go?
– Scheduling
  – who can/should run now?

Issues
– Process placement
  – locality
  – load balancing
– Cost of excessive parallelism
  – e.g., N-body on N_p < N processors?
– Message hygiene
  – ordering, single delivery, buffering
– Deadlock
  – user-introduced, system-introduced

Low-Level Model
Places [too much] burden on the user:
– decompose the problem explicitly
  – sequential chunk size is not abstract to scale: a weakness in the architecture
– guarantee correctness in the face of non-determinism
– placement/load-balancing in some systems
Gives considerable explicit control

Low-Level Primitives
– Has the necessary primitives for multiprocessor cooperation
– Maybe an appropriate compiler target?
  – Architecture model, but not a programming/compute model?

Announcements
Note: CS25 next Monday/Tuesday
– Seitz speaking on Tuesday
– Dally speaking on Monday
– (also Mead)
– […even DeHon… :-)]
Changing schedule (…already…)
– Network Interface bumped up to next Monday:
  – von Eicken et al., Active Messages
  – Henry and Joerg, Tightly-Coupled Processor-Network Interface

Big Ideas
– Value of architectural abstraction
– Sequential abstraction
  – limits implementation freedom
  – requires large cost to support the semantic mismatch between model and execution
– Parallel models expose more opportunities

Big Ideas
– MP has minimal primitives
  – appropriate as a low-level model
  – too raw/primitive for a user model
– Communication is an essential component
  – can be expensive
  – doing it well is necessary to get good performance (come out ahead)
  – watch OS cost...