CSE 45432, SUNY New Paltz. Chapter Nine: Multiprocessors

Slide 1: Chapter Nine, Multiprocessors

Slide 2: Multiprocessors

Idea: create powerful computers by connecting many smaller ones.
– Good news: it works for timesharing (better than a supercomputer), and vector processing may be coming back.
– Bad news: it's really hard to write good concurrent programs, and there have been many commercial failures.
Two broad organizations:
– Shared Memory Multiprocessor (SMP), also called a Symmetric Multiprocessor.
– Distributed Memory Multiprocessor, or network-connected MP.

Slide 3: Classification

Shared address space vs. distributed address space:
– Which memory locations can a processor access from an instruction?
– When a processor does not have access to the entire memory in the system, information is shared by message passing.
Uniform memory access (UMA) vs. non-uniform memory access (NUMA):
– Does the delay for accessing a memory location depend on the address of that location?
– UMA: shared memory multiprocessor or symmetric multiprocessor.
– NUMA: distributed memory multiprocessor.
Parallel processors vs. a cluster of processors:
– Speed of the interconnections (bus, switch, network).
– A single operating system, or one OS for each processor.
– The user writes many programs for the processors, or the user writes one parallel program and a run-time system (or parallel compiler) distributes the work to the processors.

Slide 4: Parallel Programs

Parallel programs need to:
– Synchronize (locks, semaphores).
– Share data (shared memory or send/receive primitives).
Speedup S(n) of an n-processor system is the time to execute on one processor divided by the time to execute on n processors.
– Linear speedup: S(n) = k × n.
– Communication slows down the parallel system.
– Synchronization (needed for data dependence) also slows down the system.
Amdahl's law: parallel execution time is the sum of:
– the execution time for the parallel part ÷ n, and
– the execution time for the non-parallel part.
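As a back-of-the-envelope illustration of the arithmetic behind Amdahl's law (a minimal sketch; the function name and the 95%/64-processor figures are illustrative, not from the slides):

```c
#include <stdio.h>

/* Amdahl's law: if a fraction p of the work is parallelizable,
 * execution time on n processors is (1 - p) + p/n of the original,
 * so speedup S(n) = 1 / ((1 - p) + p/n). */
static double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}

int main(void) {
    /* Even with 95% of the program parallel, 64 processors yield
     * only about a 15x speedup; the serial 5% dominates. */
    printf("S(64) with p = 0.95: %.1f\n", amdahl_speedup(0.95, 64));
    return 0;
}
```

As n grows, S(n) approaches 1 / (1 - p), which is why the non-parallel term in the slide's formula dominates.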

Slide 5: Barrier Synchronization

Example: sum of A(i) for i = 1, ..., 64 on 4 processors:
– Each processor computes the sum of 16 numbers.
– One processor then computes the sum of the 4 partial sums.
Parallel programs frequently need barrier synchronization:
– When any processor reaches the barrier, it waits until all processors reach that barrier before execution continues.
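A minimal sketch of the slide's example using POSIX threads (the slides do not prescribe an API; thread and variable names here are illustrative): four workers each sum a 16-element slice, meet at a barrier, and worker 0 combines the partial sums.

```c
#include <pthread.h>
#include <stdio.h>

#define N 64
#define NPROC 4

static double a[N];
static double partial[NPROC];
static pthread_barrier_t barrier;

/* Each "processor" sums its 16-element slice, then waits at the
 * barrier; processor 0 combines the 4 partial sums afterwards. */
static void *worker(void *arg) {
    long id = (long)arg;
    double sum = 0.0;
    for (int i = id * (N / NPROC); i < (id + 1) * (N / NPROC); i++)
        sum += a[i];
    partial[id] = sum;

    pthread_barrier_wait(&barrier);   /* all partial sums are now ready */

    if (id == 0) {
        double total = 0.0;
        for (int p = 0; p < NPROC; p++)
            total += partial[p];
        printf("sum = %g\n", total);  /* 1 + 2 + ... + 64 = 2080 */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NPROC];
    for (int i = 0; i < N; i++) a[i] = i + 1;   /* A(i) for i = 1..64 */
    pthread_barrier_init(&barrier, NULL, NPROC);
    for (long p = 0; p < NPROC; p++)
        pthread_create(&t[p], NULL, worker, (void *)p);
    for (int p = 0; p < NPROC; p++)
        pthread_join(t[p], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```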

Slide 6: Cache Coherence in SMPs

Different caches may contain different values for the same memory location.
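For concreteness, a typical trace of the problem (illustrative; the slide's figure is not reproduced here). Assume write-back caches and a shared variable X that starts at 0 in memory:

1. P1 reads X; P1's cache now holds X = 0.
2. P2 reads X; P2's cache also holds X = 0.
3. P1 writes X = 1; with write-back, only P1's cached copy changes.
4. P2 reads X again and still sees 0: the two caches disagree.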

Slide 7: Snooping Cache Coherence Protocols

Each processor monitors (snoops on) the activity on the bus.
On a read miss, all caches check to see if they have a copy of the requested block. If yes, they supply the data.
On a write miss, all caches check to see if they have a copy of the requested data. If yes, they either invalidate the local copy or update it with the new value.
Either a write-back or a write-through policy can be used.
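As a sketch of the invalidate variant, assuming the classic three-state MSI organization (the slides do not name a specific protocol; the type and function names are illustrative, not a real API), this is how one cache's copy of a block reacts to transactions it snoops on the bus:

```c
/* Per-block state machine for a write-invalidate snooping cache. */
typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

typedef enum { BUS_READ, BUS_WRITE } bus_op_t;

/* How this cache's copy reacts when it snoops another processor's
 * bus transaction for the same block. *must_flush is set when the
 * dirty data has to be written back (the "supply the data" case). */
line_state_t snoop(line_state_t state, bus_op_t op, int *must_flush) {
    *must_flush = 0;
    switch (state) {
    case MODIFIED:
        *must_flush = 1;    /* we hold the only up-to-date copy */
        return (op == BUS_READ) ? SHARED : INVALID;
    case SHARED:
        /* another processor's write invalidates our clean copy */
        return (op == BUS_READ) ? SHARED : INVALID;
    default:
        return INVALID;     /* nothing cached, nothing to do */
    }
}
```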

Slide 8: Example: Write Invalidate with Write-Back

Slide 9: Example: Write Update

Slide 10: Multiprocessors Connected by Networks

May have only a non-shared address space; that is, each processor can access only local memory, and data is shared through message passing (a minimal sketch follows this slide).
May have a global shared address space; that is, a processor can access any location in the entire address space. Because some memory accesses have to go through the network, the machine is a NUMA.
Programming models:
– Each processor has only local variables.
– Each processor has local variables but can also access global variables shared by all processors.
Many network topologies can be used to connect the processor/memory pairs:
– Rings
– Meshes, tori, k-ary n-cubes
– Hypercubes
– Multi-stage networks (crossbars and Omega networks)
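A minimal message-passing sketch, assuming MPI as the send/receive library (the slides do not prescribe one; the rank numbers and the value sent are illustrative): each rank owns only its local variables, and data moves between address spaces explicitly.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        value = 42;    /* lives only in rank 1's local memory */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* rank 0 cannot read rank 1's memory; it must receive a copy */
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```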

Slide 11: Multiprocessors Connected as a Ring and a Tree

Slide 12: Multiprocessors Connected as a Hypercube

Slide 13: Multiprocessors Connected as a k-ary n-cube

Slide 14: Concluding Remarks

Evolution vs. revolution:
– "More often the expense of innovation comes from being too disruptive to computer users."
– "Acceptance of hardware ideas requires acceptance by software people; therefore hardware people should learn about software. And if software people want good machines, they must learn more about hardware to be able to communicate with and thereby influence hardware engineers."