Multiprocessors – CSE 471 Aut 01

Multiprocessors - Flynn's Taxonomy (1966)

Single Instruction stream, Single Data stream (SISD)
– Conventional uniprocessor, although ILP is exploited
– A single program counter implies a single instruction stream
– The data is not "streaming"

Single Instruction stream, Multiple Data stream (SIMD)
– Popular for some applications, such as image processing
– Vector processors can be construed as SIMD
– The MMX extensions to the ISA reflect the SIMD philosophy, also apparent in "multimedia" processors (Equator MAP-1000)
– The "data parallel" programming paradigm
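To make the "data parallel" idea concrete, here is a minimal sketch (not from the slides; the pixel-brightening operation is an invented illustration): the essence of SIMD is one operation applied uniformly to every element of a data stream.

```python
# Data-parallel style: the same operation ("brighten by a fixed amount")
# is applied to every pixel of the data stream, with saturation at 255.
def brighten(pixels, amount):
    """Apply one operation uniformly across all data elements."""
    return [min(255, p + amount) for p in pixels]

pixels = [0, 100, 250, 17]
print(brighten(pixels, 16))  # -> [16, 116, 255, 33]
```

A vector unit or MMX-style instruction would perform the per-element additions in a single instruction rather than in a Python loop, but the programming model is the same.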

Flynn's Taxonomy (cont'd)

Multiple Instruction stream, Single Data stream (MISD)
– No well-known machine of this type

Multiple Instruction stream, Multiple Data stream (MIMD)
– The most general category
– Covers both shared-memory multiprocessors and message-passing multicomputers (including networks of workstations cooperating on the same problem)

Shared-memory Multiprocessors

Shared memory = a single shared address space (an extension of the uniprocessor model; communication happens via ordinary loads and stores)

Uniform Memory Access (UMA)
– Today found almost exclusively in shared-bus systems
– The basis for SMPs (Symmetric MultiProcessing)
– Cache coherence enforced by "snoopy" protocols
– SMPs form the building blocks of clusters (but access to the memory of another node in a cluster is not UMA)
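A minimal sketch of the shared-memory model (an illustration, not from the slides; Python threads stand in for processors sharing one address space). Communication is implicit: every "processor" reads and writes the same variable, and a lock plays the role that coherence and synchronization hardware play in a real SMP.

```python
import threading

counter = 0                # lives in the single shared address space
lock = threading.Lock()

def worker(n):
    """One 'processor': communicate by loads/stores to shared memory."""
    global counter
    for _ in range(n):
        with lock:         # serialize the read-modify-write sequence
            counter += 1   # a load and a store to the shared location

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # -> 4000
```

No explicit message is ever sent: the workers see each other's updates simply because they share the address space, which is exactly the load/store communication the slide describes.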

Shared-memory Multiprocessors (cont'd)

Non-Uniform Memory Access (NUMA)
– NUMA-CC: cache coherent (directory-based protocols or SCI)
– NUMA without cache coherence (coherence enforced by software)
– COMA: Cache-Only Memory Architecture
– Clusters

Distributed Shared Memory (DSM)
– Most often a network of workstations
– The shared address space is the virtual address space
– The O.S. enforces coherence on a page-by-page basis

Message-passing Systems

Processors communicate by messages
– Primitives are of the form "send" and "receive"
– The user (programmer) has to insert the message operations explicitly
– Message-passing libraries exist (e.g., MPI); note that OpenMP, by contrast, is a shared-memory API

Communication can be:
– Synchronous: the sender must wait for an ack from the receiver (e.g., in RPC)
– Asynchronous: the sender continues without waiting for a reply
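The send/receive style can be sketched as follows (an illustration, not from the slides; Python threads and queues stand in for nodes and communication channels). The sender's `put` is asynchronous, since it returns immediately; waiting on the ack queue afterwards gives synchronous, RPC-like behavior.

```python
import queue
import threading

to_receiver = queue.Queue()   # channel: sender -> receiver
to_sender = queue.Queue()     # channel: receiver -> sender (acks)

def receiver():
    """The receiving node: explicit receive, then send an ack back."""
    msg = to_receiver.get()          # receive
    to_sender.put(("ack", msg))      # send acknowledgment

t = threading.Thread(target=receiver)
t.start()

to_receiver.put("hello")             # send (asynchronous: returns at once)
ack = to_sender.get()                # synchronous style: block for the ack
t.join()
print(ack)  # -> ('ack', 'hello')
```

Unlike the shared-memory model, nothing here is communicated implicitly: every exchange is an explicit send or receive inserted by the programmer.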

Shared-memory vs. Message-passing

An old debate that matters much less today; many systems are built to support a mixture of both paradigms
– "Send"/"receive" can be supported by the O.S. on shared-memory systems
– Loads/stores in a virtual address space can be used in a message-passing system (the message-passing library can use "small" messages to that effect, e.g., passing a pointer to a memory area on another computer)
– Does a network of workstations with the page as the unit of coherence follow the shared-memory paradigm or the message-passing paradigm?

The Pros and Cons

Shared-memory pros
– Ease of programming (SPMD: Single Program Multiple Data paradigm)
– Good for communication of small items
– Less O.S. overhead
– Hardware-based cache coherence

Message-passing pros
– Simpler hardware (more scalable)
– Explicit communication (both good and bad; some programming languages have primitives for it), easier for long messages
– Use of message-passing libraries

Caveat about Parallel Processing

Multiprocessors are used to:
– Speed up computations
– Solve larger problems

Speedup
– Time to execute on 1 processor / time to execute on N processors
– Speedup is limited by the communication/computation ratio and by synchronization

Efficiency
– Speedup / number of processors
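The two definitions above translate directly into code (the timings below are invented illustrative numbers, not measurements from the lecture):

```python
def speedup(t1, tn):
    """Speedup = time on 1 processor / time on N processors."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Efficiency = speedup / number of processors."""
    return speedup(t1, tn) / n

# e.g., a job taking 100 s sequentially and 20 s on 8 processors:
print(speedup(100, 20))        # -> 5.0
print(efficiency(100, 20, 8))  # -> 0.625
```

An efficiency of 0.625 says the 8 processors are each doing useful work only 62.5% of the time; the rest is lost to communication and synchronization.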

Amdahl's Law for Parallel Processing

Recall Amdahl's law
– If a fraction s of your program is sequential, speedup is bounded by 1/s
– At best, linear speedup (if there is no sequential section)

What about superlinear speedup?
– Theoretically impossible
– It "occurs" because adding a processor may also add memory and cache capacity (e.g., fewer page faults!)
– Be careful about the sequential fraction s: it might shrink as the data set grows
– Speedup and efficiency should therefore be parameterized by both the number of processors and the size of the input set
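Amdahl's law can be checked numerically. With a sequential fraction s and N processors, speedup(N) = 1 / (s + (1 - s)/N), which approaches the 1/s bound as N grows (the values below are illustrative, not from the slides):

```python
def amdahl_speedup(s, n):
    """Speedup with sequential fraction s on n processors."""
    return 1.0 / (s + (1.0 - s) / n)

# With s = 0.1 (10% sequential):
print(amdahl_speedup(0.1, 10))     # ~5.26, far below 10x
print(amdahl_speedup(0.1, 10**6))  # approaches, but never reaches, 1/s = 10
```

Even with a million processors, a 10% sequential fraction caps the speedup just below 10x, which is why the sequential fraction, not the processor count, usually dominates.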

SMP (Symmetric MultiProcessors, aka "multis"): Single Shared-bus Systems

[Figure: several processors, each with its own cache, attached to a single shared bus together with an I/O adapter and interleaved memory]

Multiprocessor with Centralized Switch

[Figure: processor/cache nodes and memory modules connected through a central interconnection network]

Multiprocessor with Decentralized Switches

[Figure: processor/cache/memory nodes, each with its own switch, connected point-to-point]