Introduction to MIMD Architectures. Sima, Fountain and Kacsuk, Chapter 15. CSE462.


© David Abramson, 2004. Material from Sima, Fountain and Kacsuk, Addison Wesley.

Architectural Concepts
• Distributed Memory MIMD
  – Replicate the processor/memory pairs
  – Connect them via an interconnection network
• Shared Memory MIMD
  – Replicate the processors
  – Replicate the memories
  – Connect them via an interconnection network

Distributed Memory Machine
• Access to the local memory module is much faster than access to a remote one
• Hardware remote accesses via
  – Load/Store primitive
  – Message passing layer
• Cache memory for local memory traffic
• Message
  – Memory-memory
  – Cache-cache
[Figure: Processor 1 to Processor p, each with its own memory, joined by an interconnection network]

Advantages of Distributed Memory
• Local memory traffic causes less contention than in shared memory
• Highly scalable
• No need for sophisticated synchronization features such as monitors or semaphores. Message passing serves a dual purpose:
  – To send the data
  – To provide synchronization
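The dual role of message passing can be sketched in a few lines. This is a minimal illustration, not material from the slides: Python threads stand in for processors, a `queue.Queue` stands in for the interconnection network, and the name `run_pair` is hypothetical. The receive both delivers the data and makes the consumer wait, so no separate semaphore or monitor is needed.

```python
import queue
import threading

def run_pair():
    """Two 'nodes' with private data; the only link between them is a channel."""
    channel = queue.Queue()   # assumption: stands in for the interconnection network
    log = []                  # records the order in which events happen

    def producer():
        local_data = [1, 2, 3]            # visible only to the producer
        log.append("producer: computed")
        channel.put(sum(local_data))      # send: transfers the data

    def consumer():
        value = channel.get()             # receive: blocks until the data exists
        log.append(f"consumer: got {value}")

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()                 # the consumer starts first and simply waits
    p.start()
    p.join()
    c.join()
    return log

if __name__ == "__main__":
    for line in run_pair():
        print(line)
```

The consumer is started before the producer on purpose: the ordering of the two log entries comes from the blocking receive, not from thread start order.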

Problems of Distributed Memory
• Load balancing
• Message passing can lead to synchronization failures, including deadlock
  – BlockingSend -> BlockingReceive
  – BlockingReceive -> BlockingSend
• Intensive data copying of whole structures
• Overheads are high for small messages
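One way the ordering of blocking calls leads to deadlock can be sketched as follows. This is an illustrative model under stated assumptions, not the slides' own example: the function `exchange` is hypothetical, Python queues stand in for the message layer, and a timeout replaces the indefinite hang so the demonstration terminates. Sends here are buffered, so only the receive-first case is modelled; with unbuffered (rendezvous) sends, two nodes that both send first would block in the same way.

```python
import queue
import threading

def exchange(a_receives_first, timeout=0.5):
    """Nodes a and b swap one value each. Node b always receives first.

    Returns what each node received; None means its receive never completed.
    """
    to_a, to_b = queue.Queue(), queue.Queue()   # one inbox per node
    received = {"a": None, "b": None}

    def node(name, inbox, outbox, value, receive_first):
        def recv():
            try:
                received[name] = inbox.get(timeout=timeout)  # blocking receive
                return True
            except queue.Empty:
                return False   # a true blocking receive would hang here forever

        if receive_first:
            if not recv():
                return         # gave up: the matching send was never issued
            outbox.put(value)
        else:
            outbox.put(value)  # buffered send, returns immediately
            recv()

    a = threading.Thread(target=node, args=("a", to_a, to_b, 1, a_receives_first))
    b = threading.Thread(target=node, args=("b", to_b, to_a, 2, True))
    a.start()
    b.start()
    a.join()
    b.join()
    return received

if __name__ == "__main__":
    print("both receive first:", exchange(True))
    print("a sends first:     ", exchange(False))
```

When both nodes receive first, each waits for a message the other has not yet sent, and neither makes progress. Reversing the order on one side breaks the cycle.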

Shared Memory Architecture
• All processors have equal access to shared memory modules
• Local caches reduce
  – Memory traffic
  – Network traffic
  – Memory access time
• IP synchronisation
  – Indivisible load/store
[Figure: Processor 1, Processor 2, ..., Processor p connected through an interconnection network to Memory Module 1, Memory Module 2, ..., Memory Module m]
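Why shared memory needs an indivisible operation can be shown with a shared counter. This is a sketch under assumptions: Python does not expose a hardware indivisible load/store, so a `threading.Lock` stands in for it here, and the function name `count_with_lock` is hypothetical. The lock makes each read-modify-write of the shared variable appear as one indivisible step to the other threads.

```python
import threading

def count_with_lock(n_threads=4, iterations=10_000):
    """Several threads increment one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()   # assumption: stands in for an indivisible primitive

    def worker():
        nonlocal counter
        for _ in range(iterations):
            with lock:         # load, add and store happen as one unit
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(count_with_lock())
```

Without the lock, two threads could load the same old value and one increment would be lost; with it, the final count always equals `n_threads * iterations`.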

Advantages of Shared Memory
• No need to partition code or data
  – Occurs on the fly
• No need to move data explicitly
• No need for new programming languages or compilers

Disadvantages of Shared Memory
• Synchronization is difficult
• Lack of scalability
  – IPC becomes a bottleneck
• Scalability can be addressed by
  – High-throughput, low-latency network
  – Cache memories (these cause the coherence problem)
  – Distributed shared memory architecture
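The coherence problem introduced by caches can be made concrete with a toy model. Everything in this sketch is an assumption for illustration: the `ToySystem` class, the write-through policy and the write-invalidate scheme are one simple choice among several, not a protocol described in the slides.

```python
class ToySystem:
    """Shared memory plus one private write-through cache per processor."""

    def __init__(self, n_procs, coherent):
        self.memory = {}
        self.caches = [dict() for _ in range(n_procs)]
        self.coherent = coherent

    def read(self, proc, addr):
        cache = self.caches[proc]
        if addr not in cache:                  # miss: fetch from shared memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]                     # hit: may be a stale copy

    def write(self, proc, addr, value):
        self.memory[addr] = value              # write-through to shared memory
        self.caches[proc][addr] = value
        if self.coherent:                      # write-invalidate other copies
            for other, cache in enumerate(self.caches):
                if other != proc:
                    cache.pop(addr, None)

def stale_read_demo(coherent):
    system = ToySystem(n_procs=2, coherent=coherent)
    system.write(0, "x", 1)
    system.read(1, "x")           # processor 1 now caches x == 1
    system.write(0, "x", 2)       # processor 0 updates x
    return system.read(1, "x")    # value seen by processor 1 afterwards

if __name__ == "__main__":
    print("without coherence:", stale_read_demo(False))
    print("with invalidation:", stale_read_demo(True))
```

Without invalidation, processor 1 keeps reading its old cached copy. With it, the next read misses and fetches the new value, at the price of extra traffic on every write.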

Distributed Shared Memory
• Three design choices
  – Non-uniform memory access (NUMA), like the Cray T3D
  – Cache-coherent non-uniform memory access (CC-NUMA), like the Convex SPP and Stanford DASH
  – Cache-only memory access (COMA), like the KSR-1

Non-uniform memory access (NUMA)
[Figure: processing elements PE 0, PE 1, ..., PE n, each containing a processor (P0 ... Pn) and a memory (M0 ... Mn), attached to an interconnection network]
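The "non-uniform" part of NUMA can be expressed as a small cost model. The sketch below is illustrative only: the block partitioning of addresses across nodes, the function names, and the costs of 1 and 10 units are all assumptions, not figures from the slides or from any real machine.

```python
def home_node(addr, words_per_node):
    """Node whose memory holds this address (assumed block partitioning)."""
    return addr // words_per_node

def access_cost(pe, addr, words_per_node, local_cost=1, remote_cost=10):
    """Cost for processing element `pe` to access `addr` (hypothetical units)."""
    if home_node(addr, words_per_node) == pe:
        return local_cost
    return remote_cost

def total_cost(pe, addrs, words_per_node, local_cost=1, remote_cost=10):
    """Sum of access costs for a sequence of addresses touched by `pe`."""
    return sum(access_cost(pe, a, words_per_node, local_cost, remote_cost)
               for a in addrs)

if __name__ == "__main__":
    # PE 0 touches two local words and one word that lives on PE 1.
    print(total_cost(0, [0, 1, 100], words_per_node=100))
```

The same address stream costs a different amount depending on which processing element issues it, which is why data placement matters on such machines.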

Cache-coherent non-uniform memory access (CC-NUMA)
[Figure: processing elements PE 0, PE 1, ..., PE n, each containing a processor (P0 ... Pn), a cache (C0 ... Cn) and a memory (M0 ... Mn), attached to an interconnection network]

Cache-only memory access (COMA)
[Figure: processing elements PE 0, PE 1, ..., PE n, each containing a processor (P0 ... Pn) and a cache (C0 ... Cn) but no separate memory, attached to an interconnection network]

Classification of MIMD Computers
MIMD computers
  Process-level architectures
    Single address space, shared memory
      Physical shared memory (UMA)
      Virtual (distributed) shared memory
        NUMA
        CC-NUMA
        COMA
    Multiple address space, distributed memory
  Thread-level architectures
    Single address space, shared memory
      Physical shared memory (UMA)
      Virtual (distributed) shared memory
        NUMA
        CC-NUMA

Problems of Scalable Computers
• Tolerate and hide the latency of remote loads
  – Worse if one computation depends on the output of another that has yet to complete
• Tolerate and hide idling due to synchronization among processors

Tolerating Remote Loads
[Figure: PE 0 computes Result := A + B. A resides in M1 on PE 1 and B in Mn on PE n, so PE 0 issues Load A into rA and Load B into rB across the interconnection network before storing Result in M0]

Tolerating Latency
• Cache memory
  – Simply lowers the cost of remote access
  – Introduces the cache coherence problem
• Prefetching
  – Already present, so cost is low
  – Increases network load
• Threads + fast context switching
  – Accept that it will take a long time and cover the overhead
• These solutions don't solve synchronization issues
  – Latency-tolerant algorithms
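The threads approach can be sketched for the Result := A + B case. This is a simulation under assumptions, not a measurement: `time.sleep` stands in for the latency of a remote load, the values 3 and 4 and the 50 ms latency are made up, and the function names are hypothetical. Issuing both loads from separate threads lets their latencies overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

REMOTE = {"A": 3, "B": 4}   # hypothetical values held in remote memories

def remote_load(name, latency=0.05):
    """Simulated remote load: sleep stands in for network and memory latency."""
    time.sleep(latency)
    return REMOTE[name]

def add_sequential():
    """Pays the latency of Load A, then the latency of Load B."""
    return remote_load("A") + remote_load("B")

def add_overlapped():
    """Issues both loads at once so the two waits overlap."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(remote_load, "A")
        future_b = pool.submit(remote_load, "B")
        return future_a.result() + future_b.result()

if __name__ == "__main__":
    for fn in (add_sequential, add_overlapped):
        start = time.perf_counter()
        result = fn()
        print(fn.__name__, result, f"{time.perf_counter() - start:.3f}s")
```

Both versions return the same sum. The overlapped version still waits for the slower load, so the latency is hidden rather than removed, and the addition cannot start until both operands have arrived.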

Design Issues of Scalable MIMD
• Processor design
  – Pipelining, parallel instruction issue
  – Atomic data access, prefetching, cache memory, message passing, etc.
• Interconnection network design
  – Scalable, high bandwidth, low latency
• Memory design
  – Shared memory design
  – Cache coherence
• I/O subsystem
  – Parallel I/O