Parallel and Multiprocessor Architectures – Shared Memory



Parallel and Multiprocessor Architectures – Shared Memory

Recall: multiprocessors are classified by how their memory is organized. Tightly-coupled multiprocessor systems share the same memory and are also referred to as shared memory multiprocessors. The processors do not necessarily have to share one block of physical memory: each processor can have its own memory, as long as it shares it with the other processors. Configurations such as these are called distributed shared memory multiprocessors.

Parallel and Multiprocessor Architectures – Shared Memory

Alternatively, each processor can have its own local cache while working with a single global memory.

Parallel and Multiprocessor Architectures – Shared Memory

One type of distributed shared memory system is the shared virtual memory system:
- Each processor contains a cache.
- The system has no primary memory.
- Data is accessed through cache directories maintained in each processor.
- Processors are connected via two unidirectional rings: a level-one ring connects 8 to 32 processors, and a level-two ring connects up to 34 level-one rings.

Example: if Processor A references data at address X that resides in Processor B's cache, Processor B places the data at address X on the ring with a destination address of Processor A.
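The ring limits above imply a hard scalability ceiling. A quick back-of-the-envelope calculation, assuming the maximum configuration at both levels (figures that resemble cache-only machines such as the Kendall Square KSR-1):

```python
# Maximum processor count implied by the slide's ring limits.
# (The figures resemble the Kendall Square KSR-1; treat this as a sketch.)
procs_per_level1_ring = 32     # a level-one ring connects 8 to 32 processors
level1_rings_per_level2 = 34   # a level-two ring connects up to 34 level-one rings

max_processors = procs_per_level1_ring * level1_rings_per_level2
print(max_processors)  # 1088
```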

Parallel and Multiprocessor Architectures – Shared Memory

Shared memory MIMD machines can also be divided into two categories based on how they access memory and synchronize their memory operations: the uniform access approach and the non-uniform access approach.

In uniform memory access (UMA) systems, all memory accesses take the same amount of time. However:
- A switched interconnection UMA system becomes very expensive as the number of processors grows.
- Bus-based UMA systems saturate when the bandwidth of the bus becomes insufficient.
- Multistage UMA systems run into wiring constraints and latency issues as the number of processors grows.
- In a symmetric multiprocessor UMA system, memory must be fast enough to support multiple concurrent accesses, or it will slow down the whole system.

In short, the interconnection network of a UMA system limits the number of processors it can support – its scalability is limited.
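To see why a bus-based UMA system saturates, note that aggregate bandwidth demand grows linearly with processor count while the bus is a fixed shared resource. The numbers below are illustrative assumptions, not measurements:

```python
# Why bus-based UMA saturates: total memory traffic grows with every
# processor added, but the shared bus has a fixed bandwidth.
# Both figures below are illustrative assumptions.
bus_bandwidth = 1_000   # MB/s available on the shared bus (assumed)
per_proc_demand = 150   # MB/s of memory traffic per processor (assumed)

max_procs = bus_bandwidth // per_proc_demand
print(max_procs)  # beyond this many processors, the bus is the bottleneck
```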

Parallel and Multiprocessor Architectures – Shared Memory

The other category of MIMD machines is the nonuniform memory access (NUMA) system. NUMA systems overcome the inherent UMA problems by providing each processor with its own memory. Although the memory is distributed, NUMA processors see it as one contiguous addressable space. Thus, a processor can access its own memory much more quickly than it can access memory that resides elsewhere.

Not only does each processor have its own memory, it also has its own cache, a configuration that can lead to cache coherence problems. Cache coherence problems arise when main memory data is changed but a cached copy of it is not. (We say that the cached value is stale.) To combat this problem, some NUMA machines are equipped with snoopy cache controllers that monitor all the caches in the system; these are called cache coherent NUMA (CC-NUMA) architectures. A simpler approach is to ask the processor holding the stale value either to invalidate the stale cached value or to update it with the new value.
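The local-versus-remote asymmetry can be quantified with the usual weighted-average latency formula. The latencies and the local-access fraction below are illustrative assumptions, not figures for any particular machine:

```python
# Average NUMA memory latency = f_local * t_local + (1 - f_local) * t_remote.
# All numbers are illustrative assumptions.
t_local_ns = 100   # access to the processor's own memory (assumed)
t_remote_ns = 300  # access that traverses the interconnect (assumed)
f_local = 0.8      # fraction of accesses satisfied locally (assumed)

avg_ns = f_local * t_local_ns + (1 - f_local) * t_remote_ns
print(avg_ns)  # 140.0
```

Even with most accesses local, a modest remote fraction noticeably raises the average, which is why data placement matters on NUMA machines.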

Parallel and Multiprocessor Architectures – Shared Memory

When a processor's cached value is updated at the same time as the update to main memory, we say that the system uses a write-through cache update protocol.
- With write-through with update, a message containing the new value is broadcast to all processors so that they may update their caches.
- With write-through with invalidate, a broadcast asks all processors to invalidate the stale cached value.
Write-invalidate uses less bandwidth, because it uses the network only the first time the data is updated, but retrieval of the fresh data takes longer. Write-update creates more message traffic, but all caches are kept current.

Another approach is the write-back protocol, which delays the update to main memory until the modified cache block is replaced and written back. At replacement time, the processor writing the cached value must obtain exclusive rights to the data; when those rights are granted, all other cached copies are invalidated.
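The trade-off between the two write-through variants can be seen in a toy snooping model. Every name below is illustrative; real protocols (MESI and its relatives) track considerably more state:

```python
# Toy snooping model contrasting write-through with update vs. invalidate.
# Illustrative sketch only, not a real coherence protocol implementation.

class Cache:
    def __init__(self):
        self.data = {}      # address -> locally cached value
        self.valid = set()  # addresses whose cached copy is valid

    def load(self, addr, memory):
        """Read miss: fetch the value from main memory into the cache."""
        self.data[addr] = memory[addr]
        self.valid.add(addr)

def write_through(caches, memory, writer, addr, value, protocol):
    """Writer updates memory immediately; snooping caches either take the
    new value ('update') or drop their now-stale copy ('invalidate')."""
    memory[addr] = value
    caches[writer].data[addr] = value
    caches[writer].valid.add(addr)
    bus_messages = 0
    for i, cache in enumerate(caches):
        if i != writer and addr in cache.valid:
            bus_messages += 1                # one snooper sees the broadcast
            if protocol == "update":
                cache.data[addr] = value     # all caches stay current
            else:
                cache.valid.discard(addr)    # stale copy dropped; re-fetch later
    return bus_messages

memory = {0x10: 1}
caches = [Cache() for _ in range(4)]
for c in caches:
    c.load(0x10, memory)

# The first write reaches every other cache under either protocol...
print(write_through(caches, memory, 0, 0x10, 99, "invalidate"))  # 3
# ...but a repeat write under invalidate finds no valid copies left to
# notify, which is why write-invalidate uses less bus bandwidth overall.
print(write_through(caches, memory, 0, 0x10, 100, "invalidate"))  # 0
```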

Parallel and Multiprocessor Architectures – Distributed Computing

Distributed computing is another form of multiprocessing, although the term means different things to different people. In a sense, all multiprocessor systems are distributed systems, because the processing load is divided among processors that work collaboratively. What is usually meant by a distributed system, however, is one whose processing units are very loosely coupled: each processor is independent, with its own memory and cache, and the processors communicate over a high-speed network. This arrangement is also called cluster computing. Processing units connected via a bus, by contrast, are considered tightly coupled.

Grid computing is one example of distributed computing: it makes use of heterogeneous CPUs and storage devices located in different administrative domains to solve problems too large for any single supercomputer. The difference between grid computing and cluster computing is that a grid can draw on resources across different domains, whereas a cluster stays within a single domain. Global computing is grid computing in which the resources are provided by volunteers.

Parallel and Multiprocessor Architectures – Distributed Systems

For general-purpose computing, transparency is important: details of the system's distributed nature should be hidden, so that using remote resources requires no more effort than using local ones. One example of this kind of distributed system is ubiquitous computing (also called pervasive computing): systems that are totally embedded in the environment – simple to use, completely connected, mobile, invisible, and in the background.

Remote procedure calls (RPCs) enable this transparency. An RPC uses resources on a remote machine by invoking a procedure that resides and executes on that machine. RPCs are employed by numerous distributed computing architectures, including the Common Object Request Broker Architecture (CORBA) and Java's Remote Method Invocation (RMI).
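A minimal sketch of RPC transparency using Python's standard-library xmlrpc modules (the port number and the add procedure are illustrative choices, not part of CORBA or RMI). The server runs in a background thread here only so the sketch is self-contained; normally it would live on another machine:

```python
# A remote procedure call that reads like a local call.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# "Remote" side: expose an add() procedure over XML-RPC.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Local" side: the proxy hides marshalling and network transport.
proxy = ServerProxy("http://localhost:8000")
result = proxy.add(2, 3)   # invoked like a local function, executed remotely
print(result)  # 5
server.shutdown()
```

The caller never sees sockets or message formats; that hiding of the network is exactly the transparency the slide describes.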

Parallel and Multiprocessor Architectures – Distributed Systems

Cloud computing is distributed computing taken to the extreme: it provides services over the Internet through a collection of loosely coupled systems. In theory, the service consumer has no awareness of the hardware or even of its location. Your services and data may sit on the same physical system as those of your business competitor, and the hardware might be located in another country. Security concerns are a major inhibiting factor for cloud computing.