Computer Engineering 2nd Semester

Slides:



Advertisements
Similar presentations
L.N. Bhuyan Adapted from Patterson’s slides
Advertisements

1 Uniform memory access (UMA) Each processor has uniform access time to memory - also known as symmetric multiprocessors (SMPs) (example: SUN ES1000) Non-uniform.
MESI cache coherence protocol
Topics covered: Memory subsystem CSE243: Introduction to Computer Architecture and Hardware/Software Interface.
Multi-core systems System Architecture COMP25212 Daniel Goodman Advanced Processor Technologies Group.
Cache Optimization Summary
CIS629 Coherence 1 Cache Coherence: Snooping Protocol, Directory Protocol Some of these slides courtesty of David Patterson and David Culler.
CS252/Patterson Lec /23/01 CS213 Parallel Processing Architecture Lecture 7: Multiprocessor Cache Coherency Problem.
1 Lecture 23: Multiprocessors Today’s topics:  RAID  Multiprocessor taxonomy  Snooping-based cache coherence protocol.
CPE 731 Advanced Computer Architecture Snooping Cache Multiprocessors Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
CS252/Patterson Lec /28/01 CS 213 Lecture 10: Multiprocessor 3: Directory Organization.
1 Shared-memory Architectures Adapted from a lecture by Ian Watson, University of Machester.
1 Cache coherence CEG 4131 Computer Architecture III Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini.
Spring EE 437 Lillevik 437s06-l21 University of Portland School of Engineering Advanced Computer Architecture Lecture 21 MSP shared cached MSI protocol.
ECE200 – Computer Organization Chapter 9 – Multiprocessors.
Lecture 13: Multiprocessors Kai Bu
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 5, 2005 Session 22.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 March 20, 2008 Session 9.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 13.
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
Performance of Snooping Protocols Kay Jr-Hui Jeng.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 7.
Architecture and Design of the AlphaServer GS320 Gharachorloo, et al. (Compaq) Presented by Curt Harting
The University of Adelaide, School of Computer Science
Multi Processing prepared and instructed by Shmuel Wimer Eng. Faculty, Bar-Ilan University June 2016Multi Processing1.
Lecture 13: Multiprocessors Kai Bu
Parallel Architecture
תרגול מס' 5: MESI Protocol
CSC 4250 Computer Architectures
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
CS 147 – Parallel Processing
Lecture 18: Coherence and Synchronization
12.4 Memory Organization in Multiprocessor Systems
Multiprocessor Cache Coherency
Jason F. Cantin, Mikko H. Lipasti, and James E. Smith
The University of Adelaide, School of Computer Science
CMSC 611: Advanced Computer Architecture
Example Cache Coherence Problem
The University of Adelaide, School of Computer Science
Parallel and Multiprocessor Architectures – Shared Memory
Cache Coherence Protocols:
Cache Coherence Protocols:
Cache Coherence (controllers snoop on bus transactions)
Kai Bu 13 Multiprocessors So today, we’ll finish the last part of our lecture sessions, multiprocessors.
Multiprocessors - Flynn’s taxonomy (1966)
Lecture 8: Directory-Based Cache Coherence
Lecture 7: Directory-Based Cache Coherence
11 – Snooping Cache and Directory Based Multiprocessors
Chapter 5 Exploiting Memory Hierarchy : Cache Memory in CMP
/ Computer Architecture and Design
Lecture 9: Directory-Based Examples
High Performance Computing
Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini
Lecture 8: Directory-Based Examples
Lecture 25: Multiprocessors
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Cache coherence CEG 4131 Computer Architecture III
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Lecture 24: Multiprocessors
Coherent caches Adapted from a lecture by Ian Watson, University of Machester.
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 19: Coherence and Synchronization
CSL718 : Multiprocessors 13th April, 2006 Introduction
Lecture 18: Coherence and Synchronization
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Multiprocessors and Multi-computers
Presentation transcript:

Computer Engineering 2nd Semester Rabie A. Ramadan rabieramadan@yahoo.com

Shared Memory Architecture

Shared Memory Architecture Shared Memory System performance degradation due to contention, and coherence problems. Coherence Problem Multiple Copies

CLASSIFICATION OF SHARED MEMORY SYSTEMS Simplest model One memory storage Accessed by two processors Using two ports

Uniform Memory Access All processors have the same opportunity Sun Starfire servers, HP V series, and Compaq AlphaServer GS. Single Address Space

Non Uniform Memory Access (NUMA) The access to the memory module depends on its distance from the memory.

Cache-Only Memory Architecture (COMA) The shared memory consists of cache memory. D is the remote directory to help in remote cache access

BUS-BASED SYMMETRIC MULTIPROCESSORS One bus to connect the memories and the processors. Bus contention problem Try to minimize accessing the memory Use the cache Processors usually execute less than an instruction per cycle Superscalar processors executes many instructions per cycle

Performance measure The effective bandwidth = B*I fetches per second Miss rate = V(1-h) For N processors, the miss rate = V(1-h)N Bus saturation case when

Performance measure The maximum number of processors with cache memories that the bus can support is given by the relation:

Group Activity Suppose a shared memory system is constructed from processors that can execute V = 107 instructions/s and the processor duty cycle I = 1. The caches are designed to support a hit rate of 97%, and the bus supports a peak bandwidth of B =106 cycles/s. How many number of processors that can be supported by such organization? What hit rate is needed to support a 30-processor system.

Answer

Basic Cache Coherence Problem Multiple copies of data, spread throughout the caches, The copies in the caches are coherent if they all equal the same value Problems: Cache–Memory Coherence Cache–Cache Coherence

Cache–Memory Coherence The copy of the cache is not coherent with the copy in the memory Solutions: Write through the memory is updated every time the cache is updated, Write update the memory is updated only when the block in the cache is being replaced.

Example

Cache–Cache Coherence Two or more processors read the same data from the memory. Example: P reads X to its cache Q reads X to its cache What happens if P wants to write a new value over the old value of X?

Cache–Cache Coherence Write-invalidate Maintains consistency by reading from local caches until a write occurs. When any processor updates the value of X through a write, posting a dirty bit for X invalidates all other copies. Write-update Maintains consistency by immediately updating all copies in all caches.

Example

SNOOPING PROTOCOLS Snooping protocols are based on watching bus activities and carry out the appropriate coherency commands when necessary. Global memory is moved in blocks Each block has a state associated with it.

Write-Invalidate and Write-Through Protocol Multiple processors can read block copies from main memory safely until one processor updates its copy. All cache copies are invalidated and the memory is updated to remain consistent.

Write-Invalidate and Write-Through Protocol

Group Activity Consider a bus-based shared memory with two processors P and Q as shown in Figure Let us see how the cache coherence is maintained using Write-Invalidate Write-Through protocol. Assume that that X in memory was originally set to 5 and the following operations were performed in the order given: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q reads X; (5) Q updates X; (6) P updates X; (7) Q reads X.

Answer

Write-Invalidate and Write-Back (Ownership Protocol) a valid block can be owned by memory and shared in multiple caches that can contain only the shared copies of the block. Multiple processors can safely read these blocks from their caches until one processor updates its copy. At this time, the writer becomes the only owner of the valid block and all other copies are invalidated.

Group Activity Consider the shared memory system of Figure and the following operations: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q reads X; (5) Q updates X; (6) P updates X; (7) Q reads X.

Answer

Write-Once uses a combination of write-through and write-back. Write-through is used the very first time a block is written. Subsequent writes are performed using write-back.

Group Activity Consider the shared memory system of Figure and the following operations: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q reads X; (5) Q updates X; (6) P updates X; (7) Q reads X.

Answer

Write-Update and Partial Write-Through An update to one cache is written to memory at the same time it is broadcast to other caches sharing the updated block.

Group Activity Consider the shared memory system of Figure and the following operations: (1) P reads X; (2) P updates X; (3) Q reads X; (4) Q updates X; (5) Q reads X; (6) Block X is replaced in P’s cache; (7) Q updates X; (8) P updates X.

Write-Update and Write-Back This protocol is similar to the previous one except that instead of writing through to the memory whenever a shared block is updated, memory updates are done only when the block is being replaced.

Group Activity Consider the shared memory system of Figure and the following operations: (1) P reads X; (2) P updates X; (3) Q reads X; (4) Q updates X; (5) Q reads X; (6) Block X is replaced in Q’s cache; (7) P updates X; (8) Q updates X.

Group Activity Consider the shared memory system of Figure and the following operations: (1) P reads X; (2) Q reads X; (3) Q updates X; (4) Q updates X; (5) P reads X; (6) Block X is replaced in Q’s cache; (7) P updates X; (8) Q updates X. Use the following protocols: Write-Invalidate and Write-Through Protocol Write-Invalidate and Write-Back (Ownership Protocol) Write-Once Write-Update and Partial Write-Through Write-Update and Write-Back