Distributed Shared-Memory Architectures by Seda Demirağ (22/12/2005)

Presentation transcript:

Distributed Shared-Memory Architectures
Seda Demirağ, 22/12/2005

According to Flynn (1972), computers can be categorized by their parallelism as follows:
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
- Multiple instruction streams, single data stream (MISD)
- Multiple instruction streams, multiple data streams (MIMD)

MIMD machines can be further classified into two groups:
- Centralized (symmetric) shared-memory architectures
- Distributed shared-memory architectures

Centralized (symmetric) shared-memory architecture:
- Caches can contain either private or shared data; caching shared data gives rise to the cache coherence problem.
- Uniform access time to all of memory from all processors.

Distributed-memory architecture:
- Supports larger processor counts.
- Some processors may be connected by a single bus, but this is less scalable than a global interconnection network.
- A cost-effective way to scale memory bandwidth, provided most accesses are to local memory.
- However, communicating data between processors becomes more complex and, at the least, has higher latency.

Models for Communication Among Processors
There are two alternative architectural approaches, which differ in the method used for communicating data among processors:
- Distributed shared-memory architectures (DSM): communication occurs via a shared address space.
- Multicomputers (clusters): the address space consists of multiple private address spaces that are logically disjoint. These are message-passing multiprocessors: data is communicated by explicitly passing messages among the processors. To access or operate on remote data, a processor sends a message to the processor that owns it; the receiver performs the operation and sends the result back (see the sketch after this slide).
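To make the message-passing model concrete, here is a minimal Python sketch of the request/reply pattern just described. The Node class, the queue-based "network", and the increment operation are all invented for illustration; real machines use hardware or MPI-style messaging primitives.

# Minimal message-passing sketch: a requesting node asks the owning node
# to perform an operation on data it does not hold locally.
from queue import Queue

class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}          # this node's private address space
        self.inbox = Queue()      # incoming messages

    def serve_one(self):
        """Receiver side: perform the requested operation, send result back."""
        op, addr, reply_to = self.inbox.get()
        if op == "read":
            result = self.memory[addr]
        elif op == "increment":
            self.memory[addr] += 1
            result = self.memory[addr]
        reply_to.inbox.put(("result", addr, result))

owner = Node("P1")
owner.memory["X"] = 41
requester = Node("P0")

# P0 sends a message asking P1 to operate on X; P1 replies with the result.
owner.inbox.put(("increment", "X", requester))
owner.serve_one()
print(requester.inbox.get())   # ('result', 'X', 42)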

Distributed shared-memory architecture:
The first DSM architectures appeared in the late 1970s and continued through the early 1980s, embodied in three machines: the Carnegie Mellon Cm*, the IBM RP3, and the BBN Butterfly. In uniprocessors, the long access time to memory is largely hidden through the use of caches. Unfortunately, adapting caches to work in a multiprocessor environment is difficult: when used in a multiprocessor, caching introduces an additional problem, cache coherence, which arises when different processors cache and update values of the same memory location.

What is cache coherence? An example with two processors, A and B, and a memory location X (a dash means the block is not cached):

Time | Event           | Cache contents for A | Cache contents for B | Memory contents for X
0    |                 | -                    | -                    | 1
1    | A reads X       | 1                    | -                    | 1
2    | B reads X       | 1                    | 1                    | 1
3    | A writes 0 to X | 0                    | 1                    | 0

After time step 3, B's cached copy of X still holds the stale value 1, while A and memory hold 0 (the short sketch after this slide reproduces this behaviour).
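The stale read at time step 3 can be reproduced with a few lines of Python. The dictionary-based caches and the write-through memory below are purely illustrative; they model the table above, not any particular hardware.

# Two private caches over one memory, with write-through to memory
# but no coherence mechanism between the caches.
memory = {"X": 1}
cache_A, cache_B = {}, {}

def read(cache, addr):
    if addr not in cache:               # miss: fill from memory
        cache[addr] = memory[addr]
    return cache[addr]

def write(cache, addr, value):
    cache[addr] = value                 # update own copy
    memory[addr] = value                # write through to memory
    # Note: no invalidation or update of the other cache.

read(cache_A, "X")         # time 1: A reads X  -> caches 1
read(cache_B, "X")         # time 2: B reads X  -> caches 1
write(cache_A, "X", 0)     # time 3: A writes 0 to X

print(memory["X"])         # 0 (memory is up to date)
print(read(cache_B, "X"))  # 1 -- B still sees the stale value: incoherent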

A memory system is coherent when it satisfies the following conditions:
- A read by a processor of a location it has just written, with no intervening writes by other processors, always returns the written value.
- A read from processor P2 that follows a write by P1 to the same location returns the value written by P1, provided the two accesses are sufficiently separated in time and no other writes to that location occur in between.
- Two writes to the same location by any two processors are seen in the same order by all processors (write serialization).
Together these conditions ensure that a shared location cannot end up with conflicting copies in different cache blocks.

DSM architectures that exclude cache coherence: these systems have caches, but shared data is marked as uncacheable and only private data is kept in the caches. Software can still cache shared data by copying it from the shared portion of the address space into the local private portion of the address space that is cached; coherence is then controlled by software. The advantage is that little hardware support is needed.
Protocols for hardware cache coherence: the snooping protocol and the directory protocol.

Snooping Protocol
In a snooping system, all caches on the bus monitor (snoop) the bus to determine whether they have a copy of a block of data that is requested on the bus. Every cache keeps the sharing status of every block of physical memory it holds. There are two types of snooping protocol:
- Write-invalidate: the processor that is writing data causes the copies in the caches of all other processors in the system to be invalidated before it changes its local copy. It does this by sending an invalidation signal over the bus, which causes all of the other caches to check for and invalidate their copy of the block. Once the other copies have been invalidated, the writing processor can keep updating its local copy until another processor requests the data (a minimal sketch follows after this slide).
- Write-update: the processor that is writing the data broadcasts the new data over the bus (without issuing an invalidation signal), and all caches that contain copies of the data are updated. This scheme differs from write-invalidate in that it does not leave only one valid copy after a write.
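Here is a minimal write-invalidate snooping sketch in Python, as referenced above. A shared Python list stands in for the bus, caches snoop writes by discarding their copy of the written block, and writes go through to memory; the class names are invented, and real protocols (MSI, MESI) track per-block states that this sketch omits.

# Simplified write-invalidate snooping: on a write, the writer broadcasts
# an invalidate on the "bus" and every other cache drops its copy.
class SnoopingCache:
    def __init__(self, name, bus, memory):
        self.name, self.bus, self.memory = name, bus, memory
        self.blocks = {}                      # addr -> value
        bus.append(self)                      # attach to the bus

    def snoop_invalidate(self, addr, writer):
        if writer is not self and addr in self.blocks:
            del self.blocks[addr]             # invalidate our stale copy

    def read(self, addr):
        if addr not in self.blocks:           # read miss: fetch from memory
            self.blocks[addr] = self.memory[addr]
        return self.blocks[addr]

    def write(self, addr, value):
        for cache in self.bus:                # broadcast invalidation first
            cache.snoop_invalidate(addr, self)
        self.blocks[addr] = value
        self.memory[addr] = value             # write-through for simplicity

memory = {"X": 1}
bus = []
A = SnoopingCache("A", bus, memory)
B = SnoopingCache("B", bus, memory)

A.read("X"); B.read("X")   # both caches hold X = 1
A.write("X", 0)            # B's copy is invalidated by the bus broadcast
print(B.read("X"))         # 0 -- B re-fetches the current value: coherent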

Directory-Based Cache Coherence Protocols
Each directory is responsible for tracking the caches that share the memory addresses of the portion of memory in its node. The directory must track the state of each cache block; the possible states are Shared, Uncached, and Exclusive (one possible entry layout is sketched after this slide).
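The directory needs to record, per memory block, the block state and the set of sharers. A common organization is a full bit vector with one bit per processor; the minimal Python sketch below illustrates this layout (the class and method names, and the 8-processor example, are invented for illustration).

# One possible representation of a directory entry per memory block:
# a state field plus a bit vector with one bit per processor.
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    state: str = "Uncached"       # "Uncached", "Shared", or "Exclusive"
    sharers: int = 0              # bit i set => processor i has a copy

    def add_sharer(self, p: int):
        self.sharers |= (1 << p)

    def sharer_list(self, n_procs: int):
        return [p for p in range(n_procs) if self.sharers & (1 << p)]

entry = DirectoryEntry()
entry.state = "Shared"
entry.add_sharer(0); entry.add_sharer(5)
print(entry.sharer_list(8))       # [0, 5]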

[Figure: the possible messages sent among nodes to maintain coherence, along with the source and destination of each message. P = requesting processor number, A = requested address, D = data contents.]

Example of a Directory Protocol
[Figure: state transition diagram for an individual cache block in a directory-based system. Requests by the local processor are shown in black and those from the home directory are shown in gray.]

Example of a Directory Protocol
[Figure: the state transition diagram for the directory. All actions are shown in gray because they are all externally caused; bold indicates the action taken by the directory in response to a request, and bold italics indicate an action that updates the sharing set, Sharers.]

Example of a Directory Protocol (cont'd)
When the block is in the Uncached state:
- Read miss: the requesting processor is sent the requested data from memory, and the requestor is made the only sharing node. The state of the block becomes Shared.
- Write miss: the requesting processor is sent the value and becomes the sharing (owning) node. The block is made Exclusive to indicate that the only valid copy is cached, and Sharers records the identity of the owner.

Example of a Directory Protocol (cont'd)
When the block is in the Shared state:
- Read miss: the requesting processor is sent the requested data from memory, and the requesting processor is added to the sharing set.
- Write miss: the requesting processor is sent the value, all processors in the set Sharers are sent invalidate messages, and Sharers is set to contain only the identity of the requesting processor. The state of the block becomes Exclusive.

Example of a Directory Protocol (cont'd)
When the block is in the Exclusive state:
- Read miss: the owner processor is sent a data fetch message and the block transitions to Shared. The identity of the requesting processor is added to the set Sharers, which still contains the identity of the processor that was the owner.
- Data write back: the owner processor is replacing the block and therefore must write it back. The write back makes the memory copy up to date, the block becomes Uncached, and the Sharers set is emptied.
- Write miss: the block has a new owner. A message is sent to the old owner, causing its cache to invalidate the block and send the value to the directory. Sharers is set to the identity of the new owner, and the state of the block remains Exclusive.
(A simplified directory state machine is sketched after this slide.)
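Putting the three states together, here is a simplified Python model of the directory controller for a single block. It follows the Uncached/Shared/Exclusive transitions described above, but the actual coherence messages (fetch, invalidate, data value reply) and the cache-side controller are not modeled, and all names are illustrative.

# Simplified directory controller for one memory block.
class Directory:
    def __init__(self, value):
        self.state = "Uncached"
        self.sharers = set()      # processors with a valid copy (or the owner)
        self.value = value        # memory copy of the block

    def read_miss(self, p):
        # From Uncached or Shared this is a plain memory read; from Exclusive
        # the owner would be sent a fetch message (not modeled here).
        self.state = "Shared"
        self.sharers.add(p)
        return self.value          # data value reply to processor p

    def write_miss(self, p):
        # From Shared, all sharers get invalidate messages; from Exclusive,
        # the old owner is told to invalidate and hand over the block
        # (the messages themselves are not modeled here).
        self.state = "Exclusive"
        self.sharers = {p}         # p is now the sole owner
        return self.value

    def data_write_back(self, p, value):
        self.value = value         # memory copy is made up to date
        self.state = "Uncached"
        self.sharers.clear()

# Example message sequence:
d = Directory(value=1)
d.read_miss("P0")            # Uncached -> Shared, Sharers = {P0}
d.read_miss("P1")            # Shared, Sharers = {P0, P1}
d.write_miss("P1")           # invalidate P0, Exclusive, Sharers = {P1}
d.data_write_back("P1", 7)   # Exclusive -> Uncached, memory = 7
print(d.state, d.value)      # Uncached 7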

Performance of DSM Multiprocessors
In DSM architectures, the split of memory requests between local and remote is key to performance, because it affects the bandwidth and the latency seen by requests. In the performance examples we separate cache misses into local and remote requests, and we compare how performance changes for the computational kernels FFT and LU and for the applications Barnes and Ocean.

Performance of DSM Multiprocessors (cont'd)
With these cache sizes, the miss rates are not affected much by changes in processor count, with the exception of Ocean. The rise in Ocean's miss rate at 64 processors results from two factors: an increase in cache mapping conflicts, which occur when the grid becomes small and lead to a rise in local misses, and an increase in the number of coherence misses, which are all remote.

Performance of DSM Multiprocessors (cont'd)
This figure shows how the miss rates change as the cache size is increased, assuming a 64-processor execution and 64-byte blocks. By the time we reach the largest cache size shown, 512 KB, the remote miss rate is equal to or greater than the local miss rate.

Performance of DSM Multiprocessors (cont'd)
Here we examine the effect of changing the block size. Increases in block size reduce the miss rate, even for large blocks, although the performance benefit of going to the largest blocks is small. Most of the improvement in miss rate comes from a reduction in local misses.

Performance of DSM Multiprocessors (cont'd)
The number of bytes per data reference climbs steadily as the block size is increased.

Performance of DSM Multiprocessors (cont'd)
The effective latency of memory references in a DSM multiprocessor depends both on the relative frequency of cache misses and on the location of the memory that serves the accesses, since local and remote misses have very different costs (a worked example follows).
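As a concrete (and purely illustrative) example of how local and remote misses combine into an effective latency, the hit time, miss rates, and penalties below are assumed for the sake of the calculation, not taken from the presentation's figures.

# Illustrative average memory access time for a DSM node.
# All numbers below are assumed for the example, not from the slides.
hit_time          = 1      # cycles for a cache hit
local_miss_rate   = 0.02   # fraction of references missing to local memory
remote_miss_rate  = 0.01   # fraction of references missing to remote memory
local_penalty     = 100    # cycles to service a miss from local memory
remote_penalty    = 400    # cycles to service a miss from a remote node

effective_latency = (hit_time
                     + local_miss_rate * local_penalty
                     + remote_miss_rate * remote_penalty)
print(f"Effective latency: {effective_latency:.1f} cycles per reference")
# -> 1 + 0.02*100 + 0.01*400 = 7.0 cycles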

References:
- Andrew S. T. and Maarten V. S., Distributed Systems, 2002.
- John L. H. and David A. P., Computer Architecture: A Quantitative Approach, 2003.
- Abraham S., Peter B. G., and Greg G., Operating Systems Concepts, 2003.
- Jinseok K. and Gyungho L., Binding Time in Distributed Shared Memory Architectures, 1998 International Conference on Parallel Processing.
- Bill N. and Virginia L., Distributed Shared Memory: A Survey of Issues and Algorithms, vol. 24, no. 8, August 1991, IEEE Computer Society Press.
- S. Zhou, M. Stumm, D. Wortman, and K. Li, Heterogeneous Distributed Shared Memory, IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 5, September 1992.

Any Questions?