Cache Coherence Schemes for Multiprocessors
Sivakumar M and Osman Unsal
March 7, 2000



- Consistency
- Different directory schemes
- Comparison of directory schemes
- Hierarchical directory scheme (in detail)

Referred papers:
- “Directory-Based Cache Coherence in Large-Scale Multiprocessors”, David Chaiken, Craig Fields, Kiyoshi Kurihara and Anant Agarwal
- “A Survey of Cache Coherence Schemes for Multiprocessors”, Per Stenstrom
- “Cache Consistency and Sequential Consistency”, James R Goodman
- “LimitLESS Directories: A Scalable Cache Coherence Scheme”, David Chaiken, John Kubiatowicz and Anant Agarwal
- “A Hierarchical Directory Scheme for Large-Scale Cache-Coherent Multiprocessors”, a dissertation by Yeong-Chang Maa

CONSISTENCY

Strict consistency
- Any read to memory location X returns the value stored by the most recent write operation to X.
- Examples:
    (a) P1: W(x)1
        P2: R(x)1
    (b) P1: W(x)1
        P2: R(x)0 R(x)1

Sequential consistency: program order + memory coherence
- The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
- Examples:
    (a) P1: W(x)1
        P2: R(x)0 R(x)1
    (b) P1: W(x)1
        P2: R(x)1 R(x)1
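The sequential-consistency definition can be checked mechanically for tiny histories. The sketch below is only an illustration, not part of the original slides: it enumerates every interleaving that preserves each processor's program order and looks for one in which every read returns the latest written value. The function names, the tuple encoding of operations, and the initial value 0 for x are assumptions made for the example.

```python
def interleavings(programs):
    """Yield every merge of the per-processor lists that keeps program order."""
    if all(not prog for prog in programs):
        yield []
        return
    for i, prog in enumerate(programs):
        if prog:
            rest = programs[:i] + [prog[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [prog[0]] + tail


def is_legal(sequence, initial=0):
    """A sequence is legal if each read returns the most recently written value."""
    memory = {}
    for kind, var, value in sequence:
        if kind == "W":
            memory[var] = value
        elif memory.get(var, initial) != value:
            return False
    return True


def sequentially_consistent(programs):
    """True if some program-order-preserving interleaving is legal."""
    return any(is_legal(seq) for seq in interleavings(programs))


# Histories in the slide's notation: W(x)1 is ("W", "x", 1), R(x)0 is ("R", "x", 0).
p1 = [("W", "x", 1)]
history_a = [p1, [("R", "x", 0), ("R", "x", 1)]]
history_b = [p1, [("R", "x", 1), ("R", "x", 1)]]
history_c = [p1, [("R", "x", 1), ("R", "x", 0)]]  # reads go "backwards"

for name, history in (("a", history_a), ("b", history_b), ("c", history_c)):
    print(name, sequentially_consistent(history))
```

Under these assumptions, histories (a) and (b) admit a legal interleaving, while (c), where P2 sees the new value and then the old one, does not. Brute-force enumeration is only practical for a handful of operations.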

CONSISTENCY

Causal consistency
- Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
- Example:
    P1: W(x)1 W(x)3
    P2: R(x)1 W(x)2
    P3: R(x)1 R(x)3 R(x)2
    P4: R(x)1 R(x)2 R(x)3

PRAM consistency
- Writes done by a single process are received by all other processes in the order in which they are issued, but writes from different processes may be seen in a different order by different processes.

Processor consistency
- For every memory location X, there should be global agreement about the order of writes to X.

CONSISTENCY

Weak consistency
- Uses synchronization variables, accesses to which are sequentially consistent
- No access to a synchronization variable is allowed until all previous writes have completed everywhere
- No data access is allowed until all previous accesses to synchronization variables have been performed

Release consistency
- Barrier synchronization: acquire and release
- Acquire and release should be processor consistent
- Lazy release and eager release consistency

Entry consistency
- Locks for each shared variable or element

Directory-based cache coherence

Need
- Limited bandwidth
- Bus cycle times: ring out
- Scalability
- Disparity between bus and processor speed
- Increase in bandwidth as the number of processors increases

Drawbacks
- No broadcast capability
- Complex protocol

Directory Schemes

Tang’s scheme

Full-mapped
- Each directory entry: N bits + status bits for N processors
- Memory overhead scales as N^2 (the square of N), assuming M grows in proportion to N
- Censier scheme (distributed)
- Stenstrom scheme (distributed)

Limited directories
- Classified as Dir_i X, where X may be NB (no broadcast) or B (broadcast), and i < N
- Eviction: pointer replacement
- Resembles a set-associative cache and requires an eviction policy
- Efficient if memory is referenced by few processors
- Memory overhead scales as M * i * log N
- If X is B, more than i copies can exist, because invalidations are broadcast once the pointers overflow; if X is NB, at most i copies exist and a pointer must be evicted to admit a new sharer
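The two overhead formulas can be compared with a few lines of arithmetic. This is a rough sketch rather than a figure from the referenced papers: it assumes M is the number of memory blocks (one directory entry per block), one status bit per entry, pointers of ceil(log2 N) bits, and a fixed 1024 blocks per processor so that M grows with N. The function names are made up for the example.

```python
def full_map_bits(num_blocks, num_procs, status_bits=1):
    """Full-map directory: N presence bits plus status bits per entry."""
    return num_blocks * (num_procs + status_bits)


def limited_bits(num_blocks, num_procs, pointers, status_bits=1):
    """Limited directory Dir_i: i pointers of ceil(log2 N) bits per entry."""
    pointer_width = (num_procs - 1).bit_length()  # ceil(log2 N) for N >= 2
    return num_blocks * (pointers * pointer_width + status_bits)


BLOCKS_PER_PROCESSOR = 1024  # assumed, so that M is proportional to N

for n in (16, 64, 256):
    m = BLOCKS_PER_PROCESSOR * n
    print(n, full_map_bits(m, n), limited_bits(m, n, pointers=4))
```

With these assumed parameters the two schemes cost the same at N = 16, and the full-map directory grows roughly as N^2 afterwards while the four-pointer limited directory grows as N log N.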

Directory Schemes

Chained directories
- Make use of pointers, like linked lists
- Complex cache-block replacement:
  - splice the intermediate cache out of the chain
  - invalidate the location
- Variation: doubly linked chain
  - Optimizes the replacement process
  - Needs a larger average message block size

Comparison of full-mapped, limited, and chained schemes
- Metric: processor utilization
- Utilization depends on the frequency of memory references and the latency of the memory system
- Latency depends on topology, speed, number of processors, memory access latency, and the frequency and size of messages
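A chained directory can be pictured as a singly linked list whose head pointer lives in memory and whose links live in the caches. The sketch below is a simplified model for one memory block, with hypothetical class and method names, no message passing, and integer cache ids. It only illustrates why replacement is awkward: splicing a cache out requires walking the chain from the head.

```python
class ChainedDirectory:
    """Sharing list for one memory block, kept as a singly linked chain."""

    def __init__(self):
        self.head = None  # memory's pointer to the most recent sharer
        self.next = {}    # per-cache pointer to the next sharer (None = tail)

    def read(self, cache):
        """A new reader becomes the head and points at the old head."""
        if cache not in self.next:
            self.next[cache] = self.head
            self.head = cache

    def sharers(self):
        """Walk the chain from the head and list the sharers in order."""
        chain, node = [], self.head
        while node is not None:
            chain.append(node)
            node = self.next[node]
        return chain

    def replace(self, cache):
        """Splice a cache out of the chain when it replaces the block."""
        if cache not in self.next:
            return
        if self.head == cache:
            self.head = self.next[cache]
        else:
            prev = self.head
            while self.next[prev] != cache:  # linear walk to the predecessor
                prev = self.next[prev]
            self.next[prev] = self.next[cache]
        del self.next[cache]

    def write(self, writer):
        """Invalidate every other sharer; the writer becomes the only entry."""
        invalidated = [c for c in self.sharers() if c != writer]
        self.next = {writer: None}
        self.head = writer
        return invalidated


directory = ChainedDirectory()
for cache_id in (0, 1, 2):
    directory.read(cache_id)
print(directory.sharers())   # chain from most recent reader to oldest
directory.replace(1)         # cache 1 evicts the block
print(directory.sharers())
print(directory.write(0))    # caches that receive an invalidation
```

In a doubly linked variant each cache would also store a pointer to its predecessor, so `replace` would not need the linear walk, at the cost of carrying an extra pointer in messages.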

Directory Schemes: Analysis
- No coherence: all addresses in the trace are treated as not shared; gives an upper bound
- Only cache private data: for comparison with other schemes
- P-Thor: minimizes communication and has a minimum of synchronization points
- Speech: poor performance of limited directories due to pointer thrashing
- Performance improvement by system-level optimizations:
  * Tree barrier structure instead of linear barrier
  * Separating read-only blocks from read/write blocks
  * Reducing the block size

Directory Schemes

Coarse vector: Dir_i CV_r
- Initially behaves as a limited directory
- On pointer overflow, switches to a coarse bit vector in which each bit covers a group of r processors (equivalent to a full map when r = 1)

Dir_0 B
- 2 status bits for 4 states:
  - Absent
  - Present1: present and clean in only one cache
  - Present: present and clean in more than one cache
  - PresentM: present and dirty in only one cache

LimitLESS directory scheme
- Combination of hardware and software techniques
- Realizes the performance of a full-map directory
- Has the memory overhead of a limited directory

Sectored directory: Dir_N/L
- L sub-blocks share the directory
- Overhead is MN/L
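The coarse-vector idea can be sketched for a single directory entry. This is an illustration under assumptions, not the referenced design: it takes the usual reading of Dir_i CV_r in which, once more than i sharers exist, the entry stops tracking exact pointers and instead records one bit per group of r processors. The class name and methods are hypothetical, and Python sets stand in for the hardware pointer and bit fields.

```python
class CoarseVectorEntry:
    """One directory entry with i exact pointers and a coarse fallback."""

    def __init__(self, num_procs, pointers, region):
        self.num_procs = num_procs
        self.pointers = pointers  # i: exact pointers available
        self.region = region      # r: processors covered by one coarse bit
        self.sharers = set()      # exact sharers while in limited mode
        self.coarse = None        # set of region indices after overflow

    def add_sharer(self, proc):
        if self.coarse is not None:
            self.coarse.add(proc // self.region)
            return
        self.sharers.add(proc)
        if len(self.sharers) > self.pointers:
            # Overflow: reinterpret the entry as a coarse bit vector.
            self.coarse = {p // self.region for p in self.sharers}
            self.sharers = set()

    def invalidation_targets(self):
        """Processors that must receive an invalidation on a write."""
        if self.coarse is None:
            return sorted(self.sharers)
        targets = []
        for group in sorted(self.coarse):
            start = group * self.region
            end = min(start + self.region, self.num_procs)
            targets.extend(range(start, end))
        return targets


entry = CoarseVectorEntry(num_procs=16, pointers=2, region=4)
entry.add_sharer(1)
entry.add_sharer(9)
print(entry.invalidation_targets())  # exact sharers while pointers suffice
entry.add_sharer(10)                 # third sharer overflows the two pointers
print(entry.invalidation_targets())  # whole regions after overflow
```

After overflow the entry over-approximates the sharer set: some processors that never cached the block receive invalidations, which is the price paid for keeping the entry small.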

Directory Schemes

Directory cache: Dir_a1,a2
- a1 entries for short limited-directory pointers
- a2 entries for long full-map pointers

Hierarchical scheme

Hierarchical Cache Coherence Schemes

Network architecture

Wilson: hierarchical cache/bus architecture
- Combination of bus and directory scheme
- A cache contains a copy of all blocks cached underneath it
- Write-invalidate protocol
- Higher-level caches act as filters

Data Diffusion Machine
- Hierarchy of buses with large processor caches
- Write-invalidate protocol
- Only state information in higher-order caches
- No global memory; cost effective

Hierarchical Full-mapped Directory (HFMD) Schemes

Directory entry fields: tag bits | descendants presence vector | ackctr | MRU | INV | UP | MRQ | Tr | dirty

States of HFMD
- ABS: no entries in descendants; cleared des.vector and Tr bit
- ABT: descendants' entries being invalidated; cleared des.vector and Tr bit
- RO: read-only entries in the descendants; set des.vector, cleared dirty and Tr bits
- RW: a dirty (read-write) entry is in the descendants; set des.vector and dirty bit, cleared Tr bit
- RT: descendant entries have outstanding read requests; set des.vector and Tr bit, cleared dirty bit
- WT: descendant entries have outstanding write or modify requests; set des.vector, dirty bit and Tr bit
- INV: descendant entries being invalidated from the directory entry; cleared des.vector, set Tr bit and INV bit
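For the four states with a non-empty descendants presence vector, the state follows directly from the dirty and Tr bits as listed above. The small decoder below is a sketch of that mapping only. The function name and argument encoding are assumptions, and the states with a cleared vector (ABS, ABT, INV) are deliberately left out because they depend on bits not modelled here.

```python
def hfmd_state(des_vector, dirty, tr):
    """Classify an HFMD entry whose descendants presence vector is non-empty.

    Returns None when the vector is empty: ABS, ABT and INV are not
    distinguished in this sketch.
    """
    if not any(des_vector):
        return None
    if tr:
        return "WT" if dirty else "RT"
    return "RW" if dirty else "RO"


vector = [0, 1, 0, 0]  # one descendant holds the block
for dirty in (False, True):
    for tr in (False, True):
        print(dirty, tr, hfmd_state(vector, dirty, tr))
```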