
1 Cache Coherence Schemes for Multiprocessors
Sivakumar M, Osman Unsal
March 7, 2000

2 Outline
* Consistency
* Different directory schemes
* Comparison of directory schemes
* Hierarchical directory scheme (in detail)

Referred papers:
* "Directory-Based Cache Coherence in Large-Scale Multiprocessors", David Chaiken, Craig Fields, Kiyoshi Kurihara, and Anant Agarwal
* "A Survey of Cache Coherence Schemes for Multiprocessors", Per Stenstrom
* "Cache Consistency and Sequential Consistency", James R. Goodman
* "LimitLESS Directories: A Scalable Cache Coherence Scheme", David Chaiken, John Kubiatowicz, and Anant Agarwal
* "A Hierarchical Directory Scheme for Large-Scale Cache-Coherent Multiprocessors", dissertation by Yeong-Chang Maa

3 CONSISTENCY
Strict consistency: any read to memory location x returns the value stored by the most recent write operation to x.

  P1: W(x)1              P1: W(x)1
  P2:       R(x)1        P2:       R(x)0  R(x)1
  (strictly consistent)  (not strictly consistent)

Sequential consistency: program order + memory coherence. The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. (A litmus-test sketch follows below.)

  P1: W(x)1                  P1: W(x)1
  P2:       R(x)0  R(x)1     P2:       R(x)1  R(x)1
  (both executions are sequentially consistent)
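To make sequential consistency concrete, here is a minimal C11 sketch of the classic store-buffering litmus test (the variable and thread names are illustrative, not from the slides): under sequential consistency some interleaving of the four operations must order at least one write before the read that could observe it, so the outcome r1 == 0 && r2 == 0 is forbidden.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int x, y;          /* shared locations, zero-initialized */
int r1, r2;               /* per-thread read results */

void *p1(void *arg) {
    (void)arg;
    atomic_store(&x, 1);   /* W(x)1, seq_cst by default */
    r1 = atomic_load(&y);  /* R(y) */
    return NULL;
}

void *p2(void *arg) {
    (void)arg;
    atomic_store(&y, 1);   /* W(y)1 */
    r2 = atomic_load(&x);  /* R(x) */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Under sequential consistency, r1 == 0 && r2 == 0 cannot happen;
     * with relaxed atomics it can. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}
```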

4 CONSISTENCY
Causal consistency: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines.

  P1: W(x)1           W(x)3
  P2:    R(x)1  W(x)2
  P3:    R(x)1               R(x)3  R(x)2
  P4:    R(x)1               R(x)2  R(x)3

  (allowed: W(x)2 and W(x)3 are concurrent, so P3 and P4 may observe them in different orders, while everyone sees the causally earlier W(x)1 first)

PRAM consistency: writes done by a single process are received by all other processes in the order in which they are issued, but writes from different processes may be seen in a different order by different processes.

Processor consistency: for every memory location x, there must be global agreement about the order of writes to x.

5 CONSISTENCY
Weak consistency: uses synchronization variables, which are themselves sequentially consistent.
* No access to a synchronization variable is allowed until all previous writes have completed everywhere.
* No data access is allowed until all previous accesses to synchronization variables have been performed.

Release consistency: barrier synchronization with acquire and release operations (see the sketch after this slide).
* Acquire and release must themselves be processor consistent.
* Variants: lazy release and eager release consistency.

Entry consistency: a lock is associated with each shared variable or element.
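A minimal sketch of the acquire/release idea using C11 atomics (the names data, flag, producer, and consumer are illustrative assumptions, not from the slides): the consumer's acquiring load of the synchronization variable guarantees it observes every write the producer made before its releasing store.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

int data;                 /* ordinary shared data */
atomic_int flag;          /* synchronization variable */

void *producer(void *arg) {
    (void)arg;
    data = 42;                                              /* protected write */
    atomic_store_explicit(&flag, 1, memory_order_release);  /* "release" */
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    /* "acquire": spin until the synchronization variable is set */
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;
    printf("data = %d\n", data);  /* guaranteed to print 42 */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```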

6 Directory-Based Cache Coherence
Why it is needed:
* Limited bus bandwidth
* Bus cycle times
* Scalability
* Growing disparity between bus and processor speeds
* Bandwidth demand grows as the number of processors increases
Drawbacks:
* No broadcast capability
* Complex protocol

7 Directory Schemes
Tang's scheme (full-map):
* Each directory entry holds N presence bits plus status bits, for N processors
* Memory overhead scales as O(N^2), assuming the number of memory blocks M is proportional to N
Censier and Feautrier's scheme (distributed)
Stenstrom's scheme (distributed)
Limited directories:
* Classified as Dir_i X, where X is B (broadcast) or NB (no broadcast) and i < N
* Eviction by pointer replacement: resembles a set-associative cache and requires an eviction policy (see the sketch after this slide)
* Efficient when memory is referenced by only a few processors
* Memory overhead scales as O(M * i * log N)
* If X is B, more than i copies can exist: on pointer overflow the directory falls back to broadcast invalidation. If X is NB, an existing copy is invalidated to make room, so at most i copies ever exist.
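A minimal sketch of a Dir_i NB entry with pointer replacement, under assumed parameters (NPTRS hardware pointers, 16-bit node IDs, a FIFO-style eviction policy); send_invalidate is a hypothetical hook standing in for the real invalidation message.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NPTRS 4                       /* i in Dir_i NB */

typedef struct {
    uint16_t ptr[NPTRS];              /* node IDs of current sharers */
    uint8_t  count;                   /* pointers in use */
    bool     dirty;                   /* a single writable copy exists */
} dir_entry;

/* Hypothetical hook: invalidate the copy cached at `node`. */
static void send_invalidate(uint16_t node) {
    printf("invalidate copy at node %u\n", node);
}

/* Record a new sharer; on pointer overflow, evict one existing copy
 * so the entry never tracks more than NPTRS sharers (the NB policy). */
void add_sharer(dir_entry *e, uint16_t node) {
    for (uint8_t k = 0; k < e->count; k++)
        if (e->ptr[k] == node) return;        /* already recorded */
    if (e->count == NPTRS) {                  /* pointer overflow */
        send_invalidate(e->ptr[0]);           /* FIFO-style eviction */
        for (uint8_t k = 1; k < NPTRS; k++)
            e->ptr[k - 1] = e->ptr[k];
        e->count--;
    }
    e->ptr[e->count++] = node;
}

int main(void) {
    dir_entry e = {0};
    for (uint16_t n = 1; n <= 6; n++)         /* nodes 5 and 6 force evictions */
        add_sharer(&e, n);
    return 0;
}
```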

8 Directory Schemes
Chained directories:
* Use pointers, like a linked list, to chain together the caches holding a block
* Cache-block replacement is complex: either splice the intermediate cache out of the chain, or invalidate the location down the rest of the chain
* Variation: a doubly linked chain optimizes the replacement process (see the sketch after this slide) but needs a large average message block size

Comparison of full-map, limited, and chained schemes:
* Metric: processor utilization
* Utilization depends on the frequency of memory references and on the latency of the memory system
* Latency depends on topology, speed, number of processors, memory access latency, and the frequency and size of messages
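A minimal sketch of why the doubly linked variation simplifies replacement (the chain_node type and its field names are illustrative assumptions): with a back pointer, a departing cache can splice itself out with two local pointer updates, where a singly linked chain would have to invalidate everything past the departing node.

```c
#include <stdint.h>
#include <stddef.h>

/* Per-cache link for one block's sharing chain (illustrative layout). */
typedef struct chain_node {
    uint16_t           node_id;  /* which cache holds the copy */
    struct chain_node *prev;     /* toward the directory head */
    struct chain_node *next;     /* toward the chain tail */
} chain_node;

/* On block replacement, splice the departing cache out of the chain. */
void splice_out(chain_node **head, chain_node *c) {
    if (c->prev != NULL) c->prev->next = c->next;
    else                 *head = c->next;   /* c was the head sharer */
    if (c->next != NULL) c->next->prev = c->prev;
    c->prev = c->next = NULL;
}

int main(void) {
    chain_node a = {1, NULL, NULL}, b = {2, NULL, NULL}, c = {3, NULL, NULL};
    chain_node *head = &a;
    a.next = &b; b.prev = &a;
    b.next = &c; c.prev = &b;
    splice_out(&head, &b);       /* cache 2 replaces the block */
    return 0;
}
```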

9 Directory Schemes: Analysis
* No coherence: all addresses in the trace are treated as unshared; gives an upper bound on performance
* Caching only private data: used for comparison with the other schemes
* P-Thor: minimizes communication and has a minimum of synchronization points
* Speech: limited directories perform poorly due to pointer thrashing
Performance improvements from system-level optimizations:
* A tree barrier structure instead of a linear barrier (see the sketch after this slide)
* Separating read-only blocks from read/write blocks
* Reducing the block size
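As an illustration of the first optimization, here is a minimal combining-tree barrier sketch in C11 (the thread count P and all names are assumptions, not from the paper): arrivals combine pairwise up a tree, so each flag is shared by only two threads instead of all P threads hammering one counter, which is what causes pointer thrashing in a limited directory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>
#include <stdio.h>

#define P      8                 /* number of threads (power of two) */
#define ROUNDS 3                 /* log2(P) */

atomic_bool arrived[ROUNDS][P];  /* per-round pairwise arrival flags */
atomic_int  generation;          /* bumped by thread 0 to release all */

void tree_barrier(int id) {
    int gen = atomic_load(&generation);
    for (int k = 0; k < ROUNDS; k++) {
        int span = 1 << k;
        if (id % (2 * span) == 0) {
            /* combine: wait for the partner one span away */
            while (!atomic_load(&arrived[k][id + span]))
                ;
            atomic_store(&arrived[k][id + span], false);  /* reset for reuse */
        } else {
            atomic_store(&arrived[k][id], true);  /* signal and drop out */
            break;
        }
    }
    if (id == 0)
        atomic_fetch_add(&generation, 1);  /* everyone has arrived */
    else
        while (atomic_load(&generation) == gen)
            ;                              /* wait for the release */
}

void *worker(void *arg) {
    int id = (int)(long)arg;
    for (int iter = 0; iter < 3; iter++) {
        tree_barrier(id);
        if (id == 0) printf("all %d threads passed barrier %d\n", P, iter);
    }
    return NULL;
}

int main(void) {
    pthread_t t[P];
    for (long i = 0; i < P; i++) pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < P; i++)  pthread_join(t[i], NULL);
    return 0;
}
```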

10 Directory Schemes
Coarse vector (Dir_i CV_r):
* Initially behaves as a limited directory; on pointer overflow it switches to a coarse, fully mapped bit vector in which each bit covers a region of r processors (see the sketch after this slide)
Dir_0 B:
* 2 status bits encode 4 states: Absent; Present1 (present and clean in exactly one cache); Present (present and clean in more than one cache); PresentM (present and dirty in exactly one cache)
LimitLESS directory scheme:
* Combination of hardware and software techniques
* Achieves the performance of a full-map directory with the memory overhead of a limited directory
Sectored directory (Dir_N/L):
* L sub-blocks share one directory entry; overhead is M*N/L
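A minimal sketch of the Dir_i CV_r overflow transition (NNODES, I_PTRS, and REGION are assumed parameters): below the pointer limit the entry stores exact sharer pointers; on overflow the pointers are recoded as per-region bits, trading precision for a bounded entry size.

```c
#include <stdint.h>
#include <stdbool.h>

#define NNODES 64                 /* processors in the machine */
#define I_PTRS 4                  /* i: exact pointers before overflow */
#define REGION 4                  /* r: processors covered per coarse bit */
#define VBYTES (NNODES / REGION / 8)

typedef struct {
    bool coarse;                  /* overflowed into coarse-vector mode? */
    union {
        uint16_t ptr[I_PTRS];     /* limited-directory pointers */
        uint8_t  vec[VBYTES];     /* coarse presence bit vector */
    } u;
    uint8_t count;                /* pointers in use (pointer mode only) */
} cv_entry;

/* Mark the region containing `node` as (possibly) sharing the block. */
static void set_region(cv_entry *e, uint16_t node) {
    uint16_t bit = node / REGION;
    e->u.vec[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

void add_sharer(cv_entry *e, uint16_t node) {
    if (!e->coarse) {
        for (uint8_t k = 0; k < e->count; k++)
            if (e->u.ptr[k] == node) return;   /* already recorded */
        if (e->count < I_PTRS) {               /* room for an exact pointer */
            e->u.ptr[e->count++] = node;
            return;
        }
        /* Overflow: recode the exact pointers as coarse region bits. */
        uint16_t saved[I_PTRS];
        for (int k = 0; k < I_PTRS; k++) saved[k] = e->u.ptr[k];
        e->coarse = true;
        for (int k = 0; k < VBYTES; k++) e->u.vec[k] = 0;
        for (int k = 0; k < I_PTRS; k++) set_region(e, saved[k]);
    }
    set_region(e, node);   /* coarse mode: precision is per-region now */
}

int main(void) {
    cv_entry e = {0};
    for (uint16_t n = 0; n < 6; n++)   /* the 5th sharer forces overflow */
        add_sharer(&e, n);
    return 0;
}
```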

11 Directory Schemes
Directory cache (Dir_a1,a2):
* a1 entries hold short limited-directory pointers
* a2 entries hold long full-map pointers

Hierarchical scheme: covered in detail on the following slides.

12 Hierarchical Cache Coherence Schemes
Wilson's hierarchical cache/bus network architecture:
* Combination bus-and-directory scheme
* Each cache contains a copy of all blocks cached underneath it (inclusion)
* Write-invalidate protocol
* Higher-level caches act as filters (see the sketch after this slide)
Data Diffusion Machine (DDM):
* Hierarchy of buses with large processor caches
* Write-invalidate protocol
* Higher-level caches hold only state information, not data
* No global memory; cost effective
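A minimal sketch of the filtering role that inclusion enables (tags_contain and forward_down are hypothetical stand-ins for the tag-array lookup and the downstream bus): because a higher-level cache holds a copy of everything cached beneath it, a snoop that misses there cannot hit in any descendant cache and need not be forwarded down.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stub tag lookup for this cache level (placeholder predicate). */
static bool tags_contain(uint64_t block_addr) {
    return (block_addr & 1u) == 0;
}

/* Stub for passing the transaction to the lower-level bus. */
static void forward_down(uint64_t block_addr) {
    printf("forward invalidate for block %llu down the hierarchy\n",
           (unsigned long long)block_addr);
}

/* Inclusion property: a snoop that misses in this level's tags cannot
 * hit in any descendant cache, so the traffic is filtered out here. */
void snoop_invalidate(uint64_t block_addr) {
    if (tags_contain(block_addr))
        forward_down(block_addr);
}

int main(void) {
    snoop_invalidate(2);  /* hit here: forwarded to the level below */
    snoop_invalidate(3);  /* miss here: filtered, no traffic below */
    return 0;
}
```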

13 Hierarchical Full-Mapped Directory (HFMD) Scheme
Directory entry fields: tag bits, a descendants presence vector, an acknowledgement counter (ack ctr), and control bits including MRU, INV, UPM, RQT, Tr, and dirty. (A struct sketch follows below.)

States of HFMD:
* ABS: no entries in descendants; des. vector and Tr bit cleared
* ABT: descendant entries being invalidated; des. vector and Tr bit cleared
* RO: read-only entries in the descendants; des. vector set; dirty and Tr bits cleared
* RW: a dirty (read-write) entry is in the descendants; des. vector and dirty bit set; Tr bit cleared
* RT: descendant entries have outstanding read requests; des. vector and Tr bit set; dirty bit cleared
* WT: descendant entries have outstanding write or modify requests; des. vector, dirty bit, and Tr bit set
* INV: descendant entries being invalidated from the directory entry; des. vector cleared; Tr and INV bits set
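To make the entry layout concrete, here is a sketch of the fields as a C struct and the state set as an enum. Field widths and the exact bit inventory are illustrative assumptions; the authoritative encoding is in Maa's dissertation cited on slide 2.

```c
#include <stdint.h>

typedef enum {        /* states of an HFMD directory entry */
    HFMD_ABS,         /* no descendant entries */
    HFMD_ABT,         /* descendant entries being invalidated */
    HFMD_RO,          /* read-only copies in the descendants */
    HFMD_RW,          /* one dirty (read-write) copy below */
    HFMD_RT,          /* outstanding read requests below */
    HFMD_WT,          /* outstanding write/modify requests below */
    HFMD_INV          /* invalidation from this entry in progress */
} hfmd_state;

typedef struct {
    uint32_t tag;          /* tag bits (width is an assumption) */
    uint64_t des_vector;   /* descendants presence vector */
    uint8_t  ack_ctr;      /* outstanding-acknowledgement counter */
    unsigned mru   : 1;    /* MRU bit */
    unsigned inv   : 1;    /* INV bit */
    unsigned upm   : 1;    /* UPM bit */
    unsigned rqt   : 1;    /* RQT bit */
    unsigned tr    : 1;    /* transient (Tr) bit */
    unsigned dirty : 1;    /* dirty bit */
    hfmd_state state;      /* current state, per the list above */
} hfmd_entry;
```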

