An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors. Vijay S. Pai, Parthasarathy Ranganathan, Sarita Adve and Tracy Harton


1 An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors
Vijay S. Pai, Parthasarathy Ranganathan, Sarita Adve and Tracy Harton

2 Motivation
The memory consistency model determines the extent to which memory operations may be overlapped or reordered
SC (sequential consistency) vs. RC (release consistency): earlier comparisons assumed
  –Single-issue, statically scheduled processor
  –Blocking reads
  –Straightforward implementations & trace-driven simulations
Perform a quantitative comparison of several implementations of SC & RC with ILP processors
  –Hardware prefetching & speculative loads

3 Current Implementations
Simple implementations prohibit an operation from entering the memory system until all previous operations have completed
Consistency optimizations
  –Hardware-controlled non-binding prefetch
      SC: to obtain remote data for reads
      RC: to prefetch reads past an acquire
      Store prefetch
  –Speculative load execution
      Speculative load buffer
  –Data remains visible to the coherence mechanism
  –Reissue & rollback
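The speculative load buffer and the "reissue & rollback" step named on this slide can be illustrated with a minimal Python sketch. It is a toy model, not the hardware evaluated in the paper: the names `SpecLoad`, `SpeculativeLoadBuffer`, `record`, `retire` and `on_invalidate`, and the use of program-order sequence numbers, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SpecLoad:
    seq: int    # program-order sequence number (hypothetical bookkeeping)
    addr: int   # address the load read speculatively
    value: int  # value it observed


@dataclass
class SpeculativeLoadBuffer:
    entries: list = field(default_factory=list)

    def record(self, seq, addr, value):
        """Track a load that executed before the model allowed it to."""
        self.entries.append(SpecLoad(seq, addr, value))

    def retire(self, seq):
        """The load is no longer speculative, so stop tracking it."""
        self.entries = [e for e in self.entries if e.seq != seq]

    def on_invalidate(self, addr):
        """A coherence event hit `addr`.

        Returns the sequence number of the oldest affected load, from which
        execution would be rolled back and reissued, or None if no tracked
        load read that address.
        """
        hits = [e.seq for e in self.entries if e.addr == addr]
        if not hits:
            return None
        oldest = min(hits)
        # Everything from the oldest affected load onward is squashed.
        self.entries = [e for e in self.entries if e.seq < oldest]
        return oldest
```

The key design point the sketch tries to capture is that speculatively loaded data stays visible to the coherence mechanism: an invalidation that matches a buffered address is what triggers the rollback.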

4 Evaluation Methodology
Hardware cache-coherent multiprocessor
  –3-state directory protocol, 2-D mesh
  –Each node has an ILP processor, 2 levels of cache, part of main memory & directory
  –Simple SC
      Issue a memory operation only after the previous one completes
  –Hardware prefetching
      Prefetch to the primary cache when it is write-back, write-allocate
      Prefetch to the secondary cache for a write-through, no-write-allocate primary cache
  –Speculative load execution
      Stop the load from retiring, reissue it & flush the instruction window
      SC: used for issuing loads out of order
      RC: used for loads past an acquire
RSIM instruction-driven simulator
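To see why the "simple SC" rule on this slide is costly, here is a toy timing model in Python. It is a sketch under strong assumptions, not RSIM: each operation has a fixed latency in cycles, at most one operation issues per `issue_gap` cycles, and the function names and latency values are hypothetical.

```python
def simple_sc_time(latencies):
    """Each memory operation issues only after the previous one completes,
    so the latencies add up."""
    return sum(latencies)


def overlapped_time(latencies, issue_gap=1):
    """Operations issue back to back and may overlap; the total time is the
    latest completion time."""
    finish = 0
    for i, latency in enumerate(latencies):
        finish = max(finish, i * issue_gap + latency)
    return finish


# Two remote misses (100 cycles) interleaved with two hits (1 cycle).
ops = [100, 1, 100, 1]
print(simple_sc_time(ops), overlapped_time(ops))
```

Prefetching and speculative loads can be read as ways of moving an SC implementation from the first function toward the second without violating the model.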

5 Evaluation Methodology
Metrics
  –Execution time divided into CPU time & stalls
  –A cycle is counted as busy if the maximum possible number of instructions retired; otherwise it is counted toward a stall-time component
Applications
  –Radix, FFT, LU, Water, MP3D, Erlebacher
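The busy/stall accounting rule above can be written as a short Python sketch. The trace format, the stall labels (`"read"`, `"write"`) and the retire width of 4 are assumptions for illustration; the slide does not specify how a stalled cycle is assigned to a particular component, so the sketch simply takes that label as an input.

```python
from collections import Counter


def classify_cycles(cycles, max_retire):
    """cycles: iterable of (instructions_retired, stall_reason) pairs.

    A cycle is busy only if the maximum possible number of instructions
    retired; otherwise it is charged to the given stall component.
    """
    breakdown = Counter()
    for retired, reason in cycles:
        if retired == max_retire:
            breakdown["busy"] += 1
        else:
            breakdown[reason] += 1
    return breakdown


# Hypothetical 5-cycle trace on a processor that retires up to 4 per cycle.
trace = [(4, None), (2, "read"), (0, "read"), (0, "write"), (4, None)]
print(classify_cycles(trace, max_retire=4))
```

Note that a cycle retiring some but not all possible instructions still counts as stall time under this rule.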

6 Evaluation
SC system with first-level write-through cache
  –Prefetching improves performance, but only slightly for some applications
  –Speculative load execution leads to a factor-of-two speedup
  –Neither technique reduces the large store latency
SC system with first-level write-back cache
  –Contribution of write latency to execution time decreases
  –Hardware prefetching and speculative loads give similar benefits
      Execution time & read latency
  –LU gets a reduction in store stall time

7 Evaluation
RC systems
  –Optimizations don’t provide much improvement
      Best improvement is 7.7%, for Water
      Write-through L1 cache performs similarly to write-back L1
RC vs. SC
  –Simple RC performs better than the most optimized SC
      More so for write-through L1
      Gap can be even larger
Aggressive protocol
  –Delay ownership requests for writes to lines that have pending reads
  –Try to improve overlap of ownership requests
  –Approximated by using software prefetch instructions
  –SC achieves reduced store latency, with one exception
  –RC does not achieve much improvement

8 Techniques for Tolerating Acquire Latency
RC assumes all operations after an acquire depend on it
Fuzzy acquire
  –Acquire = non-blocking load + barrier
  –Independent operations can be inserted between the read & the barrier
Selective acquire
  –Uses an arithmetic instruction to explicitly and selectively establish dependencies
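The benefit of a fuzzy acquire can be illustrated with a back-of-the-envelope Python sketch. It assumes a deliberately simple model in which the acquire takes a fixed number of cycles and the independent work can fully overlap with it; the function names and the cycle counts are hypothetical, not figures from the paper.

```python
def blocking_acquire_time(acquire_latency, independent_work, dependent_work):
    """Everything after the acquire waits for it, dependent or not."""
    return acquire_latency + independent_work + dependent_work


def fuzzy_acquire_time(acquire_latency, independent_work, dependent_work):
    """The acquire is split into a non-blocking load and a later barrier;
    independent work placed between them overlaps with the load."""
    return max(acquire_latency, independent_work) + dependent_work


# Hypothetical cycle counts: 100-cycle acquire, 30 cycles of independent
# work, 20 cycles of work that truly depends on the acquire.
print(blocking_acquire_time(100, 30, 20), fuzzy_acquire_time(100, 30, 20))
```

In this model the saving equals the smaller of the acquire latency and the independent work, which is why the technique only helps when such independent operations exist.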

