Presentation is loading. Please wait.

Presentation is loading. Please wait.

Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University.

Similar presentations


Presentation on theme: "Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University."— Presentation transcript:

1 Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University of Illinois at Urbana-Champaign *IBM Research http://iacoma.cs.uiuc.edu

2 Karin StraussFlexible Snooping2 Motivation CMPs are becoming standard components cheaper to build medium size machines –32 to 128 cores (multi-CMP) shared memory, cache coherent –easier to program, easier to manage supporting cache coherence is difficult

3 Karin StraussFlexible Snooping3 Cache coherence solutions long latenciessimpleno snoopy embedded ring difficult to scale simpleyes snoopy broadcast bus indirection, extra hardware scalableno directory based protocol conspros ordered network? strategy other proposals (e.g. token coherence)

4 Karin StraussFlexible Snooping4 Contributions compared to fastest state-of-the-art scheme performance energy consumption Superset Aggressive performance energy consumption Superset Conservative family of adaptive coherence protocols for rings two were chosen as best options high performance schemeenergy conscious scheme

5 Karin StraussFlexible Snooping5 Multi-CMP multiprocessor local network CMP Proc + L1 + L2 memory coherence protocol used: only one supplier if line is cached

6 Karin StraussFlexible Snooping6 Ring in action R S R S R S supplier predictor snoop request cmp LazyEager Oracle response data

7 Karin StraussFlexible Snooping7 Ring in action R S R S R S latency snoops messages goal: adaptive schemes that approximate Oracle’s behavior LazyEager Oracle

8 Karin StraussFlexible Snooping8 Primitive snooping actions X X snoop and then forward forward and then snoop forward only + fewer messages + shorter latency + fewer snoops + shorter latency – false negative predictions not allowed

9 Karin StraussFlexible Snooping9 Predictors and algorithms snoopforwardExact forward then snoop Agg forward snoop forward then snoop Subset action on positive prediction action on negative prediction predictor / algorithm Super set Con snoop then forward node can supply in predictor set of addresses:

10 Karin StraussFlexible Snooping10 Eager Subset Lazy SupersetAgg SupersetCon Oracle Algorithms / Exact number of snoops snoop message latency number of messages Per miss service: algorithmnegativepositive Subset forward then snoop snoop S u p e r set ConCon forward snoop then forward AggAggg forward then snoop Exactforwardsnoop

11 Karin StraussFlexible Snooping11 Predictor implementation Subset – associative table: subset of addresses that can be supplied by node Superset – bloom filter: superset of addresses that can be supplied by node – associative table (exclude cache): addresses that recently suffered false positives Exact – associative table: all addresses that can be supplied by node – downgrading: if address has to be evicted from predictor table, corresponding line in node has to be downgraded

12 Karin StraussFlexible Snooping12 Downgrading A B ES Negative effects: writes by this node need to snoop other nodes reads and writes by other nodes need to fetch line from memory A

13 Karin StraussFlexible Snooping13 Experiments 8 CMPs, 4 ooo cores each = 32 cores –private L2 caches on-chip bus interconnect off-chip 2D torus interconnect with embedded unidirectional ring per node predictors: latency of 3 processor cycles sesc simulator (sesc.sourceforge.net) SPLASH-2, SPECjbb, SPECweb

14 Karin StraussFlexible Snooping14 Execution time 0 0.2 0.4 0.6 0.8 1 1.2 SPLASH-2SPECjbbSPECweb Normalized execution time Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact the fastest of all algorithms is SupersetAgg performance of most flexible snooping algorithms is similar to Eager 

15 Karin StraussFlexible Snooping15 Miss service energy 0 0.5 1 1.5 2 SPLASH-2SPECjbbSPECweb Normalized energy consumption Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact 3.22 SupersetCon is least energy-hungry algorithm algorithms that eagerly forward messages use more energy 

16 Karin StraussFlexible Snooping16 Most cost-effective algorithms 0 0.2 0.4 0.6 0.8 1 1.2 SPLASH-2SPECjbbSPECweb Normalized execution time Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact 0 0.5 1 1.5 2 SPLASH-2SPECjbb SPECweb Normalized energy consumption 3.22 Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact high performance: Superset Aggressive faster than Eager at lower energy consumption energy conscious: Superset Conservative slightly slower than Eager at much lower energy consumption  

17 Karin StraussFlexible Snooping17 Most cost-effective algorithms compared to fastest state-of-the-art scheme (Eager) can be combined by only changing forwarding policy performance energy consumption performance energy consumption Superset Aggressive high performance scheme Superset Conservative energy conscious scheme

18 Karin StraussFlexible Snooping18 Conclusions proposed flexible snooping, a family of adaptive protocols for embedded rings two chosen protocols – high performance: Superset Aggressive – energy conservation: Superset Conservative – can be selected dynamically embedded-ring protocols more attractive

19 Karin StraussFlexible Snooping19 Arch map Google: architecture conference map (1 st hit) http://iacoma.cs.uiuc.edu/students/archmap/archmap.html

20 Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University of Illinois at Urbana-Champaign *IBM Research http://iacoma.cs.uiuc.edu


Download ppt "Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University."

Similar presentations


Ads by Google