Download presentation
Presentation is loading. Please wait.
1
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University of Illinois at Urbana-Champaign *IBM Research http://iacoma.cs.uiuc.edu
2
Karin StraussFlexible Snooping2 Motivation CMPs are becoming standard components cheaper to build medium size machines –32 to 128 cores (multi-CMP) shared memory, cache coherent –easier to program, easier to manage supporting cache coherence is difficult
3
Karin StraussFlexible Snooping3 Cache coherence solutions long latenciessimpleno snoopy embedded ring difficult to scale simpleyes snoopy broadcast bus indirection, extra hardware scalableno directory based protocol conspros ordered network? strategy other proposals (e.g. token coherence)
4
Karin StraussFlexible Snooping4 Contributions compared to fastest state-of-the-art scheme performance energy consumption Superset Aggressive performance energy consumption Superset Conservative family of adaptive coherence protocols for rings two were chosen as best options high performance schemeenergy conscious scheme
5
Karin StraussFlexible Snooping5 Multi-CMP multiprocessor local network CMP Proc + L1 + L2 memory coherence protocol used: only one supplier if line is cached
6
Karin StraussFlexible Snooping6 Ring in action R S R S R S supplier predictor snoop request cmp LazyEager Oracle response data
7
Karin StraussFlexible Snooping7 Ring in action R S R S R S latency snoops messages goal: adaptive schemes that approximate Oracle’s behavior LazyEager Oracle
8
Karin StraussFlexible Snooping8 Primitive snooping actions X X snoop and then forward forward and then snoop forward only + fewer messages + shorter latency + fewer snoops + shorter latency – false negative predictions not allowed
9
Karin StraussFlexible Snooping9 Predictors and algorithms snoopforwardExact forward then snoop Agg forward snoop forward then snoop Subset action on positive prediction action on negative prediction predictor / algorithm Super set Con snoop then forward node can supply in predictor set of addresses:
10
Karin StraussFlexible Snooping10 Eager Subset Lazy SupersetAgg SupersetCon Oracle Algorithms / Exact number of snoops snoop message latency number of messages Per miss service: algorithmnegativepositive Subset forward then snoop snoop S u p e r set ConCon forward snoop then forward AggAggg forward then snoop Exactforwardsnoop
11
Karin StraussFlexible Snooping11 Predictor implementation Subset – associative table: subset of addresses that can be supplied by node Superset – bloom filter: superset of addresses that can be supplied by node – associative table (exclude cache): addresses that recently suffered false positives Exact – associative table: all addresses that can be supplied by node – downgrading: if address has to be evicted from predictor table, corresponding line in node has to be downgraded
12
Karin StraussFlexible Snooping12 Downgrading A B ES Negative effects: writes by this node need to snoop other nodes reads and writes by other nodes need to fetch line from memory A
13
Karin StraussFlexible Snooping13 Experiments 8 CMPs, 4 ooo cores each = 32 cores –private L2 caches on-chip bus interconnect off-chip 2D torus interconnect with embedded unidirectional ring per node predictors: latency of 3 processor cycles sesc simulator (sesc.sourceforge.net) SPLASH-2, SPECjbb, SPECweb
14
Karin StraussFlexible Snooping14 Execution time 0 0.2 0.4 0.6 0.8 1 1.2 SPLASH-2SPECjbbSPECweb Normalized execution time Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact the fastest of all algorithms is SupersetAgg performance of most flexible snooping algorithms is similar to Eager
15
Karin StraussFlexible Snooping15 Miss service energy 0 0.5 1 1.5 2 SPLASH-2SPECjbbSPECweb Normalized energy consumption Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact 3.22 SupersetCon is least energy-hungry algorithm algorithms that eagerly forward messages use more energy
16
Karin StraussFlexible Snooping16 Most cost-effective algorithms 0 0.2 0.4 0.6 0.8 1 1.2 SPLASH-2SPECjbbSPECweb Normalized execution time Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact 0 0.5 1 1.5 2 SPLASH-2SPECjbb SPECweb Normalized energy consumption 3.22 Lazy Eager Oracle Subset SupersetCon SupersetAgg Exact high performance: Superset Aggressive faster than Eager at lower energy consumption energy conscious: Superset Conservative slightly slower than Eager at much lower energy consumption
17
Karin StraussFlexible Snooping17 Most cost-effective algorithms compared to fastest state-of-the-art scheme (Eager) can be combined by only changing forwarding policy performance energy consumption performance energy consumption Superset Aggressive high performance scheme Superset Conservative energy conscious scheme
18
Karin StraussFlexible Snooping18 Conclusions proposed flexible snooping, a family of adaptive protocols for embedded rings two chosen protocols – high performance: Superset Aggressive – energy conservation: Superset Conservative – can be selected dynamically embedded-ring protocols more attractive
19
Karin StraussFlexible Snooping19 Arch map Google: architecture conference map (1 st hit) http://iacoma.cs.uiuc.edu/students/archmap/archmap.html
20
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors Karin Strauss, Xiaowei Shen*, Josep Torrellas University of Illinois at Urbana-Champaign *IBM Research http://iacoma.cs.uiuc.edu
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.