
1 E. Bolotin – The Power of Priority, NoCs 2007
The Power of Priority: NoC based Distributed Cache Coherency
Evgeny Bolotin, Zvika Guz, Israel Cidon, Ran Ginosar, Avinoam Kolodny
QNoC Research Group, Technion EE Department, Technion, Haifa, Israel

2 Chip Multi-Processor (CMP)
Dual-core: monolithic shared cache
Multi-core: large cache
  Shared cache
  Distributed cache
NoC-based: how?

3 Future Cache - Physics Perspective
Large cache → large access time
[Figure: global wire delay vs. gate delay. Source: ITRS 2003]
[Figure: fraction of chip reachable in 1 clock cycle. Source: Keckler et al. ISSCC 2003]
Distance reached in a single cycle:
  Today: ~25% of chip
  In 10 years: ~1% of chip
Large monolithic cache is not scalable

4 NUCA - Non Uniform Cache Architecture
NUCA = non-uniform access times
Banked cache over NoC:
  Smaller bank → smaller access time
  Multiple banks → multiple ports
  Closer bank → smaller access time
Cache-line placement policy:
  Static NUCA (SNUCA)
  Dynamic NUCA (DNUCA)
Sources: Kim et al. ASPLOS 2002; Beckmann et al. MICRO 2004
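The relationship between static placement and non-uniform access time can be sketched in a few lines. This is a minimal illustration, not the authors' configuration: the line size, the mesh layout, the per-hop and per-bank cycle counts, and the function names `snuca_bank` and `access_latency` are all assumptions chosen for the example. It assumes SNUCA picks the bank from address bits alone and that a request and its reply each cross the same number of mesh hops.

```python
LINE_BYTES = 64  # assumed cache-line size


def snuca_bank(addr, num_banks):
    """Static NUCA: the bank is a fixed function of the line address."""
    return (addr // LINE_BYTES) % num_banks


def access_latency(core_xy, bank, mesh_width, hop_cycles=1, bank_cycles=3):
    """Round-trip latency to a bank on a 2D mesh (all cycle counts are assumed).

    Banks are numbered row by row; distance is the Manhattan hop count.
    """
    bank_x, bank_y = bank % mesh_width, bank // mesh_width
    hops = abs(core_xy[0] - bank_x) + abs(core_xy[1] - bank_y)
    return bank_cycles + 2 * hops * hop_cycles


if __name__ == "__main__":
    for addr in (0x0000, 0x0040, 0x03C0):
        bank = snuca_bank(addr, num_banks=16)
        print(hex(addr), "bank", bank, "latency", access_latency((0, 0), bank, 4))
```

Under these assumed numbers a core at corner (0, 0) of a 4x4 mesh sees the nearest bank in 3 cycles and the farthest in 15, which is the "closer bank → smaller access time" effect that a dynamic placement policy tries to exploit.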

5 Issues in NUCA-based CMP
NoC performance → CMP performance
Cache coherency and transaction order (correctness)
Search (in DNUCA)
Different traffic types (e.g. fetch vs. prefetch)
Synchronization (locks)
NoC services for CMP?

6 Cache Coherency over NoC
How do we maintain coherency over NoC?
  Snooping
  Central directory
  Distributed directory
[Figure label: cache bank with distributed directory]

7 Distributed Cache Coherency
Example: simple read transaction
Cache access → multiple NoC transactions
[Figure legend: ctrl. packet, data packet]

8 Read Transaction of Modified Block
[Figure legend: ctrl. packet, data packet]

9 Read Exclusive of Shared Block
[Figure legend: ctrl. packet, data packet]
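The message flows for slides 7 to 9 were carried by figures that are not reproduced in this transcript. The sketch below therefore assumes a generic textbook directory protocol rather than the authors' exact flow; the function name `coherence_packets`, the transaction labels, and the node names are hypothetical. Its only purpose is to show how a single cache access expands into several NoC packets, most of them short control packets.

```python
def coherence_packets(kind, requester, home, owner=None, sharers=()):
    """List the (src, dst, packet_type, purpose) tuples for one L2 access.

    Assumed generic directory protocol, not taken from the slides:
    - "read": home bank holds a clean copy and replies with data.
    - "read_modified": home forwards to the owner, which sends the block
      to the requester and writes it back to the home bank.
    - "read_exclusive_shared": home invalidates every sharer, collects
      acknowledgements, then sends the block.
    """
    packets = [(requester, home, "ctrl", "request")]
    if kind == "read":
        packets.append((home, requester, "data", "block"))
    elif kind == "read_modified":
        packets.append((home, owner, "ctrl", "forward"))
        packets.append((owner, requester, "data", "block"))
        packets.append((owner, home, "data", "writeback"))
    elif kind == "read_exclusive_shared":
        for sharer in sharers:
            packets.append((home, sharer, "ctrl", "invalidate"))
            packets.append((sharer, home, "ctrl", "ack"))
        packets.append((home, requester, "data", "block"))
    else:
        raise ValueError(f"unknown transaction kind: {kind}")
    return packets


if __name__ == "__main__":
    for pkt in coherence_packets("read_exclusive_shared", "P0", "B3",
                                 sharers=("P1", "P2")):
        print(pkt)
```

In this assumed flow a read-exclusive of a block with two sharers produces six packets, five of which are control packets, so the access delay depends heavily on how quickly short control packets get through the network.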

10 Basic NoC to Support CMP
Off-the-shelf (vanilla) NoC:
  Grid of wormhole routers
  Unicast only
  Ordering in network
    → Static routing
    → No virtual channels
Can we do better?
[Figure labels: smart interfaces, vanilla NoC]

11 Observations: L2 Access
A) Delay = queueing + NoC transactions
B) All NoC transactions are equally important
C) NoC transactions consist of:
  Short ctrl. packets
  Long data packets
Idea: differentiate between ctrl. and data
Solution: preemptive priority NoC → give priority to short ctrl. packets

12 Preemptive Priority NoC: QNoC
QNoC service levels (SL):
  Dedicated wormhole buffer
  Preemptive priority scheduling
[Figure labels: multiple SL link, multiple SL router]
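A toy model of a single output link helps show what a dedicated buffer per service level plus preemptive priority scheduling means in practice. This is a sketch under stated assumptions, not the QNoC router design: it assumes one flit leaves the link per cycle, preemption happens at flit boundaries, and level 0 is the highest priority. The function name `simulate_link` and the packet sizes are hypothetical.

```python
from collections import deque


def simulate_link(arrivals, num_levels=2):
    """Simulate one output link with a FIFO buffer per service level.

    arrivals: list of (cycle, level, packet_id, num_flits); level 0 is the
    highest priority. Each cycle the link sends one flit from the
    highest-priority non-empty buffer, so a newly arrived high-priority flit
    preempts a lower-priority packet at the next flit boundary.
    Returns {packet_id: cycle at which its last flit has left}.
    """
    buffers = [deque() for _ in range(num_levels)]
    pending = sorted(arrivals)
    done = {}
    cycle = 0
    nxt = 0
    while nxt < len(pending) or any(buffers):
        # Move packets that have arrived by this cycle into their level's buffer.
        while nxt < len(pending) and pending[nxt][0] <= cycle:
            _, level, pid, flits = pending[nxt]
            buffers[level].extend((pid, k == flits - 1) for k in range(flits))
            nxt += 1
        # Serve the highest-priority non-empty buffer.
        for buf in buffers:
            if buf:
                pid, is_last = buf.popleft()
                if is_last:
                    done[pid] = cycle + 1
                break
        cycle += 1
    return done


if __name__ == "__main__":
    # An 8-flit data packet starts at cycle 0; a 1-flit ctrl packet arrives at cycle 3.
    print(simulate_link([(0, 1, "data", 8), (3, 0, "ctrl", 1)]))
```

In this example the control packet leaves one cycle after it arrives instead of waiting for the remaining data flits, while the data packet finishes only one cycle later than it would have alone.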

13 Example: Vanilla NoC
Without contention: X = delay of long packet, δ = delay of short packet
Transaction 1: long data
Transaction 2: short req., long resp.
Vanilla NoC example (nodes A, B):
  Blue delay ~ X
  Red delay ~ 2X+δ
  Average delay ~ 1.5X

14 Example: Priority NoC
Without contention: X = delay of long packet, δ = delay of short packet
Transaction 1: long data
Transaction 2: short req., long resp.
Vanilla NoC example (nodes A, B):
  Blue delay = X
  Red delay = 2X+δ
  Average delay ~ 1.5X
Priority NoC example:
  Blue delay = X+δ
  Red delay = X+δ
  Average delay ~ X
Potential delay reduction ~ 0.5X
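The delay figures in the two examples can be reproduced with a few lines of arithmetic. The sketch below is one reading of the example, stated as an assumption: blue is the long-data transaction, red is the request/response transaction, and in the vanilla case the short request waits a full long-packet time X behind the data packet. The function names and the sample values X = 10 and δ = 1 are hypothetical.

```python
def vanilla_delays(x, delta):
    """Vanilla NoC: the short request queues behind the long data packet."""
    blue = x                  # long data goes first, unobstructed
    red = x + delta + x       # wait X, send request (delta), receive response (X)
    return blue, red


def priority_delays(x, delta):
    """Priority NoC: the short request preempts the long data packet."""
    blue = x + delta          # long data is held up only by the short request
    red = delta + x           # request goes immediately, then the response
    return blue, red


def average(delays):
    return sum(delays) / len(delays)


if __name__ == "__main__":
    x, delta = 10, 1  # assumed sample values, in arbitrary time units
    vanilla = average(vanilla_delays(x, delta))
    priority = average(priority_delays(x, delta))
    print("vanilla average:", vanilla)       # 1.5X + delta/2
    print("priority average:", priority)     # X + delta
    print("reduction:", vanilla - priority)  # 0.5X - delta/2
```

With δ much smaller than X, the averages approach 1.5X and X, and the difference approaches the 0.5X reduction quoted on the slide.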

15 Priority NoC: Different Destinations
Very important in wormhole routing, when a ctrl. packet is blocked by other worms
[Figure labels: short req., long data]

16 Protocol Correctness
Need state-preserving serialization of transactions in the processor interface

17 Numerical Evaluation
CMP simulator (SIMICS)
  Simulate parallel benchmarks
  Obtain L2-cache access traces
QNoC simulator (OPNET)
  Simulate distributed coherence protocol over NoC
  Measure total RD/RX L2-access delay
  Measure total program throughput

18 Priority NoC: Results
Short ctrl. packet gets high priority
Long data packet gets low priority
[Figures: Delay Reduction vs. Network Load. Panels: RD Delay - Apache; RD/RX Delay Reduction - Apache]

19 Priority NoC: Several Benchmarks
[Figures: Delay Reduction; Program Speedup]

20 So Far: The Power of Priority
Simplicity - almost for free
Significant CMP speed-up
Good for:
  Coherency
  Traffic differentiation (e.g. fetch vs. prefetch)
  Search in DNUCA
  Synchronization (locks)

21 Advanced Support Functions
Special broadcast for short messages
  Broadcast service (e.g. search in DNUCA)
  Wormhole broadcast slow and expensive
  Store-and-forward (S&F) broadcast embedded in wormhole
Virtual ring
  No additional cost
  For invalidation multicast
  Snooping or synchronization

22 Summary
NoC at CMP service!
Shared cache over NoC
Priority is powerful
Built-in support functions

