1
E. Bolotin – The Power of Priority, NoCs 2007
The Power of Priority: NoC-Based Distributed Cache Coherency
Evgeny Bolotin, Zvika Guz, Israel Cidon, Ran Ginosar, Avinoam Kolodny
QNoC Research Group, Technion EE Department
Technion, Haifa, Israel
2
Chip Multi-Processor (CMP)
Dual-core: monolithic shared cache
Multi-core: large cache; shared cache; distributed cache
NoC-based: how?
3
Future Cache - Physics Perspective
[Figure: global wire delay vs. gate delay. Source: ITRS 2003]
Large cache: large access time
[Figure: fraction of chip reachable in 1 clock cycle. Source: Keckler et al., ISSCC 2003]
Distance reached in a single cycle: today ~25% of chip; in 10 years ~1% of chip
Large monolithic cache is not scalable
4
NUCA - Non Uniform Cache Architecture
NUCA = non-uniform access times
Banked cache over NoC
Smaller bank: smaller access time
Multiple banks: multiple ports
Closer bank: smaller access time
Cache-line placement policy: Static NUCA (SNUCA) or Dynamic NUCA (DNUCA)
Sources: Kim et al., ASPLOS 2002; Beckmann et al., MICRO 2004
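The link between static placement and non-uniform access time can be illustrated with a minimal sketch. It assumes a 4x4 grid of banks, 64-byte cache lines, and a hop count equal to the Manhattan distance between a core and a bank; none of these values or function names come from the slides, and the low-order-bits mapping is only one common way to realize static placement.

```python
LINE_BYTES = 64          # assumed cache-line size (not from the slides)
MESH_W, MESH_H = 4, 4    # assumed 4x4 grid of cache banks


def static_bank(addr):
    """Static placement: the bank is fixed by the line address alone."""
    return (addr // LINE_BYTES) % (MESH_W * MESH_H)


def bank_xy(bank):
    """Grid coordinates of a bank, numbering banks row by row."""
    return bank % MESH_W, bank // MESH_W


def hops(core_xy, bank):
    """Manhattan distance from a core to a bank; closer banks need fewer hops."""
    bx, by = bank_xy(bank)
    return abs(core_xy[0] - bx) + abs(core_xy[1] - by)


if __name__ == "__main__":
    core = (0, 0)
    for addr in (0, 5 * LINE_BYTES, 15 * LINE_BYTES):
        bank = static_bank(addr)
        print(f"addr {addr:#06x} -> bank {bank}, {hops(core, bank)} hops from {core}")
```

Under a dynamic policy the line could migrate toward the requesting core, which is why DNUCA needs a search step that SNUCA does not.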
5
Issues in NUCA-based CMP
NoC performance
CMP performance
Cache coherency and transaction order (correctness)
Search (in DNUCA)
Different traffic types (e.g. fetch vs. prefetch)
Synchronization (locks)
NoC services for CMP?
6
Cache Coherency over NoC
How do we maintain coherency over NoC?
Snooping
Central directory
Cache bank with distributed directory
Distributed directory
7
Distributed Cache Coherency
Example: simple read transaction
Cache access: multiple NoC transactions
[Figure legend: Ctrl. packet; Data packet]
8
Read Transaction of Modified Block
[Figure legend: Ctrl. packet; Data packet]
9
Read Exclusive of Shared Block
[Figure legend: Ctrl. packet; Data packet]
10
Basic NoC to Support CMP
Off-the-shelf (Vanilla) NoC: grid of wormhole routers; unicast only; ordering in network; static routing; no virtual channels
Smart interfaces
Can we do better?
[Figure: Vanilla NoC]
11
Observations: L2 Access
A) Delay = queueing + NoC transactions
B) All NoC transactions are equally important
C) NoC transactions consist of short ctrl. packets and long data packets
Idea: differentiate between ctrl. and data
Solution: Preemptive Priority NoC; give priority to short ctrl. packets
12
Preemptive Priority NoC: QNoC
QNoC Service Levels (SL): dedicated wormhole buffer; preemptive priority scheduling
[Figure labels: Multiple SL link; Multiple SL router]
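A minimal sketch of preemptive priority scheduling on a single link may help make the idea concrete. It is not the QNoC router design: it assumes one flit leaves the link per cycle, unbounded buffers, and two hypothetical service levels named CTRL and DATA. Each service level has its own buffer, and in every cycle the link serves the highest-priority non-empty buffer, so a short control packet can cut into a long data packet that is already in flight.

```python
from collections import deque

CTRL, DATA = 0, 1  # hypothetical service levels; lower number = higher priority


def simulate_link(arrivals, cycles):
    """Simulate one output link for a number of cycles.

    arrivals maps a cycle to a list of (service_level, packet_name, n_flits).
    Returns a dict: packet_name -> cycle in which its last flit left the link.
    """
    buffers = {CTRL: deque(), DATA: deque()}  # one dedicated buffer per service level
    done = {}
    for cycle in range(cycles):
        # Enqueue the flits of packets arriving in this cycle.
        for sl, name, n_flits in arrivals.get(cycle, []):
            for i in range(n_flits):
                buffers[sl].append((name, i == n_flits - 1))
        # Preemptive priority: serve the highest-priority non-empty buffer.
        for sl in sorted(buffers):
            if buffers[sl]:
                name, is_tail = buffers[sl].popleft()
                if is_tail:
                    done[name] = cycle
                break
    return done


if __name__ == "__main__":
    # An 8-flit data packet starts at cycle 0; a 1-flit request arrives at cycle 2.
    arrivals = {0: [(DATA, "data", 8)], 2: [(CTRL, "req", 1)]}
    print(simulate_link(arrivals, 20))
```

In this run the request leaves in the cycle it arrives, and the data packet finishes one cycle later than it would have alone.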
13
Example: Vanilla NoC
Without contention: X = delay of long packet; δ = delay of short packet
Transaction 1: long data
Transaction 2: short req., long resp.
Blue delay ~ X
Red delay ~ 2X+δ
Average delay ~ 1.5X
[Figure: Vanilla NoC example, nodes A and B]
14
Example: Priority NoC
Without contention: X = delay of long packet; δ = delay of short packet
Transaction 1: long data
Transaction 2: short req., long resp.
Vanilla NoC example: blue delay = X; red delay = 2X+δ; average delay ~ 1.5X
Priority NoC example: blue delay = X+δ; red delay = X+δ; average delay ~ X
Potential delay reduction ~ 0.5X
[Figure: nodes A and B]
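The arithmetic of the vanilla and priority examples can be checked with a short sketch. It assumes, from the delay figures rather than from any explicit statement on the slides, that "blue" is Transaction 1 (long data) and "red" is Transaction 2 (short request followed by a long response), and that both contend for the same link. The values of X and δ are arbitrary illustrative numbers.

```python
def vanilla_delays(x, d):
    """No priority: the short request waits behind the long data packet."""
    blue = x            # long data packet is sent unobstructed
    red = x + d + x     # wait for long packet, send request, receive long response
    return blue, red


def priority_delays(x, d):
    """Preemptive priority: the short request overtakes the long data packet."""
    blue = x + d        # long data packet is delayed by the short request
    red = d + x         # request goes first, then the long response
    return blue, red


def average(delays):
    return sum(delays) / len(delays)


X, DELTA = 100.0, 2.0   # illustrative values only, with DELTA much smaller than X

vanilla_avg = average(vanilla_delays(X, DELTA))     # 1.5*X + DELTA/2
priority_avg = average(priority_delays(X, DELTA))   # X + DELTA
reduction = vanilla_avg - priority_avg              # 0.5*X - DELTA/2

if __name__ == "__main__":
    print(f"vanilla average:  {vanilla_avg}")
    print(f"priority average: {priority_avg}")
    print(f"reduction:        {reduction}")
```

With δ much smaller than X, the reduction approaches the 0.5X quoted on the slide; the exact difference is 0.5X minus δ/2.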
15
Priority NoC: Different Destinations
Very important in wormhole routing, when a ctrl. packet is blocked by other worms
[Figure labels: Short Req.; Long Data]
16
Protocol Correctness
Need state-preserving serialization of transactions in the processor interface
17
Numerical Evaluation
CMP simulator (SIMICS): simulate parallel benchmarks; obtain L2-cache access traces
QNoC simulator (OPNET): simulate distributed coherence protocol over NoC; measure total RD/RX L2-access delay; measure total program throughput
18
Priority NoC: Results
Short ctrl. packet gets high priority
Long data packet gets low priority
Delay reduction vs. network load
[Figure panels: RD Delay - Apache; RD/RX Delay Reduction - Apache]
19
Priority NoC: Several Benchmarks
[Figure panels: Delay Reduction; Program Speedup]
20
So Far: The Power of Priority
Simplicity - almost for free
Significant CMP speed-up
Good for: coherency; traffic differentiation (e.g. fetch vs. pre-fetch); search in DNUCA; synchronization (locks)
21
Advanced Support Functions
Special broadcast for short messages
Broadcast service (e.g. search in DNUCA)
Wormhole broadcast is slow and expensive
S&F (store-and-forward) broadcast embedded in wormhole
Virtual ring
No additional cost
For invalidation multicast, snooping or synchronization
22
Summary
NoC at CMP service!
Shared cache over NoC
Priority is powerful
Built-in support functions