Mohamed ABDELFATTAH, Vaughn BETZ
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Power Analysis
1. Why NoCs on FPGAs?
FPGA interconnect: logic blocks connected through switch blocks and wires.
Hard blocks are added: memory, multipliers, processors.
Hard interfaces (DDR/PCIe, ...) are added as well, but the interconnect is still the same. Clock rates shown in the figure: 1600 MHz, 200 MHz, 800 MHz.
Hard interfaces shown: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet.
Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
 – Huge CAD problem
 – Slow compilation
 – Power/area utilization
4. Wire speed not scaling:
 – Delay is interconnect-dominated
Analogy: Barcelona vs. Los Angeles (source: Google Earth). Keep the roads, but add freeways: the FPGA's logic clusters and hard blocks keep their existing interconnect, and a NoC is added on top.
Solution: add a NoC of routers connected by links. A router forwards each data packet toward its destination, and the destination router moves the data into the local interconnect.
Problems (continued):
5. Abstraction favours modularity:
 – Parallel compilation
 – Partial reconfiguration
 – Multi-chip interconnect
How a NoC addresses these problems:
 – High-bandwidth endpoints are known, so the NoC can be pre-designed to their requirements
 – NoC links are re-usable
 – The NoC is heavily pipelined
 – Communication is latency-tolerant
 – The NoC abstraction favors modularity
Previous work: compelling area efficiency and performance; NoCs can simplify FPGA design.
Does the NoC abstraction come at a high power cost?
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs: mixed NoCs, hard NoCs
3. Power Analysis
2. Embedded NoCs
Soft NoC = Soft Links + Soft Routers
Mixed NoC = Soft Links + Hard Routers
Hard NoC = Hard Links + Hard Routers
Methodology:
 – Soft NoC: FPGA CAD tools
 – Hard NoC: ASIC CAD tools (Design Compiler)
 – Mixed NoC: HSPICE
 – Metrics: area, speed, and power; power is computed from toggle rates obtained through gate-level simulation
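Toggle rates from gate-level simulation typically feed the standard CMOS dynamic-power estimate: a net with capacitance C that makes a given number of transitions per clock cycle dissipates 0.5·C·V² per transition. The sketch below assumes that first-order model; the `Net` class, net names, capacitances, and toggle rates are hypothetical, not values or tooling from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Net:
    """One net from a gate-level simulation (hypothetical example values)."""
    name: str
    capacitance_f: float  # switched capacitance in farads
    toggle_rate: float    # average transitions per clock cycle


def dynamic_power_w(nets: List[Net], vdd_v: float, freq_hz: float) -> float:
    """First-order dynamic power from per-net toggle rates.

    Averaged over rising and falling edges, each transition dissipates
    0.5 * C * V^2, so energy per cycle is summed over nets and then
    multiplied by the clock frequency.
    """
    energy_per_cycle_j = sum(
        0.5 * n.capacitance_f * vdd_v ** 2 * n.toggle_rate for n in nets
    )
    return energy_per_cycle_j * freq_hz


# Hypothetical nets, for illustration only.
nets = [Net("flit_data_0", 20e-15, 0.25), Net("vc_select", 10e-15, 0.10)]
print(f"{dynamic_power_w(nets, vdd_v=1.1, freq_hz=900e6) * 1e6:.2f} uW")
```

The toggle rates are the hard part in practice, which is why they come from simulating the actual gate-level netlist rather than from a fixed activity guess.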
Mixed NoC = Soft Links + Hard Routers
Hard routers are embedded in the FPGA among the logic blocks and connected through the programmable soft interconnect.
Baseline router parameters: width, number of VCs (virtual channels), number of ports, buffer depth per VC.
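As a rough illustration of how the four router parameters interact, the input-buffer storage of a virtual-channel router is ports × VCs × buffer depth per VC × width bits, assuming one buffer per VC on every input port. The `RouterParams` class and the numbers below are hypothetical placeholders, not the baseline values used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterParams:
    """Parameters of a virtual-channel router (values are hypothetical)."""
    width_bits: int    # flit / link width
    num_vcs: int       # virtual channels per input port
    num_ports: int     # input ports, e.g. 4 mesh neighbours + 1 local port
    buffer_depth: int  # flits buffered per VC

    def buffer_bits(self) -> int:
        """Total input-buffer storage, assuming one buffer per VC per port."""
        return self.num_ports * self.num_vcs * self.buffer_depth * self.width_bits


# Hypothetical configuration, for illustration only.
router = RouterParams(width_bits=32, num_vcs=2, num_ports=5, buffer_depth=10)
print(router.buffer_bits())  # 5 * 2 * 10 * 32 = 3200 bits
```

Under this model buffer storage grows linearly with each parameter, so doubling the width doubles the buffer bits.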
Special feature of the mixed NoC: configurable topology. The analysis assumes a mesh, but because the links use the programmable soft interconnect, the routers can form any topology.
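The slides do not say how packets are routed in the mesh. A common choice for meshes is dimension-ordered (XY) routing: a packet first moves along X until its column matches the destination, then along Y. The sketch below assumes XY routing; the `xy_route` function and the coordinates are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited from src to dst (both inclusive) under
    dimension-ordered XY routing on a 2D mesh."""
    x, y = src
    path = [(x, y)]
    # Correct the X coordinate first, one hop at a time...
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # ...then the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    # The last router would hand the packet to its local interconnect.
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The hop count equals the Manhattan distance between the routers, and XY routing is deadlock-free on a mesh, which is one reason it is a frequent default.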
Hard NoC = Hard Links + Hard Routers
Hard routers are connected by dedicated hard interconnect, alongside the FPGA's logic blocks and programmable soft interconnect.
Special feature of the hard NoC: low-V mode. Lowering the supply voltage from 1.1 V to 0.9 V saves 33% of dynamic power, at the cost of ~15% lower speed.
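The 33% saving is what the usual first-order CMOS model predicts if switched capacitance C, activity α, and clock frequency f stay fixed, since dynamic power then scales with the square of the supply voltage:

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V_{DD}^{2}\, f
\qquad
\frac{P_{\mathrm{dyn}}(0.9\,\mathrm{V})}{P_{\mathrm{dyn}}(1.1\,\mathrm{V})}
  = \left(\frac{0.9}{1.1}\right)^{2} \approx 0.669
```

That is a saving of about 1 - 0.669 ≈ 33%. The ~15% speed penalty lowers the maximum clock frequency, so this saving applies when the NoC is clocked within the low-V limit.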
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Power Analysis: components analysis, system analysis
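Embedded NoCs vs. a soft NoC (soft NoC = 1X; averaged results for a 64-node NoC):
 – Area gap: mixed and hard NoCs are 20X–23X smaller (~1.5% of the FPGA vs. 33% for the soft NoC)
 – Speed gap: 5X–6X faster (730–940 MHz vs. 166 MHz)
 – Bisection bandwidth: ~50 GB/s vs. ~10 GB/s
 – Power gap: 9X–11X lower (15X for the hard NoC in low-V mode)
Next, investigate bandwidth and power together: 1. power-aware design, 2. NoC power budget, 3. comparison with FPGA wire interconnect.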
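As a sanity check on the bisection-bandwidth figures: cutting a k×k mesh in half severs k links, each carrying data in both directions. The sketch assumes an 8×8 mesh (64 nodes) with 32-bit links that move one flit per cycle; the link width and the function name are assumptions for illustration, not values stated on the slide.

```python
def mesh_bisection_bw_gbytes(k: int, width_bits: int, freq_hz: float) -> float:
    """Bisection bandwidth (GB/s) of a k x k mesh.

    Cutting the mesh in half severs k links; each link is assumed to be
    bidirectional (one width_bits channel per direction) and to move one
    flit per clock cycle.
    """
    bits_per_cycle = k * 2 * width_bits
    return bits_per_cycle * freq_hz / 8 / 1e9


# Assumed: 8 x 8 mesh, 32-bit links; clock rates from the slide.
for label, f in [("soft", 166e6), ("embedded, low", 730e6), ("embedded, high", 940e6)]:
    print(f"{label}: {mesh_bisection_bw_gbytes(8, 32, f):.1f} GB/s")
```

Under these assumptions the soft NoC reaches about 10.6 GB/s and the embedded NoCs about 46.7–60.2 GB/s, in line with the ~10 GB/s and ~50 GB/s quoted above.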
Power Analysis: what is the most efficient NoC for a total bandwidth of 250 GB/s? NoC power splits into link power and router power; the answer is wider links and fewer routers.
3. Power Analysis
Typical FPGA dynamic power is 17.4 W. How much of it is used for system-level communication at 250 GB/s total bandwidth?
NoC power as a percentage of that 17.4 W:
 – Soft NoC: 123%
 – Mixed NoC: 15%
 – Hard NoC: 11%
 – Hard NoC (Low-V): 7%
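The percentages convert directly into absolute power and energy per transferred gigabyte, since 1 W per GB/s equals 1 J/GB. A minimal sketch, assuming the percentages are relative to the 17.4 W figure and that every NoC carries the full 250 GB/s; the constant and function names are hypothetical.

```python
TYPICAL_FPGA_DYNAMIC_POWER_W = 17.4
TOTAL_BW_GBYTES_PER_S = 250.0

# NoC power as a fraction of typical FPGA dynamic power (from the slide).
noc_power_fraction = {
    "Soft NoC": 1.23,
    "Mixed NoC": 0.15,
    "Hard NoC": 0.11,
    "Hard NoC (Low-V)": 0.07,
}


def energy_mj_per_gb(power_w: float, bw_gbytes_per_s: float) -> float:
    """Energy per gigabyte: W / (GB/s) = J/GB, scaled to mJ/GB."""
    return power_w / bw_gbytes_per_s * 1e3


for noc, frac in noc_power_fraction.items():
    power_w = frac * TYPICAL_FPGA_DYNAMIC_POWER_W
    mj = energy_mj_per_gb(power_w, TOTAL_BW_GBYTES_PER_S)
    print(f"{noc}: {power_w:.2f} W, {mj:.1f} mJ/GB")
```

For the mixed NoC this works out to about 2.61 W, or roughly 10.4 mJ/GB.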
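NoC power budget example: DDR3 and PCIe interfaces exchange data with Module 1 and Module 2, with traffic crossing the whole chip (link bandwidths shown: … GB/s and 17 GB/s). At the full theoretical aggregate bandwidth of 126 GB/s, the NoC power budget is 3.5%.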
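Assuming the 3.5% budget is relative to the same 17.4 W typical FPGA dynamic power, the budget in watts and the energy per gigabyte it allows at 126 GB/s follow from simple arithmetic (constant names are hypothetical):

```python
TYPICAL_FPGA_DYNAMIC_POWER_W = 17.4  # typical FPGA dynamic power from the slides
AGGREGATE_BW_GBYTES_PER_S = 126.0    # full theoretical bandwidth of the example
BUDGET_FRACTION = 0.035              # NoC power budget from the slide (assumed base: 17.4 W)

budget_w = BUDGET_FRACTION * TYPICAL_FPGA_DYNAMIC_POWER_W
# W per (GB/s) is J/GB; multiply by 1e3 for mJ/GB.
allowed_mj_per_gb = budget_w / AGGREGATE_BW_GBYTES_PER_S * 1e3

print(f"budget: {budget_w * 1e3:.0f} mW, {allowed_mj_per_gb:.2f} mJ/GB")
```

That is about 609 mW, or roughly 4.8 mJ/GB, close to the low end of the 4.5–10.4 mJ/GB range given for embedded NoCs in the summary.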
Comparing wire-based FPGA interconnect with NoCs:
 – Point-to-point links (1 to 1) and broadcast (1 to n): interconnect = just wires
 – Multiple masters (n to 1) and multiple masters with multiple slaves (n to n): interconnect = wires + logic (mux + arbiter)
 – NoC: interconnect = NoC, for any 1..n pattern
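The "mux + arbiter" logic needed when several masters share one slave can be sketched as an arbiter driving a multiplexer. Round-robin arbitration is an assumed policy here (the slides do not name one), and `RoundRobinArbiter` and `mux` are hypothetical names.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """n-to-1 arbiter: grants one requesting master per cycle, rotating
    priority so the most recently granted master has the lowest priority."""

    def __init__(self, num_masters: int) -> None:
        self.num_masters = num_masters
        self.last_grant = num_masters - 1  # master 0 gets first priority

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the granted master index, or None if nobody requests."""
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None


def mux(data: List[int], select: Optional[int]) -> Optional[int]:
    """Forward the granted master's data word to the shared slave."""
    return None if select is None else data[select]


arb = RoundRobinArbiter(3)
data = [0xA, 0xB, 0xC]
for cycle in range(4):
    grant = arb.arbitrate([True, False, True])
    print(cycle, grant, mux(data, grant))
```

With masters 0 and 2 requesting every cycle, the grants alternate 0, 2, 0, 2. In an n-to-n system each slave needs its own copy of this logic, which is part of what a NoC's shared routers and links replace.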
Hard and mixed NoCs are very compelling:
 – About 1% area overhead on Stratix V
 – Runs at … MHz
 – High performance, packet switched
 – Power on par with the simplest FPGA interconnect (soft wires the length of one NoC link, at 200 MHz)
Summary
 – Why NoCs on FPGAs? A big city needs freeways to handle traffic.
 – Embedded NoCs (mixed and hard) vs. a soft NoC: area 20–23X smaller, speed 5–6X faster, power 9–15X lower
 – Power analysis: power-aware design of embedded NoCs; power budget for 100 GB/s: 3–7%; point-to-point soft links: 4.7 mJ/GB; embedded NoCs: 4.5–10.4 mJ/GB
eecg.utoronto.ca/~mohamed/noc_designer.html