Mohamed ABDELFATTAH, Vaughn BETZ. Why NoCs on FPGAs? Embedded NoCs; Area & Power Analysis; Comparison Against P2P/Buses.

Presentation transcript:

1 Mohamed ABDELFATTAH Vaughn BETZ

2 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Area & Power Analysis 4. Comparison Against P2P/Buses

3 1. Why NoCs on FPGAs? Interconnect. Diagram labels: logic blocks, switch blocks, wires.

4 1. Why NoCs on FPGAs? Diagram labels: logic blocks, switch blocks, wires. Hard blocks: memory, multiplier, processor.

5 1. Why NoCs on FPGAs? Diagram labels: logic blocks, switch blocks, wires. Hard blocks: memory, multiplier, processor. Hard interfaces: DDR/PCIe, etc. (1600 MHz, 200 MHz, 800 MHz). Interconnect still the same.

6 1. Why NoCs on FPGAs? Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet (1600 MHz, 200 MHz, 800 MHz). Problems: 1. Bandwidth requirements for hard logic/interfaces. 2. Timing closure.

7 1. Why NoCs on FPGAs? Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet. Problems: 1. Bandwidth requirements for hard logic/interfaces. 2. Timing closure. 3. High interconnect utilization: huge CAD problem, slow compilation, power/area utilization. 4. Wire speed not scaling: delay is interconnect-dominated.

8 Barcelona, Los Angeles: keep the “roads”, but add “freeways”. Diagram labels: hard blocks, logic cluster. Source: Google Earth

9 1. Why NoCs on FPGAs? Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet. Problems: 1. Bandwidth requirements for hard logic/interfaces. 2. Timing closure. 3. High interconnect utilization: huge CAD problem, slow compilation, power/area utilization. 4. Wire speed not scaling: delay is interconnect-dominated. NoC: routers and links. Router forwards data packet; router moves data to local interconnect.

10 1. Why NoCs on FPGAs? Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet. Problems: 1. Bandwidth requirements for hard logic/interfaces. 2. Timing closure. 3. High interconnect utilization: huge CAD problem, slow compilation, power/area utilization. 4. Wire speed not scaling: delay is interconnect-dominated. 5. Abstraction favours modularity: parallel compilation, partial reconfiguration, multi-chip interconnect. Callouts: pre-design NoC to requirements; NoC links are “re-usable”; NoC is heavily “pipelined”; NoC abstraction favors modularity; high bandwidth endpoints known.

11 1. Why NoCs on FPGAs? Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet. Callouts: latency-tolerant communication; NoC abstraction favors modularity. Problems: 1. Bandwidth requirements for hard logic/interfaces. 2. Timing closure. 3. High interconnect utilization: huge CAD problem, slow compilation, power/area utilization. 4. Wire speed not scaling: delay is interconnect-dominated. 5. Abstraction favours modularity: parallel compilation, partial reconfiguration, multi-chip interconnect. NoCs can simplify FPGA design. Does the NoC abstraction come at a high area/power cost? How to integrate NoCs in FPGAs? How do embedded NoCs compare to current interconnects?

12 1. Why NoCs on FPGAs? 2. Embedded NoCs (mixed NoCs, hard NoCs) 3. Area & Power Analysis 4. Comparison Against P2P/Buses

13 2. Embedded NoCs. “Soft” NoC = soft routers + soft links. “Mixed” NoC = hard routers + soft links. “Hard” NoC = hard routers + hard links.

14 [Diagram. Labels: Soft, Mixed, Hard; FPGA CAD tools; ASIC CAD tools; Design Compiler; HSPICE; area, speed, power; power: toggle rates, gate-level simulation.]

15 2. Embedded NoCs. “Mixed” NoC = hard routers + soft links. Diagram labels: FPGA, router, logic blocks, programmable “soft” interconnect. Baseline router: width 32, VCs 2, ports 5, buffer 10/VC.

16 2. Embedded NoCs. “Mixed” NoC = hard routers + soft links. Diagram labels: FPGA, router.

17 2. Embedded NoCs. Special feature: configurable topology. Assumed a mesh; can form any topology. Diagram labels: FPGA, router.

18 2. Embedded NoCs. “Hard” NoC = hard routers + hard links. Diagram labels: FPGA, router, logic blocks, dedicated “hard” interconnect, programmable “soft” interconnect.

19 2. Embedded NoCs. “Hard” NoC = hard routers + hard links. Diagram labels: FPGA, router.

20 2. Embedded NoCs. “Hard” NoC = hard routers + hard links. Special feature: low-V mode, 1.1 V lowered to 0.9 V. Saves 33% dynamic power; ~15% slower. Diagram labels: FPGA, router.
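The 33% saving quoted on this slide is consistent with the usual first-order model in which dynamic power scales with the square of the supply voltage. The sketch below checks that arithmetic; the quadratic model and the assumption that frequency and switching activity are held fixed are ours, not stated on the slide.

```python
# First-order CMOS model (an assumption, not from the slide):
# dynamic power ~ C * V^2 * f, with C and f held fixed.
V_NOMINAL = 1.1  # volts, from the slide
V_LOW = 0.9      # volts, from the slide


def dynamic_power_saving(v_nominal: float, v_low: float) -> float:
    """Fraction of dynamic power saved by lowering the supply voltage."""
    return 1.0 - (v_low / v_nominal) ** 2


saving = dynamic_power_saving(V_NOMINAL, V_LOW)
print(f"Dynamic power saving: {saving:.1%}")
```

If the clock were also slowed by the ~15% mentioned on the slide, this model would predict a larger saving, so the 33% figure appears to isolate the voltage effect.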

21 2. Embedded NoCs. Bridge NoC and FPGA fabric: width adaptation, frequency adaptation, voltage adaptation. Bus protocol, e.g. AXI.

22 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Area & Power Analysis (soft vs. mixed vs. hard; system area/power) 4. Comparison Against P2P/Buses

23 3. Area/Power Analysis. State-of-the-art router architecture from Stanford: 1. The NoC community has excelled at building on-chip routers: we just use it. 2. To meet FPGA bandwidth requirements: high-performance router. 3. Complex functionality such as virtual channels: assigning traffic priority could be useful.

24 3. Area/Power Analysis. Hard router vs. soft router; hard links vs. soft links. Reported gaps: 9X smaller, 2.4X faster, 1.4X lower power; 30X smaller, 6X faster, 14X lower power.

25 3. Area/Power Analysis. Average gap of mixed and hard (low-V) NoCs relative to soft (1X). Area gap: 20X – 23X smaller. Speed gap: 5X – 6X faster. Power gap: 9X – 11X (15X) less.

26 3. Area/Power Analysis. 64-node NoC on Stratix III [65 nm]; columns: mixed, hard, soft. Area: 576 LBs and 448 LBs (~1.5% of FPGA) vs. ~12,500 LBs (33% of FPGA). Speed: 730 – 940 MHz vs. 166 MHz. Bisection BW: ~50 GB/s vs. ~10 GB/s.

27 3. Area/Power Analysis. 64-node NoC on Stratix III [65 nm]; columns: mixed, hard (low-V), soft. Area: 576 LBs and 448 LBs (~1.5% of FPGA) vs. ~12,500 LBs (33% of FPGA). Speed: 730 – 940 MHz vs. 166 MHz. Bisection BW: ~50 GB/s vs. ~10 GB/s. Very cheap! Less than cost of 3 soft nodes. Provides ~50 GB/s peak bisection bandwidth.
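The "less than cost of 3 soft nodes" claim and the bisection bandwidth figures can be cross-checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: 576 LBs is taken as the mixed NoC and 448 LBs as the hard NoC (inferred from the label order), and the 64 nodes are assumed to form an 8x8 mesh with 32-bit links, matching the mesh assumption and 32-bit baseline router mentioned elsewhere in the deck.

```python
# Area figures read from the slide. Which embedded NoC each figure belongs to
# (576 = mixed, 448 = hard) is an assumption based on the label order.
NODES = 64
SOFT_LBS = 12_500
MIXED_LBS = 576
HARD_LBS = 448

soft_lbs_per_node = SOFT_LBS / NODES      # about 195 LBs per soft node
three_soft_nodes = 3 * soft_lbs_per_node  # about 586 LBs

# Assumed topology: 8x8 mesh, 32-bit (4-byte) links. Cutting the mesh in half
# severs one link per row, and each link carries traffic in both directions.
MESH_SIDE = 8
LINK_BYTES = 4
bisection_links = MESH_SIDE * 2


def bisection_gb_per_s(freq_mhz: float) -> float:
    """Peak bisection bandwidth in GB/s for the assumed mesh."""
    return bisection_links * LINK_BYTES * freq_mhz * 1e6 / 1e9


print(f"3 soft nodes: {three_soft_nodes:.0f} LBs; mixed: {MIXED_LBS}; hard: {HARD_LBS}")
for name, mhz in [("soft", 166), ("embedded, low end", 730), ("embedded, high end", 940)]:
    print(f"{name}: {bisection_gb_per_s(mhz):.1f} GB/s")
```

Under these assumptions the soft NoC lands near the slide's ~10 GB/s and the embedded NoCs bracket the quoted ~50 GB/s.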

28 3. Area/Power Analysis. Largest Stratix III device; typical FPGA dynamic power: 17.4 W. How much is used for system-level communication? NoC power at 250 GB/s total bandwidth, as a share of typical FPGA dynamic power: soft NoC 123%.

29 3. Area/Power Analysis. Typical FPGA dynamic power: 17.4 W. NoC power at 250 GB/s total bandwidth: soft NoC 123%, mixed NoC 15%.

30 3. Area/Power Analysis. Typical FPGA dynamic power: 17.4 W. NoC power at 250 GB/s total bandwidth: soft NoC 123%, mixed NoC 15%, hard NoC 11%.

31 3. Area/Power Analysis. Typical FPGA dynamic power: 17.4 W. NoC power at 250 GB/s total bandwidth: soft NoC 123%, mixed NoC 15%, hard NoC 11%, hard NoC (low-V) 7%.
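To put the percentages in absolute terms, the sketch below converts each share into watts and into energy per byte moved. It assumes the four percentages map to soft, mixed, hard, and hard (low-V) in the order the labels appear, and that each is a fraction of the 17.4 W typical dynamic power at the full 250 GB/s.

```python
TYPICAL_DYNAMIC_POWER_W = 17.4  # typical FPGA dynamic power, from the slides
TOTAL_BANDWIDTH_GB_S = 250      # total NoC bandwidth, from the slides

# NoC power as a fraction of typical FPGA dynamic power. The pairing of
# values with NoC types is an assumption based on the order of the labels.
noc_share = {
    "soft": 1.23,
    "mixed": 0.15,
    "hard": 0.11,
    "hard (low-V)": 0.07,
}


def noc_power_watts(share: float) -> float:
    """Absolute NoC power for a given share of typical dynamic power."""
    return share * TYPICAL_DYNAMIC_POWER_W


def energy_pj_per_byte(share: float) -> float:
    """Energy per byte: W / (GB/s) is nJ per byte, so scale by 1000 for pJ."""
    return noc_power_watts(share) / TOTAL_BANDWIDTH_GB_S * 1000


for name, share in noc_share.items():
    print(f"{name:>13}: {noc_power_watts(share):5.2f} W, "
          f"{energy_pj_per_byte(share):5.1f} pJ/byte")
```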

32 3. Area/Power Analysis. Diagram labels: DDR3 and Module 1; PCIe and Module 2; 14.6 GB/s; 17 GB/s; full theoretical BW; cross whole chip! Aggregate bandwidth: 126 GB/s. NoC power budget: 3.5%.
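One way to arrive at a 3.5% budget is to scale the 7% figure for the low-voltage hard NoC at 250 GB/s down to the 126 GB/s aggregate bandwidth of this example. The sketch below does that, assuming NoC power scales linearly with delivered bandwidth; both the linear model and the choice of the 7% starting point are our assumptions, since the slide does not say how the number was derived.

```python
def scaled_power_share(share_at_ref: float, ref_bw_gb_s: float, bw_gb_s: float) -> float:
    """Scale a power share linearly with bandwidth (a simplifying assumption)."""
    return share_at_ref * bw_gb_s / ref_bw_gb_s


# 7% of typical FPGA dynamic power at 250 GB/s, scaled to 126 GB/s.
share = scaled_power_share(0.07, 250, 126)
print(f"NoC power budget at 126 GB/s: {share:.1%}")
```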

33 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Area & Power Analysis 4. Comparison Against P2P/Buses (point-to-point links; Qsys buses)

34 4. Comparison. Interconnect types: point-to-point links (1 to 1); broadcast (1 to n); multiple masters (n to 1, mux + arbiter); multiple masters, multiple slaves (n to n, mux + arbiter). Interconnect = just wires; interconnect = wires + logic; interconnect = NoC. Compare “wires” interconnect to NoCs.

35 4. Comparison. Hard and mixed NoCs: area/power efficient. High performance / packet switched. Runs at 730-943 MHz. 1% area overhead on Stratix V. Power on par with simplest FPGA interconnect (length of 1 NoC link, 200 MHz).

36 4. Comparison. Qsys bus: build logical bus from fabric. Embedded NoC: 16 nodes, hard routers & links.

37 4. Comparison. Steps to close timing using Qsys. Diagram labels: close, FPGA.

38 4. Comparison. Steps to close timing using Qsys. Diagram labels: far, FPGA.

39 4. Comparison. Steps to close timing using Qsys. Diagram labels: far, FPGA. Timing closure can be simplified with an embedded NoC.

40 4. Comparison

41 4. Comparison

42 4. Comparison. Entire NoC smaller than bus for 3 modules!

43 4. Comparison. With 1/8 of hard NoC BW used, already less area for most systems.

44 4. Comparison. Hard NoC saves power for even the simplest systems.

45 1. Why NoCs on FPGAs? Big city needs freeways to handle traffic. 2. Embedded NoCs: mixed & hard. Area: 20-23X. Speed: 5-6X. Power: 9-15X. 3. Area & Power Analysis. Area budget for 64 nodes: ~1%. Power budget for 100 GB/s: 3-7%. 4. Comparison Against P2P/Buses. Raw efficiency close to simplest P2P links. NoC more efficient & lower design effort.

46 eecg.utoronto.ca/~mohamed/noc_designer.html

47 eecg.utoronto.ca/~mohamed/noc_designer.html

48 2. Embedded NoCs. 200 MHz 128-bit module, 900 MHz 32-bit router? Configurable time-domain mux/demux: match bandwidth. Asynchronous FIFO: cross clock domains. Full NoC bandwidth, without clock restrictions on modules.
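The width and frequency adaptation on this slide can be illustrated with a small software model: a 128-bit word from the slow module is split into four 32-bit flits for the fast router and reassembled on the other side. This is only a behavioural sketch; the function names and the least-significant-flit-first ordering are hypothetical, and the asynchronous FIFO that crosses clock domains is not modelled.

```python
def raw_bandwidth_gbit_s(freq_mhz: float, width_bits: int) -> float:
    """Raw bandwidth of a port in Gbit/s."""
    return freq_mhz * 1e6 * width_bits / 1e9


def tdm_mux(word: int, word_bits: int = 128, flit_bits: int = 32) -> list[int]:
    """Split one wide word into narrow flits, least-significant flit first."""
    mask = (1 << flit_bits) - 1
    return [(word >> shift) & mask for shift in range(0, word_bits, flit_bits)]


def tdm_demux(flits: list[int], flit_bits: int = 32) -> int:
    """Reassemble flits (least-significant first) into one wide word."""
    word = 0
    for i, flit in enumerate(flits):
        word |= flit << (i * flit_bits)
    return word


module_bw = raw_bandwidth_gbit_s(200, 128)  # module side, from the slide
router_bw = raw_bandwidth_gbit_s(900, 32)   # router side, from the slide
print(f"module: {module_bw:.1f} Gbit/s, router: {router_bw:.1f} Gbit/s")

word = 0x0123456789ABCDEF_FEDCBA9876543210
flits = tdm_mux(word)
assert tdm_demux(flits) == word
print([hex(f) for f in flits])
```

With these numbers the router port offers slightly more raw bandwidth than the module needs, so a 4:1 time-domain multiplexer can keep up without constraining the module's clock.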

49 1. Why NoCs on FPGAs? Maxeler: geoscience (14x, 70x), financial analysis (5x, 163x). Altera OpenCL: video compression (3x, 114x), information filtering (5.5x). Diagram labels: GPU, CPU.

50 1. Why NoCs on FPGAs?

51 1. Why NoCs on FPGAs?

52 1. Why NoCs on FPGAs? Diagram label: NoC.
