Augmenting FPGAs with Embedded Networks-on-Chip


1 Augmenting FPGAs with Embedded Networks-on-Chip
Mohamed ABDELFATTAH Vaughn BETZ

2 Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Comparison Against Buses

3 Motivation [1. Why NoCs on FPGAs?]
FPGA fabric: logic blocks, plus an interconnect of switch blocks and wires.

4 Motivation [1. Why NoCs on FPGAs?]
Logic blocks, switch blocks, and wires, plus hard blocks: memory, multiplier, processor.

5 Motivation [1. Why NoCs on FPGAs?]
Hard interfaces (DDR/PCIe ..) now sit alongside the logic blocks, switch blocks, wires, and hard blocks (memory, multiplier, processor).
Interconnect still the same.
Clock-rate labels on the diagram: 1600 MHz, 800 MHz, 200 MHz.

6 Motivation [1. Why NoCs on FPGAs?]
Problems:
Bandwidth requirements for hard logic/interfaces.
Timing closure.
Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet; 1600 MHz, 800 MHz, 200 MHz.

7 Motivation [1. Why NoCs on FPGAs?]
Problems:
Bandwidth requirements for hard logic/interfaces.
Timing closure.
High interconnect utilization: huge CAD problem, slow compilation, power/area utilization.
Wire speed not scaling: delay is interconnect-dominated.
Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet.

8 Keep the “roads”, but add “freeways”.
Aerial images of Barcelona and Los Angeles (source: Google Earth). Diagram labels: logic cluster, hard blocks.

9 FPGA with NoC [1. Why NoCs on FPGAs?]
The NoC is made of routers and links. A router forwards a data packet over the links; a router moves data to the local interconnect.
Problems:
Bandwidth requirements for hard logic/interfaces.
Timing closure.
High interconnect utilization: huge CAD problem, slow compilation, power/area utilization.
Wire speed not scaling: delay is interconnect-dominated.
Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet.

10 FPGA with NoC [1. Why NoCs on FPGAs?]
Problems:
Bandwidth requirements for hard logic/interfaces.
Timing closure.
High interconnect utilization: huge CAD problem, slow compilation, power/area utilization.
Wire speed not scaling: delay is interconnect-dominated.
Abstraction favors modularity: parallel compilation, partial reconfiguration, multi-chip interconnect.
NoC responses:
High-bandwidth endpoints known: pre-design NoC to requirements.
NoC links are “re-usable”.
NoC is heavily “pipelined”.
NoC abstraction favors modularity.
Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet.

11 FPGA with NoC [1. Why NoCs on FPGAs?]
Problems:
Bandwidth requirements for hard logic/interfaces.
Timing closure.
High interconnect utilization: huge CAD problem, slow compilation, power/area utilization.
Wire speed not scaling: delay is interconnect-dominated.
Abstraction favors modularity: parallel compilation, partial reconfiguration, multi-chip interconnect.
Latency-tolerant communication.
NoC abstraction favors modularity.
Diagram labels: DDR3 PHY and controller, PCIe controller, Gigabit Ethernet.

12 Compute Acceleration [1. Why NoCs on FPGAs?]
Figures in parentheses follow the slide's column headings (GPU, CPU).
Maxeler: geoscience (14x, 70x); financial analysis (5x, 163x).
Altera OpenCL: video compression (3x, 114x); information filtering (5.5x).

13 1. Why NoCs on FPGAs? Compute Acceleration

14 1. Why NoCs on FPGAs? Compute Acceleration

15 1. Why NoCs on FPGAs? Compute Acceleration NoC

16 Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs: Mixed NoCs, Hard NoCs
3. Comparison Against Buses

17 Embedded NoCs
Soft routers + soft links = “soft” NoC
Hard routers + soft links = “mixed” NoC
Hard routers + hard links = “hard” NoC

18 Methodology
Flow diagram for the soft, mixed, and hard NoCs. Labels: FPGA CAD tools; ASIC CAD tools (Design Compiler); HSPICE; area; speed; power; gate-level simulation; toggle rates.

19 Mixed NoCs [2. Embedded NoCs]
Hard routers + soft links = “mixed” NoC
Diagram labels: FPGA, router, logic blocks, programmable “soft” interconnect.
Baseline router: width 32; VCs 2; ports 5; buffer 10/VC.

20 Mixed NoCs [2. Embedded NoCs]
Hard routers + soft links = “mixed” NoC
Diagram labels: FPGA, router.

21 Mixed NoCs [2. Embedded NoCs]
Special feature: configurable topology.
Assumed a mesh → can form any topology.
Diagram labels: FPGA, router.

22 Hard NoCs [2. Embedded NoCs]
Hard routers + hard links = “hard” NoC
Diagram labels: FPGA, router, logic blocks, programmable “soft” interconnect, dedicated “hard” interconnect.

23 Hard NoCs [2. Embedded NoCs]
Hard routers + hard links = “hard” NoC
Diagram labels: FPGA, router.

24 Hard NoCs [2. Embedded NoCs]
Hard routers + hard links = “hard” NoC
Special feature: low-V mode (1.1 V → 0.9 V). Saves 33% dynamic power; ~15% slower.
Diagram labels: FPGA, router.
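
The 33% figure for low-V mode is consistent with the usual first-order rule that dynamic power scales with the square of the supply voltage. The sketch below checks that arithmetic; it assumes switched capacitance and clock frequency are held fixed, which the slide does not state, and the function name is only illustrative.

```python
def dynamic_power_ratio(v_low: float, v_nominal: float) -> float:
    """Dynamic power at v_low relative to v_nominal.

    Assumes the first-order model P_dyn ~ C * V^2 * f with C and f unchanged
    (an assumption for illustration, not taken from the slides).
    """
    return (v_low / v_nominal) ** 2


ratio = dynamic_power_ratio(0.9, 1.1)   # voltages from the low-V slide
saving = 1.0 - ratio
print(f"dynamic power saving: {saving:.1%}")  # prints "dynamic power saving: 33.1%"
```

If the clock were also lowered by the ~15% speed penalty, power would drop further; the check above isolates the voltage effect only.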

25 Routers and Links
Hard router vs. soft router: 30X smaller, 6X faster, 14X lower power.
Hard links vs. soft links: 9X smaller, 2.4X faster, 1.4X lower power.

26 Soft, Mixed and Hard [3. Area/Power Analysis]
Average gaps relative to the soft NoC (1X), for Mixed and Hard (Low-V):
Area gap: 20X – 23X smaller
Speed gap: 5X – 6X faster
Power gap: 9X – 11X (15X) less

27 Soft, Mixed and Hard [3. Area/Power Analysis]
[65 nm] 64-node NoC on Stratix III
Area: soft ~12,500 LBs (33% of FPGA); mixed 576 LBs; hard 448 LBs (~1.5% of FPGA)
Speed: soft 166 MHz; mixed and hard 730 – 940 MHz
Bisection BW: soft ~10 GB/s; mixed and hard ~50 GB/s

28 Soft, Mixed and Hard [3. Area/Power Analysis]
[65 nm] 64-node NoC on Stratix III
Provides ~50 GB/s peak bisection bandwidth.
Very cheap! Less than the cost of 3 soft nodes.
Area: soft ~12,500 LBs (33% of FPGA); mixed 576 LBs; hard (Low-V) 448 LBs (~1.5% of FPGA)
Speed: soft 166 MHz; mixed and hard 730 – 940 MHz
Bisection BW: soft ~10 GB/s; mixed and hard ~50 GB/s
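
The bisection-bandwidth and "3 soft nodes" figures can be roughly reproduced from the other numbers on these slides. The sketch below is a back-of-the-envelope check, not the authors' method. It assumes the 64 nodes form an 8 x 8 mesh, that links are 32 bits wide (the baseline router width), that each link carries one word per cycle, and that the bisection counts links in both directions. None of these assumptions is stated explicitly in the transcript.

```python
def mesh_bisection_bandwidth_gb_per_s(side: int, width_bits: int, freq_mhz: float) -> float:
    """Peak bisection bandwidth of a side x side mesh in GB/s.

    Assumption: cutting the mesh in half severs `side` links per direction,
    and every link moves `width_bits` per clock cycle.
    """
    links_across_cut = 2 * side           # both directions
    bytes_per_cycle = width_bits / 8
    return links_across_cut * bytes_per_cycle * freq_mhz * 1e6 / 1e9


soft_bw = mesh_bisection_bandwidth_gb_per_s(8, 32, 166)       # about 10.6 GB/s
hard_bw_low = mesh_bisection_bandwidth_gb_per_s(8, 32, 730)   # about 46.7 GB/s
hard_bw_high = mesh_bisection_bandwidth_gb_per_s(8, 32, 940)  # about 60.2 GB/s

# Area check: the whole mixed NoC (576 LBs) versus three soft nodes.
soft_lbs_per_node = 12_500 / 64
three_soft_nodes = 3 * soft_lbs_per_node                      # about 586 LBs

print(f"soft: {soft_bw:.1f} GB/s, hard: {hard_bw_low:.1f} to {hard_bw_high:.1f} GB/s")
print(f"three soft nodes: {three_soft_nodes:.0f} LBs vs. 576 LBs (mixed) and 448 LBs (hard)")
```

Under these assumptions the soft NoC lands near the quoted ~10 GB/s and the embedded NoCs bracket the quoted ~50 GB/s.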

29 NoC Power Budget [3. Area/Power Analysis]
250 GB/s total bandwidth. Typical FPGA dynamic power: 17.4 W (largest Stratix-III device).
How much is used for system-level communication?
Bars: Soft NoC, Mixed NoC, Hard NoC, Hard NoC (Low-V). Shown so far: 123%.

30 NoC Power Budget [3. Area/Power Analysis]
Same chart; values shown so far: 123%, 15%.

31 NoC Power Budget [3. Area/Power Analysis]
Same chart; values shown so far: 123%, 15%, 11%.

32 NoC Power Budget [3. Area/Power Analysis]
Same chart; all values shown: 123%, 15%, 11%, 7%.
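
To put the percentages in absolute terms, the sketch below converts them to watts against the 17.4 W typical dynamic power. The pairing of each percentage with a NoC type is inferred from the order in which the labels and values appear on the slides, so treat it as an assumption.

```python
FPGA_DYNAMIC_POWER_W = 17.4  # typical dynamic power, largest Stratix-III device (from the slide)

# NoC power at 250 GB/s total bandwidth, as a percentage of FPGA dynamic power.
# Assumption: values map to the bar labels in the order they are listed.
noc_power_percent = {
    "Soft NoC": 123,
    "Mixed NoC": 15,
    "Hard NoC": 11,
    "Hard NoC (Low-V)": 7,
}

noc_power_w = {
    name: FPGA_DYNAMIC_POWER_W * pct / 100
    for name, pct in noc_power_percent.items()
}

for name, watts in noc_power_w.items():
    print(f"{name}: {watts:.2f} W")
```

Under that mapping, a soft NoC carrying 250 GB/s would need more power than the rest of the device typically draws, while the embedded variants need roughly 1 to 3 W.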

33 Bandwidth in Perspective [3. Area/Power Analysis]
Traffic streams: DDR3 → Module 1; PCIe → Module 2.
Bandwidth labels: 14.6 GB/s and 17 GB/s (full theoretical BW). Cross whole chip!
Aggregate bandwidth: 126 GB/s
NoC power budget: 3.5%
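
The 3.5% budget is consistent with scaling the power-budget chart linearly with bandwidth. The sketch below assumes NoC power is proportional to aggregate bandwidth and that the reference point is the 7% value at 250 GB/s (taken here to be the low-V hard NoC); both are assumptions for illustration, and the helper name is hypothetical.

```python
def scaled_power_budget_percent(ref_percent: float, ref_bw_gb_s: float, bw_gb_s: float) -> float:
    """Scale a power budget linearly with aggregate bandwidth (assumed model)."""
    return ref_percent * bw_gb_s / ref_bw_gb_s


budget = scaled_power_budget_percent(ref_percent=7, ref_bw_gb_s=250, bw_gb_s=126)
print(f"NoC power budget at 126 GB/s: {budget:.1f}%")  # prints "NoC power budget at 126 GB/s: 3.5%"
```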

34 Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Comparison Against Buses: Area/Power Efficiency, Design Effort

35 DDR3: Qsys Bus vs. NoC [4. Comparison]
Qsys bus: build logical bus from fabric.
Embedded NoC: 16 nodes, hard routers & links.

36 DDR3: Qsys Bus vs. NoC [4. Comparison]
“The Case for Embedded Networks-on-Chip on FPGAs”, to appear in IEEE Micro Magazine (February).
Qsys bus: build logical bus from fabric.
Embedded NoC: 16 nodes, hard routers & links.

37 Design Effort [4. Comparison]
Steps to close timing using Qsys.
Diagram labels: FPGA, “close”.

38 Design Effort [4. Comparison]
Steps to close timing using Qsys.
Diagram labels: FPGA, “far”.

39 Design Effort [4. Comparison]
Steps to close timing using Qsys.
Diagram labels: FPGA, “far”.
Timing closure can be simplified with an embedded NoC.

40 4. Comparison Area Comparison

41 4. Comparison Area Comparison

42 Area Comparison [4. Comparison]
Entire NoC smaller than bus for 3 modules!

43 Area Comparison [4. Comparison]
1/8 Hard NoC BW used → already less area for most systems.

44 Power Comparison [4. Comparison]
Hard NoC saves power for even the simplest systems.

45
1. Why NoCs on FPGAs? Big city needs freeways to handle traffic.
2. Embedded NoCs: Mixed & Hard. Area: 20-23X; speed: 5-6X; power: 9-15X. Area budget for 64 nodes: ~1%. Power budget for 100 GB/s: 3-7%.
3. Comparison Against P2P/Buses: raw efficiency close to simplest P2P links; NoC more efficient & lower design effort.

46 Thank You!
