Mohamed ABDELFATTAH, Vaughn BETZ. Outline: 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Power Analysis

Presentation transcript:

Mohamed ABDELFATTAH Vaughn BETZ

2 Outline: 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Power Analysis

3 1. Why NoCs on FPGAs? Logic Blocks. Interconnect: Switch Blocks, Wires.

4 1. Why NoCs on FPGAs? Logic Blocks, Switch Blocks, Wires. Hard Blocks: Memory, Multiplier, Processor.

5 1. Why NoCs on FPGAs? Logic Blocks, Switch Blocks, Wires. Hard Blocks: Memory, Multiplier, Processor. Hard Interfaces: DDR, PCIe, ... Interconnect still the same. 1600 MHz, 200 MHz, 800 MHz.

6 1. Why NoCs on FPGAs? DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. 1600 MHz, 200 MHz, 800 MHz. Problems: (1) bandwidth requirements for hard logic/interfaces; (2) timing closure.

7 1. Why NoCs on FPGAs? DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: (1) bandwidth requirements for hard logic/interfaces; (2) timing closure; (3) high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); (4) wire speed not scaling (delay is interconnect-dominated).

8 Barcelona / Los Angeles: keep the roads, but add freeways. Hard Blocks, Logic Cluster. Source: Google Earth.

9 1. Why NoCs on FPGAs? DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: (1) bandwidth requirements for hard logic/interfaces; (2) timing closure; (3) high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); (4) wire speed not scaling (delay is interconnect-dominated). NoC: Routers, Links. Router forwards data packet. Router moves data to local interconnect.
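The two router roles named on this slide (forward a packet, or move its data to the local interconnect) can be illustrated with a minimal sketch. It assumes a 2D mesh, which the deck assumes later, and dimension-ordered XY routing, which the slides do not specify. The function name `xy_route` and the coordinate scheme are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh (assumed scheme)


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers a packet visits from src to dst.

    Assumption: dimension-ordered XY routing. The packet is forwarded along
    X until the column matches, then along Y. The last router in the list is
    the one that hands the data to its local interconnect.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # forward along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then forward along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
print(path)  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Every router except the last one in `path` only forwards the packet; the last one delivers it locally.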

10 1. Why NoCs on FPGAs? DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: (1) bandwidth requirements for hard logic/interfaces; (2) timing closure; (3) high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); (4) wire speed not scaling (delay is interconnect-dominated); (5) abstraction favors modularity (parallel compilation, partial reconfiguration, multi-chip interconnect). NoC: pre-design NoC to requirements; NoC links are re-usable; NoC is heavily pipelined; NoC abstraction favors modularity; high-bandwidth endpoints known.

11 1. Why NoCs on FPGAs? DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: (1) bandwidth requirements for hard logic/interfaces; (2) timing closure; (3) high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); (4) wire speed not scaling (delay is interconnect-dominated); (5) abstraction favors modularity (parallel compilation, partial reconfiguration, multi-chip interconnect). Latency-tolerant communication. NoC abstraction favors modularity. NoCs can simplify FPGA design. Previous work: compelling area efficiency and performance. Does the NoC abstraction come at a high power cost?

12 Outline: 1. Why NoCs on FPGAs? 2. Embedded NoCs (Mixed NoCs, Hard NoCs) 3. Power Analysis

13 2. Embedded NoCs. Soft NoC = Soft Links + Soft Routers. Mixed NoC = Soft Links + Hard Routers. Hard NoC = Hard Links + Hard Routers.

14 Soft: FPGA CAD Tools. Hard: ASIC CAD Tools (Design Compiler). Mixed: HSPICE. Area, Speed, Power? Power: toggle rates, gate-level simulation.

15 2. Embedded NoCs. Mixed NoC = Soft Links + Hard Routers. FPGA: logic blocks, router, programmable soft interconnect. Baseline Router: Width, VCs, Ports, Buffer/VC.

16 2. Embedded NoCs. Mixed NoC = Soft Links + Hard Routers. FPGA, Router.

17 2. Embedded NoCs. Special feature: configurable topology. Assumed a mesh; can form any topology. FPGA, Router.

18 2. Embedded NoCs. Hard NoC = Hard Links + Hard Routers. FPGA: logic blocks, router, dedicated hard interconnect, programmable soft interconnect.

19 2. Embedded NoCs. Hard NoC = Hard Links + Hard Routers. FPGA, Router.

20 2. Embedded NoCs. Hard NoC = Hard Links + Hard Routers. Special feature: Low-V mode. 1.1 V to 0.9 V: save 33% dynamic power, ~15% slower.
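The 33% figure is what the standard CMOS dynamic power relation (power proportional to C·V²·f) predicts when only the supply voltage changes. The sketch below assumes capacitance, switching activity and clock frequency are held fixed; the slides do not state how the number was derived, so treat this as a plausibility check and not as the authors' method. The function name is hypothetical.

```python
def dynamic_power_ratio(v_low: float, v_nominal: float) -> float:
    """Dynamic power at v_low relative to v_nominal.

    Assumes P_dyn is proportional to C * V^2 * f with C and f unchanged,
    so only the squared voltage ratio remains.
    """
    return (v_low / v_nominal) ** 2


ratio = dynamic_power_ratio(0.9, 1.1)
saving_percent = (1.0 - ratio) * 100.0
print(f"dynamic power ratio: {ratio:.3f}")  # dynamic power ratio: 0.669
print(f"saving: {saving_percent:.0f}%")     # saving: 33%
```

If the clock were also lowered to match the ~15% slowdown, power would drop further, but energy per transferred bit would still scale with V² under this model.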

21 Outline: 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Power Analysis (Components Analysis, System Analysis)

22 Average 64 – NoC. Soft vs. Mixed and Hard (Low-V).
Area Gap: 20X – 23X smaller (~1.5% of FPGA vs. 33% of FPGA).
Speed Gap: 5X – 6X faster (730 – 940 MHz vs. 166 MHz).
Bisection BW: ~50 GB/s vs. ~10 GB/s.
Power Gap: Soft 1X, Mixed 9X, Hard 11X, Hard Low-V (15X).
1. Power-aware design 2. NoC power budget 3. Comparison. Investigate BW and power together.

23 3. Power Analysis. Most efficient NoC? Total BW = 250 GB/s. Links power, routers power. Wider links, fewer routers.

24 3. Power Analysis. Most efficient NoC? Total BW = 250 GB/s.

25 3. Power Analysis. Most efficient NoC? Total BW = 250 GB/s.

26 3. Power Analysis. Typical FPGA dynamic power: 17.4 W. How much is used for system-level communication? 250 GB/s total bandwidth. Soft NoC, Mixed NoC, Hard NoC, Hard NoC (Low-V). Soft NoC: 123%.

27 3. Power Analysis. Typical FPGA dynamic power: 17.4 W. NoC at 250 GB/s total bandwidth. Soft NoC: 123%. Mixed NoC: 15%. Hard NoC, Hard NoC (Low-V).

28 3. Power Analysis. Typical FPGA dynamic power: 17.4 W. NoC at 250 GB/s total bandwidth. Soft NoC: 123%. Mixed NoC: 15%. Hard NoC: 11%. Hard NoC (Low-V).

29 3. Power Analysis. Typical FPGA dynamic power: 17.4 W. NoC at 250 GB/s total bandwidth. Soft NoC: 123%. Mixed NoC: 15%. Hard NoC: 11%. Hard NoC (Low-V): 7%.
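To make the percentages concrete, they can be converted to watts against the 17.4 W figure. This sketch assumes each percentage is NoC power at 250 GB/s relative to the typical FPGA dynamic power, and that the four values belong to the four NoC types in the order the slides reveal them; neither assumption is stated explicitly in the transcript.

```python
TYPICAL_FPGA_DYNAMIC_POWER_W = 17.4  # from the slides

# Assumed pairing of percentages with NoC types (order of the slide builds).
noc_power_percent = {
    "Soft NoC": 123,
    "Mixed NoC": 15,
    "Hard NoC": 11,
    "Hard NoC (Low-V)": 7,
}

# Convert each share of the 17.4 W budget into absolute watts.
noc_power_watts = {
    name: TYPICAL_FPGA_DYNAMIC_POWER_W * percent / 100.0
    for name, percent in noc_power_percent.items()
}

for name, watts in noc_power_watts.items():
    print(f"{name}: {watts:.2f} W")
```

Under these assumptions the soft NoC alone would need about 21.4 W, more than the whole typical dynamic power, while the embedded variants need roughly 2.6 W, 1.9 W and 1.2 W.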

30 3. Power Analysis. DDR3, Module 1, PCIe, Module 2. GB/s, 17 GB/s. Full theoretical BW: 126 GB/s aggregate bandwidth. NoC power budget: 3.5%. Cross whole chip!
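The 3.5% budget can be turned into an energy-per-gigabyte figure as a rough cross-check. The sketch assumes the 3.5% is relative to the 17.4 W typical FPGA dynamic power quoted on the preceding slides, which this slide does not state, and that the full 126 GB/s is sustained.

```python
TYPICAL_FPGA_DYNAMIC_POWER_W = 17.4    # from the preceding slides
NOC_POWER_BUDGET_FRACTION = 0.035      # 3.5%, assumed relative to 17.4 W
AGGREGATE_BANDWIDTH_GB_PER_S = 126.0   # from this slide

# Power spent in the NoC under the stated assumption.
noc_power_w = TYPICAL_FPGA_DYNAMIC_POWER_W * NOC_POWER_BUDGET_FRACTION

# W / (GB/s) = J/GB; multiply by 1000 to express it in mJ/GB.
energy_mj_per_gb = noc_power_w / AGGREGATE_BANDWIDTH_GB_PER_S * 1000.0

print(f"{noc_power_w:.3f} W, {energy_mj_per_gb:.2f} mJ/GB")  # 0.609 W, 4.83 mJ/GB
```

Under these assumptions the result lies inside the 4.5 – 10.4 mJ/GB range the summary slide gives for embedded NoCs.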

31 3. Power Analysis. Compare wires interconnect to NoCs. Point-to-point links (1 to 1) and broadcast (1 to n): interconnect = just wires. Multiple masters (n to 1, mux + arbiter) and multiple masters, multiple slaves (n to n, mux + arbiter): interconnect = wires + logic. Interconnect = NoC (1..n).

32 3. Power Analysis. Hard and Mixed NoCs very compelling. Length of 1 NoC link. 1% area overhead on Stratix 5. Runs at MHz. Power on-par with simplest FPGA interconnect. 200 MHz. High performance / packet switched.

33 1. Why NoCs on FPGAs? Big city needs freeways to handle traffic. 2. Embedded NoCs: Mixed & Hard. Area: 20-23X. Speed: 5-6X. Power: 9-15X. 3. Power Analysis: power-aware design of embedded NoCs. Power budget for 100 GB/s: 3-7%. Point-to-point soft links: 4.7 mJ/GB. Embedded NoCs: 4.5 – 10.4 mJ/GB.

34 eecg.utoronto.ca/~mohamed/noc_designer.html

35 eecg.utoronto.ca/~mohamed/noc_designer.html