1 Networks on Chips (NoC) – Keeping up with Rent's Rule and Moore's Law. Avi Kolodny, Technion – Israel Institute of Technology. International Workshop on System Level Interconnect Prediction (SLIP), March 2007

2 Acknowledgements. Research colleagues: Israel Cidon, Ran Ginosar, Idit Keidar, Uri Weiser. Students: Evgeny Bolotin, Reuven Dobkin, Roman Gindin, Zvika Guz, Tomer Morad, Arkadiy Morgenshtein, Zigi Walter. Sponsors.

3 Keeping up with Moore's law. Principles for dealing with complexity:
- Abstraction
- Hierarchy
- Regularity
- Design methodology
[Figures: millions of transistors per chip over time (source: S. Borkar, Intel); terminals per module vs. transistors per module, with Rent's exponent < 1]
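For reference, Rent's Rule in its classical form relates a block's external terminals to its gate count; the symbols below follow the standard statement, not notation from this talk:

```latex
% Rent's rule: T = number of external terminals of a logic block,
% G = number of gates in the block, k = terminals per gate,
% p = Rent exponent (empirically 0 < p < 1)
T = k \, G^{\,p}
```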

4 NoC = More Regularity and Higher Abstraction. From: dedicated signal wires. To: a shared network of computing modules connected by network switches and network links. Similar to: a road system, a telephone system.

5 NoC essentials:
- Communication by packets of bits
- Routing of packets through several hops, via switches (a.k.a. routers) connected by point-to-point links
- Parallelism
- Efficient sharing of wires

6 Origins of the NoC concept. The idea was discussed in the 1990s, but actual research came in the new millennium. Some well-known early publications:
- Guerrier and Greiner (2000) – "A generic architecture for on-chip packet-switched interconnections"
- Hemani et al. (2000) – "Network on chip: An architecture for billion transistor era"
- Dally and Towles (2001) – "Route packets, not wires: on-chip interconnection networks"
- Wingard (2001) – "MicroNetwork-based integration of SoCs"
- Rijpkema, Goossens and Wielage (2001) – "A router architecture for networks on silicon"
- Kumar et al. (2002) – "A network on chip architecture and design methodology"
- De Micheli and Benini (2002) – "Networks on chip: A new paradigm for systems on chip design"

7 From buses to networks. [Figure: shared bus vs. segmented bus] Original bus features: one transaction at a time; central arbiter; limited bandwidth; synchronous; low cost.

8 Advanced buses. [Figure: segmented bus and multi-level segmented bus] Original bus features: one transaction at a time; central arbiter; limited bandwidth; synchronous; low cost. New features: versatile bus architectures; pipelining capability; burst transfer; split transactions; overlapped arbitration; transaction preemption and resumption; transaction reordering…

9 Evolution or Paradigm Shift? [Figure: bus replaced by computing modules, network routers and network links]
- Architectural paradigm shift: replace the wire spaghetti by a network
- Usage paradigm shift: pack everything in packets
- Organizational paradigm shift: confiscate communications from logic designers; create a new discipline and a new infrastructure responsibility (already done for the power grid, clock grid, …)

10 Past examples of paradigm shifts in VLSI.
Logic synthesis: from schematic entry to HDLs and cell libraries. Logic designers became programmers; enabled the ASIC industry, fab-less companies, and the "System-on-Chip".
The microprocessor: from hard-wired state machines to programmable chips. Created a new computer industry.

11 Characteristics of a (successful) paradigm shift:
- Solves a critical problem (or several problems)
- Step-up in abstraction
- Design is affected: design becomes more restricted; new tools
- The changes enable higher complexity and capacity
- Jump in design productivity
- Initially: skepticism. Finally: change of mindset!
Let's look at the problems addressed by NoC.

12 Critical problems addressed by NoC:
1) Global interconnect design problem: delay, power, noise, scalability, reliability
2) System integration productivity problem
3) Chip Multi-Processors (key to power-efficient computing)

13 1(a): NoC and global wire delay. Long-wire delay is dominated by resistance, so repeaters are added; with clock frequency scaling, repeaters become latches; latches evolve into NoC routers. (Source: W. Dally)
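A minimal numeric sketch of why long wires need repeaters: an unrepeated distributed RC wire has delay that grows quadratically with length, while splitting it into repeated segments makes it roughly linear. The per-mm resistance/capacitance and the repeater delay below are assumed illustrative values, not process data.

```python
# Illustrative only: first-order Elmore-style delay of a distributed RC wire,
# with and without repeaters. All numbers are assumptions, not real process parameters.

R_PER_MM = 1000.0      # ohm/mm (assumed wire resistance per unit length)
C_PER_MM = 0.2e-12     # F/mm   (assumed wire capacitance per unit length)
T_REPEATER = 20e-12    # s      (assumed intrinsic delay of one repeater)

def unrepeated_delay(length_mm: float) -> float:
    """Distributed RC delay ~ 0.38 * R * C, quadratic in wire length."""
    return 0.38 * (R_PER_MM * length_mm) * (C_PER_MM * length_mm)

def repeated_delay(length_mm: float, k: int) -> float:
    """Split the wire into k equal segments, each driven by a repeater."""
    seg = length_mm / k
    return k * (T_REPEATER + unrepeated_delay(seg))

if __name__ == "__main__":
    for L in (1, 5, 10, 20):  # mm
        print(f"{L:>2} mm  no repeaters: {unrepeated_delay(L)*1e12:8.1f} ps   "
              f"10 repeaters: {repeated_delay(L, 10)*1e12:8.1f} ps")
```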

14 1(b): Wire design for NoC. NoC links are regular and point-to-point (no fanout tree); they can use a transmission-line layout with a well-defined current return path; they can be optimized for noise / speed / power (low swing, current mode, …). [Figure: link wire cross-sections with signal and ground shielding]

15 1(c): NoC scalability. For the same performance, compare the wire-area cost of a NoC, a simple bus, a segmented bus, and point-to-point wiring. [Cost expressions shown as figures] E. Bolotin et al., "Cost Considerations in Network on Chip", Integration, special issue on Network on Chip, October 2004.

16 1(d): NoC and communication reliability: fault tolerance and error correction. A. Morgenshtein, E. Bolotin, I. Cidon, A. Kolodny, R. Ginosar, "Micro-modem – reliability solution for NoC communications", ICECS 2004.

17 1(e): NoC and GALS (Globally Asynchronous, Locally Synchronous).
- System modules may use different clocks, and may use different voltages
- The NoC can take care of synchronization
- The NoC design itself may be asynchronous: no waste of power when the links and routers are idle

18 2: NoC and engineering productivity.
- NoC eliminates ad-hoc global wire engineering
- NoC separates computation from communication
- NoC supports modularity and reuse of cores
- NoC is a platform for system integration, debugging and testing

19 3: NoC and CMP. [Figures: uniprocessor dynamic power breakdown into interconnect, gate and diffusion components (Magen et al., SLIP 2004); uniprocessor performance vs. die area (or power), "Pollack's rule" (F. Pollack, Micro 32, 1999)]
- Uniprocessors cannot provide power-efficient performance growth: interconnect dominates dynamic power; global wire delay doesn't scale; instruction-level parallelism is limited
- Power-efficiency requires many parallel local computations: Chip Multi-Processors (CMP), Thread-Level Parallelism (TLP)
- A network is a natural choice for CMP!
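A short worked form of the argument, assuming Pollack's rule (single-core performance grows roughly as the square root of core area) and enough thread-level parallelism to keep all cores busy:

```latex
% Pollack's rule (approximate): performance of one core of area A
\text{Perf}(A) \propto \sqrt{A}
% One big core vs. N cores of area A/N each
% (assuming ample thread-level parallelism):
\text{Perf}_{1\ \text{core}} \propto \sqrt{A},
\qquad
\text{Perf}_{N\ \text{cores}} \propto N \sqrt{A/N} = \sqrt{N A}
```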

20 Why now is the time for NoC: difficulty of DSM (deep sub-micron) wire design; productivity pressure; CMPs.

21 Characteristics of a (successful) paradigm shift:
- Solves a critical problem (or several problems)
- Step-up in abstraction
- Design is affected: design becomes more restricted; new tools
- The changes enable higher complexity and capacity
- Jump in design productivity
- Initially: skepticism. Finally: change of mindset!
Now, let's look at the abstraction provided by NoC.

22 Traffic model abstraction.
- The traffic model may be captured from actual traces of functional simulation
- A statistical distribution is often assumed for messages
[Figure: processing elements PE1–PE11 connected by the NoC]
Flow      BW         Packet size   Latency
1 → 4     500 Kb/s   1 Kb          5 ns
4 → 5     1.5 Mb/s   3 Kb          12 ns
7 → 10    200 Kb/s   2 Kb          5 ns
1 → 11    50 Kb/s    2 Kb          15 ns
2 → 11    50 Kb/s    1 Kb          22 ns
3 → 11    300 Kb/s   3 Kb          15 ns
4 → 11    1.5 Mb/s   5 Kb          22 ns
5 → 11    50 Kb/s    1 Kb          12 ns
6 → 11    300 Kb/s   1 Kb          22 ns
7 → 11    1.5 Mb/s   5 Kb          5 ns
8 → 11    50 Kb/s    1.5 Kb        12 ns
9 → 11    300 Kb/s   2 Kb          15 ns
10 → 11   1.5 Mb/s   3 Kb          12 ns
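A sketch of how such a traffic model can be captured for simulation or capacity allocation; the field names are illustrative and the sample values echo the first rows of the table above.

```python
# A minimal traffic-model abstraction: one record per inter-module flow.
# Names and units are illustrative; values echo the first rows of the table above.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str                # source module (e.g. a processing element)
    dst: str                # destination module
    bandwidth_bps: float    # required average bandwidth
    packet_bits: int        # typical packet size
    latency_bound_s: float  # required end-to-end latency

traffic_model = [
    Flow("PE1", "PE4",  500e3, 1_000, 5e-9),
    Flow("PE4", "PE5",  1.5e6, 3_000, 12e-9),
    Flow("PE7", "PE10", 200e3, 2_000, 5e-9),
    # ... one entry per flow in the system
]

total_bw = sum(f.bandwidth_bps for f in traffic_model)
print(f"aggregate offered load: {total_bw/1e6:.2f} Mb/s over {len(traffic_model)} flows")
```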

23 Data abstraction. A message is segmented into packets (header + payload); each packet is divided into flits (flow control digits): a head flit carrying type, destination and virtual channel (VC), body flits, and a tail flit; each flit is transferred as one or more phits (physical units) on the link.
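A sketch of the message → packet → flit decomposition described above. The flit fields (type, destination, virtual channel) follow the slide; the flit width and dictionary layout are assumptions.

```python
# Illustrative packetization: a message is split into flits (head / body / tail).
# Field layout and sizes are assumptions, not a specific NoC's format.
from typing import List

FLIT_PAYLOAD_BITS = 64   # assumed flit payload width (one or more phits on the wire)

def packetize(message: bytes, dest: int, vc: int) -> List[dict]:
    """Return the flit stream for one packet carrying `message`."""
    flits = [{"type": "head", "dest": dest, "vc": vc}]      # head flit: routing info
    bits = len(message) * 8
    n_body = max(1, -(-bits // FLIT_PAYLOAD_BITS))          # ceiling division
    for i in range(n_body):
        payload = message[i * FLIT_PAYLOAD_BITS // 8 : (i + 1) * FLIT_PAYLOAD_BITS // 8]
        flits.append({"type": "body", "vc": vc, "payload": payload})
    flits.append({"type": "tail", "vc": vc})                # tail flit closes the packet
    return flits

print(len(packetize(b"\x00" * 32, dest=11, vc=0)), "flits")  # head + 4 body + tail
```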

24 Layers of abstraction in network modeling:
- Software layers: O/S, application
- Network and transport layers: network topology (e.g. crossbar, ring, mesh, torus, fat tree, …); switching (circuit / packet switching: SAF, VCT, wormhole); addressing (logical/physical, source/destination, flow, transaction); routing (static/dynamic, distributed/source, deadlock avoidance); quality of service (e.g. guaranteed-throughput, best-effort); congestion control, end-to-end flow control
- Data link layer: flow control (handshake); handling of contention; correction of transmission errors
- Physical layer: wires, drivers, receivers, repeaters, signaling, circuits, …
Let's skip a tutorial here, and look at an example.

25 Architectural choices depend on system needs: a large design space for NoCs! [Figure: flexibility (single application to general-purpose computer) vs. reconfiguration rate (at design time, at boot time, during run time), with ASIC, ASSP, FPGA and CMP at different points] I. Cidon and K. Goossens, in "Networks on Chips", G. De Micheli and L. Benini (eds.), Morgan Kaufmann, 2006.

26 Example: QNoC, Technion's Quality-of-Service NoC architecture.
- An application-specific system (ASIC) is assumed: ~10 to 100 IP cores; traffic requirements are known a-priori
- Overall approach: packet switching; best effort ("statistical guarantee"); quality of service (priorities)
[Figure: modules connected by a mesh of routers]
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "QNoC: QoS architecture and design process for Network on Chip", JSA special issue on NoC, 2004.

27 Choice of generic network topology

28 Topology customization: irregular mesh; address = coordinates in the basic grid.

29 Message routing path: fixed shortest-path routing (X-Y). Simple router; no deadlock scenario; no retransmission; no reordering of messages; power-efficient.
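A minimal sketch of deterministic X-Y routing as used here: route along the X dimension until the destination column is reached, then along Y. Coordinates and port names are illustrative.

```python
# Deterministic X-Y (dimension-ordered) routing: route along X first, then Y.
# Deadlock-free on a mesh because turns from Y back to X never occur.
def xy_route(cur, dest):
    """Return the output port for a packet at router `cur` heading to `dest`.
    Routers are addressed by (x, y) grid coordinates."""
    cx, cy = cur
    dx, dy = dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"   # arrived: deliver to the attached module

assert xy_route((1, 1), (3, 2)) == "EAST"
assert xy_route((3, 1), (3, 2)) == "NORTH"
assert xy_route((3, 2), (3, 2)) == "LOCAL"
```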

30 Wormhole switching: small number of buffers; low latency. [Figure: a packet stretched across several routers between the IP1 and IP2 interfaces]

31 Blocking issue: the "hot module" IP1 is not a local problem; traffic destined elsewhere suffers too! In the example, the green packet experiences a long delay even though it does NOT share any link with the IP1 traffic.

32 Statistical network delay: some packets get more delay than others, because of blocking. [Figure: histogram of packet delays, % of packets vs. time]

33 Average delay depends on load

34 Quality-of-Service in QNoC: multiple priority (service) levels define latency / throughput. Example levels: Signaling; Real-Time Stream; Read-Write; DMA Block Transfer. Arbitration is preemptive, with best-effort performance (e.g., 0.01% of packets arrive later than required). * E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "QNoC: QoS architecture and design process for Network on Chip", JSA special issue on NoC, 2004.

35 Router structure: flits are stored in input ports. The output port schedules transmission of pending flits according to: priority (service level); buffer space in the next router; round-robin among input ports of the same service level; lower-priority packets are preempted.
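A sketch of the output-port scheduling policy listed above: serve the highest-priority service level that has a pending flit and downstream buffer space, with round-robin among input ports of the same service level. The class and data structures are illustrative, not the QNoC RTL.

```python
# Illustrative output-port arbiter: priority (service level) first,
# then round-robin among input ports of the same service level,
# transmitting only if the next router has buffer space (credits).
class OutputPort:
    def __init__(self, n_inputs: int, n_service_levels: int):
        self.n_inputs = n_inputs
        self.rr_pointer = [0] * n_service_levels   # round-robin state per service level

    def schedule(self, pending, credits):
        """pending[sl][inp] -> True if input `inp` has a flit of service level `sl`;
        credits[sl] -> free buffer slots for `sl` in the next router.
        Returns (sl, input_port) to transmit, or None. Lower sl = higher priority."""
        for sl, inputs in enumerate(pending):          # highest priority first
            if credits[sl] <= 0:
                continue
            start = self.rr_pointer[sl]
            for k in range(self.n_inputs):             # round-robin within this level
                inp = (start + k) % self.n_inputs
                if inputs[inp]:
                    self.rr_pointer[sl] = (inp + 1) % self.n_inputs
                    return sl, inp                      # preempts any lower-priority packet
        return None

port = OutputPort(n_inputs=5, n_service_levels=4)
pending = [[False]*5, [False, True, False, True, False], [True]*5, [False]*5]
print(port.schedule(pending, credits=[1, 1, 1, 1]))    # -> (1, 1): highest ready level wins
```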

36 Virtual Channels

37 QNoC router with multiple virtual channels. [Figure: router datapath with structures replicated ×5]

38 Simulation model: OPNET models for QNoC; any topology and traffic load; statistical or trace-based traffic generation at source nodes.

39 Simulation results (flit-accurate simulations): the delay of high-priority service levels is not affected by load.

40 Perspective 1: NoC vs. Bus.
NoC: aggregate bandwidth grows; link speed unaffected by N; concurrent spatial reuse; pipelining is built-in; distributed arbitration; separate abstraction layers. However: no performance guarantee; extra delay in routers; area and power overhead?; modules need a network interface; unfamiliar methodology.
Bus: bandwidth is limited and shared; speed goes down as N grows; no concurrency; pipelining is tough; central arbitration; no layers of abstraction (communication and computation are coupled). However: fairly simple and familiar.

41 Perspective 2: NoC vs. off-chip networks.
NoC: sensitive to cost (area, power); wires are relatively cheap; latency is critical; traffic may be known a-priori; design-time specialization; custom NoCs are possible.
Off-chip networks: cost is in the links; latency is tolerable; traffic and applications are unknown and change at runtime; adherence to networking standards.

42 NoC can provide system services. Example: distributed CMP cache. * E. Bolotin, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, "The Power of Priority: NoC based Distributed Cache Coherency", NoCs 2007.

43 Characteristics of a (successful) paradigm shift:
- Solves a critical problem (or several problems)
- Step-up in abstraction
- Design is affected: design becomes more restricted; new tools
- The changes enable higher complexity and capacity
- Jump in design productivity
- Initially: skepticism. Finally: change of mindset!
OK, what's the impact of NoC on chip design?

44 VLSI CAD problems:
- Application mapping
- Floorplanning / placement
- Routing
- Buffer sizing
- Timing closure
- Simulation
- Testing

45 VLSI CAD problems reframed for NoC:
- Application mapping (map tasks to cores)
- Floorplanning / placement (within the network)
- Routing (of messages)
- Buffer sizing (size of FIFO queues in the routers)
- Timing closure (link bandwidth capacity allocation)
- Simulation (network simulation, traffic/delay/power modeling)
- Testing
… combined with the problems of designing the NoC itself (topology synthesis, switching, virtual channels, arbitration, flow control, …). Let's see a NoC-based design flow example.

46 QNoC-based SoC design flow: inter-module traffic model → place modules → determine routing → trim routers / ports / links → adjust link capacities.

47 Routing on irregular mesh. Goal: minimize the total size of the routing tables required in the switches. [Figure: "around the block" and "dead end" routing scenarios] E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "Routing Table Minimization for Irregular Mesh NoCs", DATE 2007.

48 Routing heuristics for irregular mesh: distributed routing (full tables); X-Y routing with deviation tables; source routing; source routing for deviation points. [Figure: table-size comparison on systems with real applications and on random problem instances]
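One plausible reading of "X-Y routing with deviation tables", sketched under assumptions (not the exact algorithm of the DATE 2007 paper): routers follow plain X-Y routing by default, and only routers where X-Y would run into a hole in the irregular mesh hold a small per-destination deviation table.

```python
# Illustrative only: default X-Y routing plus a small per-router deviation table
# that overrides the output port for specific destinations (irregular mesh).
def xy_route(cur, dest):
    """Plain X-Y routing (same idea as the earlier sketch), condensed."""
    (cx, cy), (dx, dy) = cur, dest
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"

def route(cur, dest, deviation_table):
    """deviation_table: dict mapping dest -> output port, present only in the
    few routers where plain X-Y would run into a missing link."""
    if dest in deviation_table:
        return deviation_table[dest]   # table entry overrides X-Y
    return xy_route(cur, dest)

# Example: at router (2, 1) the eastward link is missing, so traffic to (4, 1)
# is first deviated north and continues X-Y from there.
print(route((2, 1), (4, 1), {(4, 1): "NORTH"}))   # -> NORTH
```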

49 Timing closure in NoC. Iterative flow: define inter-module traffic → place modules → increase link capacities → repeat until QoS is satisfied.
- Too low capacity results in poor QoS
- Too high capacity wastes power/area
- Uniform link capacities are a waste in application-specific systems!

50 Network delay modeling: analysis of mean packet delay in a wormhole network with multiple virtual channels, different link capacities and different communication demands; closed-form approximations are given for the flit-interleaving delay and the queuing delay. * I. Walter, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, "Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip," DATE 2006.
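The slide's analytical expressions are given in the cited DATE 2006 paper. As a simplified stand-in (an assumption, not the wormhole interleaving model of that paper), a single link of capacity C bits/s carrying Poisson packet arrivals at rate λ with mean packet length L bits behaves like an M/M/1 queue:

```latex
% Simplified M/M/1 baseline for mean packet delay on one link
% (a stand-in, not the wormhole model of the cited paper)
\mu = \frac{C}{L}, \qquad
T_{\text{link}} = \frac{1}{\mu - \lambda} = \frac{L}{C - \lambda L},
\qquad \lambda L < C
```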

51 Capacity allocation problem. Given: the system topology and routing, and each flow's bandwidth $f_i$ and delay bound $T_i^{REQ}$. Minimize the total link capacity $\sum_l c_l$, such that each flow's mean delay meets its bound: $T_i \le T_i^{REQ}$ for every flow $i$.
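A sketch of a greedy capacity-allocation loop in the spirit of the formulation above, using the simplified per-link M/M/1 delay from the previous slide rather than the paper's wormhole model; the flow records, the starting capacities and the stopping rule are all assumptions.

```python
# Illustrative greedy capacity allocation: start all links just above their offered
# load, then repeatedly widen the most stressed link used by a flow that still
# violates its delay bound. Delay model: simplified M/M/1 per link (an assumption).
def link_delay(load_bps, cap_bps, mean_pkt_bits):
    mu = cap_bps / mean_pkt_bits
    lam = load_bps / mean_pkt_bits
    return float("inf") if lam >= mu else 1.0 / (mu - lam)

def allocate(flows, step=1e6):
    """flows: list of dicts with 'path' (link ids), 'bw' (b/s), 'pkt' (bits), 'bound' (s)."""
    load = {}
    for f in flows:
        for l in f["path"]:
            load[l] = load.get(l, 0.0) + f["bw"]
    cap = {l: 1.1 * b for l, b in load.items()}          # start just above the offered load

    def flow_delay(f):
        return sum(link_delay(load[l], cap[l], f["pkt"]) for l in f["path"])

    while True:
        violators = [f for f in flows if flow_delay(f) > f["bound"]]
        if not violators:
            return cap                                    # all delay bounds met
        # widen the most utilized link used by any violating flow
        worst = max((l for f in violators for l in f["path"]),
                    key=lambda l: load[l] / cap[l])
        cap[worst] += step

flows = [{"path": ["A-B", "B-C"], "bw": 200e6, "pkt": 1024, "bound": 2e-6},
         {"path": ["B-C"],        "bw": 300e6, "pkt": 1024, "bound": 1e-6}]
print({l: f"{c/1e9:.2f} Gb/s" for l, c in allocate(flows).items()})
```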

52 State of the art: NoC is already here!
- More than 50 different NoC architecture proposals in the literature; 2 books; hundreds of papers since 2000
- Companies use (or try) it: Freescale, Philips, ST, Infineon, IBM, Intel, …
- Companies sell it: Sonics (USA), Arteris (France), Silistix (UK), …
- 1st IEEE conference: NOCS 2007, with 102 papers submitted

53 NoC research community: academia and industry, including VLSI / CAD people, computer system architects, interconnect experts, asynchronous circuit experts, and networking/telecom experts.

54 Possible impact: expect new forms of Rent's Rule?
- View interconnection as transmission of messages over virtual wires (through the NoC)
- Model system interconnections among blocks in terms of required bandwidth and timing
- Dependence on NoC topology; dependence on the S/W application (in a CMP)
- Usage for prediction of hop-lengths, router design, …
D. Greenfield, A. Banerjee, J. Lee and S. Moore, "Implications of Rent's Rule for NoC Design and Its Fault-Tolerance", NoCS 2007 (to appear). M. Y. Lanzerotti et al., "The work of E. F. Rent, with an application to on-chip interconnection requirements", IBM J. Res. & Dev., Vol. 49 (2005).

55 Summary: NoC is a scalable platform for billion-transistor chips; several driving forces are behind it; many open research questions remain; it may change the way we structure and model VLSI systems.