Presentation on theme: "NoC: Network OR Chip? Israel Cidon, Technion" — Presentation transcript:

1 NoC: Network OR Chip? Israel Cidon Technion

2 Technion’s NoC Research:
PIs: Israel Cidon (networking), Ran Ginosar (VLSI), Idit Keidar (distributed systems), Avinoam Kolodny (VLSI).
Students: Evgeny Bolotin, Reuven Dobkin, Zvika Guz, Arkadiy Morgenshtein, Zigi Walter, Roman Gindin.

3 Origins of the NoC concept
Early publications:
Guerrier and Greiner (2000) – "A generic architecture for on-chip packet-switched interconnections"
Hemani, Jantsch, Kumar, Postula, Oberg, Millberg and Lindqvist (2000) – "Network on chip: An architecture for billion transistor era"
Dally and Towles (2001) – "Route packets, not wires: on-chip interconnection networks"
Wingard (2001) – "MicroNetwork-based integration of SoCs"
Rijpkema, Goossens and Wielage (2001) – "A router architecture for networks on silicon"
De Micheli and Benini (2002) – "Networks on chip: A new paradigm for systems on chip design"
Bolotin, Cidon, Ginosar and Kolodny (2004) – "QNoC: QoS architecture and design process for network on chip"

4 Evolution or Paradigm Shift?
Figure: a computing module attached through a network router and network links, replacing the bus.
Architectural paradigm shift: replace wire spaghetti with an intelligent network infrastructure.
Design paradigm shift: buses and signals replaced by packets.
Organizational paradigm shift: create a new discipline, a new infrastructure responsibility.

5 Characteristics of a successful paradigm shift
Addresses a critical and topical need.
Enables a quantum leap in productivity and application.
Meets resistance from legacy experts.
Requires a major change of mindset and skills!
Think: networking, not bus evolution!

6 Critical needs addressed by NoC
1) Efficient interconnect: delay, power, noise, scalability, reliability
2) Increase system integration productivity
3) Enable chip multiprocessors (CMPs)

7 NoC offers Area and Power Scalability
Figure: wire-area and power comparison, at equal performance, of NoC vs. simple bus vs. point-to-point vs. segmented bus.
E. Bolotin et al., "Cost Considerations in Network on Chip", Integration, special issue on Network on Chip, October 2004.

8 4 Decades of Network 101
Networks evolved from buses and point-to-point connections.
Extensive research on architectures, modeling and analysis.
Architecture is about optimizing network costs.
Different goals and element costs => different architectures:
Local Area Networks (LANs)
Metropolitan Area Networks (MANs)
System interconnect networks (SAN, InfiniBand…)
WAN (TCP/IP, ATM…)
Wireless networks
Cross-layered design
Early architecture standardization is an optimization burden!

9 4 Decades of Network 101

10 Local Area Networks (LANs)
Critical need: distributing operations and sharing across heterogeneous systems.
Constraints: standardization.
Main cost: incremental cost (NICs, wiring).
Typical optimized architecture: low-cost hubs/switches; tree-like architecture; exploiting low-cost local bandwidth; shared media; broadcast; host-embedded NICs.

11 System interconnect (SAN, InfiniBand)
Critical need: create a powerful specialized system from low-cost units.
Constraints: low latency.
Main cost: total system cost per MIP.
Typical architecture: wormhole/cut-through switching; connection based; over-provisioned network; high-degree/regular topology; specific optimizations (e.g. RDMA).

12 WAN (TCP/IP, ATM…)
Critical need: global application networking (collaboration, WWW, file sharing, voice).
Constraints: scalability; heterogeneous user and application QoS requirements.
Main cost: physical infrastructure (mainly long-distance trunks).
Typical architecture of choice: packet switching; irregular, small-degree networks of high-speed trunks; optimization of topology and link capacities.

13 CAN (Chip Area Network) optimization
The design envelope (constraints): the collection of designs supported by a given chip; the convex hull of traffic requirements over all configurations; QoS constraints; other requirements (e.g. design automation…).
The main cost(s): total area; power; others such as design time, verification and testability.
Optimization variables: switching mechanism; QoS; topology (incl. link capacities); routing; flow and congestion control; buffering; application support; …

14 One NoC does not fit all!
Figure: flexibility vs. reconfiguration rate during run time — ASIC (single application, fixed at design time), FPGA (at boot time), ASSP, CMP/general-purpose computer (reconfigured at run time).
I. Cidon and K. Goossens, in Networks on Chips (G. De Micheli and L. Benini, eds.), Morgan Kaufmann, 2006.

15 One NoC does not fit all!
Figure: flexibility vs. traffic unpredictability — ASIC (single application, fixed at design time), FPGA (at configuration), ASSP, CMP/general-purpose computer (at run time). A large solution range!
I. Cidon and K. Goossens, in Networks on Chips (G. De Micheli and L. Benini, eds.), Morgan Kaufmann, 2006.

16 Apply the paradigm to ASIC-based NoC
Design envelope / constraints: well-defined inter-module traffic; automatic synthesis; variable QoS requirements.
Main cost: power and area.
Architecture of choice: wormhole or small-frame switching; small number of buffers, VCs, tables; simple QoS mechanisms (which?); topology and routing optimized for cost.

17 Example: QNoC – a quality-of-service NoC architecture for ASICs
Traffic requirements are known a priori.
Overall approach: wormhole switching; QoS based on priority classes; small buffer/VC budget; in-order SP XY routing; irregular topology; optimized link capacities.
Figure: an irregular mesh of routers, labeled by grid coordinates.
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "QNoC: QoS architecture and design process for Network on Chip", JSA special issue on NoC, 2004.

18 Quality-of-Service in QNoC
Multiple priority classes define latency; arbitration between classes is preemptive.
Possible ASIC classes: Signaling, Real-Time Stream, Read/Write DMA, Block Transfer.
Statistical guarantees, e.g. fewer than 0.01% of packets arrive later than required (a toy sketch of preemptive class arbitration follows).
* E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "QNoC: QoS architecture and design process for Network on Chip", JSA special issue on NoC, 2004.
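A minimal sketch of how preemptive priority classes could behave at a router output port. The class ordering, queue structure and API are illustrative assumptions, not the QNoC implementation:

```python
import heapq

# Service levels from the slide, highest priority first (ordering assumed).
CLASSES = {"Signaling": 0, "Real-Time Stream": 1, "Read/Write DMA": 2, "Block Transfer": 3}

class PriorityOutputPort:
    """Toy model of one router output port with preemptive priority arbitration:
    the highest pending class always wins the next flit slot, so a lower-class
    worm is suspended at a flit boundary whenever higher-class flits wait."""

    def __init__(self):
        self._heap = []   # (class_rank, seq, flit); seq keeps FIFO order per class
        self._seq = 0

    def enqueue(self, flit, service_class):
        heapq.heappush(self._heap, (CLASSES[service_class], self._seq, flit))
        self._seq += 1

    def next_flit(self):
        """Serve one flit per link cycle, strictly by priority class."""
        return heapq.heappop(self._heap)[2] if self._heap else None

port = PriorityOutputPort()
port.enqueue("BT-flit-0", "Block Transfer")
port.enqueue("SIG-flit-0", "Signaling")        # arrives later, served first
assert port.next_flit() == "SIG-flit-0"        # preempts the block transfer
assert port.next_flit() == "BT-flit-0"
```

Because arbitration happens per flit, a long Block Transfer worm is suspended rather than dropped while Signaling flits pass, matching the slide's preemptive-classes idea.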

19 QNoC Design Flow
Extract inter-module traffic → place modules → allocate link capacities → verify QoS and cost.

20 QNoC Design Flow
Extract inter-module traffic → place modules → allocate link capacities → verify QoS and cost.
Figure: placed modules interconnected by routers (R).

21 QNoC Design Flow
Extract inter-module traffic → place modules → allocate link capacities → verify QoS and cost.
Optimize capacity for the performance/power tradeoff. Capacity allocation is a traditional WAN optimization problem, however:

22 Wormhole Delay Modeling
Approximate delay analysis in wormhole networks with multiple virtual channels, different link capacities, and different communication demands.
Key quantities: queueing delay and a flit-interleaving delay approximation (equations not reproduced here; a hedged sketch follows).
* I. Walter, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, "Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip," DATE 2006.
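As a hedged placeholder only, here is a generic wormhole latency decomposition of the kind such analyses start from, not the DATE 2006 paper's exact model. T_i is the end-to-end delay of flow i, t_r the per-hop router delay, L_f the flit size, M_i the packet size, c_l the capacity of link l, and W_i the queueing/flit-interleaving term the paper approximates:

```latex
T_i \;=\;
\underbrace{\sum_{l \in \mathrm{path}(i)} \left( t_r + \frac{L_f}{c_l} \right)}_{\text{head-flit path latency}}
\;+\;
\underbrace{\frac{M_i}{\min_{l \in \mathrm{path}(i)} c_l}}_{\text{serialization}}
\;+\; W_i
```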

23 The Capacity Allocation Problem
Given: the system topology and routing, and each flow's bandwidth (f_i) and delay bound (T_i^REQ).
Minimize the total link capacity such that every flow meets its delay bound (a hedged formulation follows):
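A hedged reconstruction of the optimization from the slide's own wording (the paper's exact constraint form may differ): minimize the total allocated capacity, subject to each flow meeting its delay bound and each link carrying at least its aggregate offered bandwidth:

```latex
\min_{\{c_l\}} \; \sum_{l} c_l
\qquad \text{s.t.} \qquad
T_i(c_1,\ldots,c_L) \;\le\; T_i^{\mathrm{REQ}} \quad \forall i,
\qquad
c_l \;\ge\; \sum_{i \,:\, l \in \mathrm{path}(i)} f_i \quad \forall l
```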

24 Capacity Allocation – Realistic Example
A SoC-like system with realistic traffic demands and delay requirements.
"Classic" design: 41.8 Gbit/sec total capacity. Using the algorithm: 28.7 Gbit/sec. Total capacity reduced by about 30%.
Figure: link capacities of the 3×4 mesh, before and after optimization.

25 Optimizing Routing on an Irregular Mesh
Goal: minimize the total size of the routing tables.
Figure: detour scenarios on an irregular mesh — "around the block" and "dead end".
E. Bolotin, I. Cidon, R. Ginosar and A. Kolodny, "Routing Table Minimization for Irregular Mesh NoCs", DATE 2007.

26 Saving Table Hardware
Traditional solutions use full routing tables: destination-based routing (tables at each router) or source routing (tables at the sources).
Solution idea: use reduced tables — store only the relevant destinations (PLA); a default routing function ("Go XY" or "Don't turn") plus a small table for deviations (see the sketch below).
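A minimal sketch of the reduced-table idea on an irregular mesh: route by default XY, and consult a small per-router deviation table only where plain XY would run into a missing region. The coordinates, port encoding, and table contents are illustrative assumptions:

```python
# Default-XY routing with a per-router deviation table (illustrative sketch).
# Ports: 'N', 'S', 'E', 'W', 'LOCAL'; (0, 0) assumed at the bottom-left.

def default_xy(cur, dst):
    """Plain XY: exhaust the X offset first, then the Y offset."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return 'E' if dx > cx else 'W'
    if cy != dy:
        return 'N' if dy > cy else 'S'
    return 'LOCAL'

def route(cur, dst, deviation_table):
    """Consult the small deviation table first; otherwise fall back to XY.
    Only destinations whose XY path is blocked by an obstacle need an entry,
    so the table stays far smaller than a full routing table."""
    return deviation_table.get((cur, dst), default_xy(cur, dst))

# Hypothetical obstacle: node (1, 1) is missing, so (0,1) -> (2,1) must detour.
deviations = {((0, 1), (2, 1)): 'N'}              # go "around the block"
assert route((0, 1), (2, 1), deviations) == 'N'   # deviation entry used
assert route((0, 0), (2, 0), deviations) == 'E'   # unobstructed: plain XY
```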

27 Routing Heuristics for Irregular Mesh
Compared schemes: distributed routing (full tables); X-Y routing with deviation tables; source routing; source routing for deviation points.
Evaluated on random problem instances and on systems with real applications.

28 Efficient Routing Results
Figure: table-size savings vs. network size — the savings scale with network size.

29 NoC for Shared-Memory CMP
Constraints: multiple accesses to coherent caches; unpredictable traffic patterns; QoS requirements (fetch, pre-fetch).
Main cost: CMP power / area per unit of performance.
Architecture of choice: tailored for a given CMP; in-order or adaptive routing? simple QoS mechanisms? regular topology (is the CMP symmetric)? built-in support functions (multicast, search…).

30 NoC can facilitate critical transactions
* E. Bolotin, Z. Guz, I. Cidon, R. Ginosar and A. Kolodny, "The Power of Priority: NoC based Distributed Cache Coherency", NOCS 2007.

31 Priority NoC: Results

32 NoC-Based FPGA Architecture
Claims: (1) future FPGAs will be NoC-based, and (2) the design will be two-tiered, or hierarchical.
Figure: functional units and user logic in configurable regions, hard routers, and configurable network interfaces, with the NoC handling the routing between them.

33 NoC for FPGA
Design envelope / constraints: many ASIC-like applications for a given FPGA; a hard NoC infrastructure is efficient but inflexible; soft logic is reusable but has inferior performance.
Main cost: the average NoC cost over the most demanding designs — hard grid links and router logic, plus the total configured soft NoC logic used.
Architecture of choice: a regular, uniform grid; in-order / load-balanced routing; hard logic for links and routers; soft logic for routing algorithms, headers, and CNIs; soft NoC tuning (routing, CNI) for a given implementation.

34 NoC-Based FPGA Architecture
Claims: (1) future FPGAs will be NoC-based, and (2) the design will be two-tiered, or hierarchical.
Figure: functional units and user logic in configurable regions, hard routers, and configurable network interfaces, with the NoC handling the routing between them.

35 Source Toggle XY
Unlike TXY, traffic to the same destination is not split.
Maximum capacity is similar to TXY.
The route is selected from a bitwise XOR of the source and destination IDs (a hedged sketch follows).
Can be extended to weighted source toggle (WOT).
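A sketch of one plausible reading of the rule (an assumption on my part; the scheme may define the mapping differently): take the parity of the bitwise XOR of the source and destination IDs and use it to choose XY or YX dimension order, so a given source-destination pair always takes the same route while different pairs spread load over both options:

```python
def source_toggle_xy(src_id: int, dst_id: int) -> str:
    """Choose dimension order from the parity of (src XOR dst).
    Deterministic per (src, dst) pair, so traffic to the same destination is
    never split mid-stream (unlike TXY's per-packet toggling)."""
    parity = bin(src_id ^ dst_id).count("1") % 2
    return "XY" if parity == 0 else "YX"

# The same pair always gets the same order; different pairs balance load.
assert source_toggle_xy(3, 5) == source_toggle_xy(3, 5)
assert source_toggle_xy(3, 5) != source_toggle_xy(3, 4)
```

A weighted variant (WOT) could bias this choice instead of using plain parity.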

36 Two Hotspots
Figure: the design envelope (maximum capacity) for various distances between the two hotspots, under WOT.

37 Generic NoC Problems
Many problems are shared across the design spectrum, for example:
Need for a low-latency class of service.
Verification and predictability.
Power control of NoCs.
Centralized vs. distributed control.
Is a single NoC enough per chip? Bus examples suggest otherwise.
Hot modules slow incoming NoC traffic: off-chip systems, shared memory subsystems, expensive functional units.

38 NoC clogging by hot modules
IP1 IP2 Interface Interface Interface IP3 HM is not a local problem Transparent to NoC performance Walter, Cidon, Ginosar and Kolodny, ”Access Regulation to Hot-Modules in Wormhole NoCs”, NOCS 2007.

39 Source Fairness
Figure: several IP sources sending through interfaces toward the hot module (HM).
No "fairness" is guaranteed, since each router's arbitration is based only on local state.
The farther a source is from the destination, the more arbitrations its worm has to win (a rough model follows).
The hot module's bandwidth is therefore not fairly shared.
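As a back-of-the-envelope illustration (my assumption, not the paper's analysis): if each of the h router arbitrations along the path were an independent, unbiased two-way coin flip, a worm from a source h hops away would win them all with probability

```latex
% Illustrative only: assumes independent, unbiased two-way arbitration per hop.
\Pr[\text{source } h \text{ hops away wins every arbitration}] \;\approx\; \left(\tfrac{1}{2}\right)^{h}
```

so a source's throughput share decays with its distance, which is exactly the unfairness the slide describes.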

40 Hot Module Distributed Arbitration
Control can be distributed or centralized; centralized control can account for dependencies.
Requests and grants are sent at a high service level.
Requests and grants include additional data as needed: requested quota, source queue size, priority, deadline, etc.; granted quota, scheduling of transmissions, etc.
Initial credits hide the request-grant latency under light load.
Emphasis: requests and grants bypass (blocked) data packets! (A toy sketch follows.)
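A toy sketch of the request-grant idea described here: sources hold initial credits so light-load traffic proceeds without waiting a round trip, and must obtain further quota from the hot module's arbiter before sending more. The message formats and the even-split arbitration policy are illustrative assumptions:

```python
class HotModuleArbiter:
    """Centralized quota arbiter at the hot module (toy model). In the real
    scheme, requests and grants would travel on a high service level,
    bypassing blocked data packets."""

    def __init__(self, capacity_per_round: int):
        self.capacity = capacity_per_round

    def grant(self, requests: dict[str, int]) -> dict[str, int]:
        """Split the per-round capacity evenly across requesters (one simple
        policy; a real arbiter could weigh queue size, priority, deadline)."""
        if not requests:
            return {}
        share = self.capacity // len(requests)
        return {src: min(quota, share) for src, quota in requests.items()}

class Source:
    def __init__(self, name: str, initial_credits: int):
        self.name = name
        self.credits = initial_credits   # hides request-grant latency at light load

    def send(self, n: int) -> int:
        """Send up to n units, limited by the credits currently held."""
        sent = min(n, self.credits)
        self.credits -= sent
        return sent

arb = HotModuleArbiter(capacity_per_round=8)
a, b = Source("IP1", initial_credits=2), Source("IP2", initial_credits=2)
a.send(2)                                  # initial credits: no round-trip delay
grants = arb.grant({"IP1": 6, "IP2": 3})   # then request more quota
a.credits += grants["IP1"]                 # IP1 granted 4 (its even share)
b.credits += grants["IP2"]                 # IP2 granted 3 (all it asked for)
```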

41 Hot vs. non-Hot Module Traffic
Figure: throughput of HM traffic and of other traffic, with and without the access-regulation control.

42 NoC: A Network AND A Chip
Conclusions:
NoC is a chip design paradigm shift that introduces many diverse and new networking challenges.
There is no killer NoC for all chips, and it should not comply with any X-AN concept.
It may include centralized mechanisms, may involve more than one NoC/bus mechanism, and may combine several communication methodologies, e.g. a low-latency NoC/bus for metadata and urgent signals.
Beware of early standardization and legacy barriers.
There is mutual benefit in VLSI-networking collaboration.
NoC: A Network AND A Chip.

