1
Network-on-Chip: Communication Synthesis
Department of Computer Science, Texas A&M University
2
Spring 2003 Texas A&M
Topics of this Lecture: Motivations for a new communication template; Methodology for Synthesis; Clustering - Preliminary Analysis; Network Architecture; NoCSIM - Basic Simulator on Work; Core Network Interface; Implementation of NoCs; Future Work and Conclusion
3
Motivation for a New Template
Throughput
–Networking example: OC768 Networking Standard
Input data rate of 40 Gbps, or 114 million packets per second (48-byte packets)
Up to 70 memory accesses per packet for classification
Nearly 7G memory accesses per second (64 bits)
Shared bus needs to run at tens of GHz
We need scalable and high-performance on-chip communication architectures
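A quick back-of-the-envelope check of the throughput figures can be sketched as follows. It assumes the full 40 Gbps carries back-to-back 48-byte packets with no framing overhead, which is an assumption of this sketch and not stated on the slide. Under that assumption the packet rate comes out slightly lower than the 114 million quoted above (the slide may assume a different packet size or overhead), while the memory access rate lands near the quoted 7G per second.

```python
# Back-of-the-envelope check of the OC768 figures (assumes no framing overhead).
LINE_RATE_BPS = 40e9        # 40 Gbps input data rate (from the slide)
PACKET_BYTES = 48           # packet size (from the slide)
ACCESSES_PER_PACKET = 70    # upper bound on classification accesses (from the slide)


def packets_per_second(line_rate_bps, packet_bytes):
    """Packets per second when packets arrive back to back."""
    return line_rate_bps / (packet_bytes * 8)


def memory_accesses_per_second(pps, accesses_per_packet):
    """Memory accesses per second needed to classify every packet."""
    return pps * accesses_per_packet


pps = packets_per_second(LINE_RATE_BPS, PACKET_BYTES)
aps = memory_accesses_per_second(pps, ACCESSES_PER_PACKET)
print(f"{pps / 1e6:.1f} million packets/s, {aps / 1e9:.2f} G accesses/s")
```

If each 64-bit access needs one or more cycles on a single shared bus, the bus clock has to be at least of the order of the access rate, which is what motivates a more scalable interconnect.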
4
A Commercial Example: TNETV3010
5
Current Trend
Explicitly parallel SoC architectures
Integrating huge amounts of memory in chip designs
Distributed shared memory environments
Support for GALS (globally asynchronous, locally synchronous) design for power and performance
Should allow an interconnection-centric design flow and better predictability
–Physical design closure
–Wire delay dominates gate delay
6
Wire Delay vs Generation
7
Motivation for Communication Synthesis
SoCs are application specific
Synthesis should be based on the application’s requirements
–2.5X improvement based on architecture
–6X improvement based on traffic
“EFFECTIVE SYNTHESIS METHODOLOGY IS NEEDED” [1]
8
Network on Chip Architecture
Packet-switched network for communication concurrency and throughput
Regular layout with small wire lengths
Reduced wire, area and power complexity
–Mesh
–Torus: reduces number of hops
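The claim that a torus reduces the number of hops relative to a mesh can be illustrated with a small sketch. It assumes a 4x4 grid, minimal (shortest-path) routing, and uniform traffic between all distinct node pairs; the grid size and function names are illustrative choices, not taken from the slides.

```python
from itertools import product


def mesh_hops(a, b):
    """Minimal hop count between two nodes of a 2D mesh (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def torus_hops(a, b, k):
    """Minimal hop count in a k x k torus, where each dimension wraps around."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def average_hops(k, hops):
    """Average hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)


K = 4  # illustrative grid size
mesh_avg = average_hops(K, mesh_hops)
torus_avg = average_hops(K, lambda a, b: torus_hops(a, b, K))
print(f"mesh: {mesh_avg:.2f} hops, torus: {torus_avg:.2f} hops")
```

The corner-to-corner case shows the effect most clearly: in the mesh, (0, 0) to (3, 3) takes 6 hops, while the wrap-around links of the torus reduce it to 2.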
9
Weird Folded Torus
Wire latency is distributed evenly
10
Architecture Template
Each node could be a cluster of cores.
11
Topics to be discussed: Motivations for a new communication template; Methodology for Synthesis; Clustering - Analysis; Network Architecture; NoCSIM - Simulator; Core Network Interface; Initial Implementation of NoCs; Future Work and Conclusion
12
Overall Synthesis Methodology
Inputs
–Hardware/Protocol Library
–System Specification/Process Profile
Synthesis
–Clustering
–Intra-Cluster Communication Synthesis
–Profile Annotation
–Inter-Cluster/Core Communication Synthesis
–Mapping
Outputs
–Partitioned & Synthesized System
14
Clustering: Motivations
Low Latency, Low Bandwidth & Synchronous Communications
High Connectivity - Low Fanout
Fixed Interfaces & Ease of Design
Hierarchical approach to Communication Architecture Exploration
Estimation of Costs ahead of time
15
Clustering Flow
16
Clustering Algorithm
Inputs
–Resource & Connection Constraints
Core type, area, latency, throughput, traffic type, etc.
–Implementation Constraints
Parameters of a process technology such as wire pitch, R and C values of interconnects, buffers, etc.
Operating frequencies, signaling overheads
–User-defined Constraints
Binary variable that gives the designer control over the convergence of clustering
17
Clustering Algorithm (Cont.)
Simulated Annealing
Initialize
Move
–Remove from Cluster and Add to Cluster
Parameters
–Cost function: a linear function of wire complexity, latency and data-rate satisfaction (with high biases) and of bandwidth and area deviations (with small biases)
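The annealing loop described on this slide can be sketched roughly as below. This is a minimal illustration, not the algorithm used in the talk: the cost function keeps only two of the terms named above (traffic crossing cluster boundaries as a stand-in for wire complexity, and area deviation), and the weights, cooling schedule, and all names (`cost`, `anneal`, `w_wire`, `w_area`) are assumptions of this sketch.

```python
import math
import random


def cost(assign, traffic, areas, n_clusters, w_wire=10.0, w_area=1.0):
    """Linear cost: heavily weighted wire proxy plus lightly weighted area deviation."""
    # Wire-complexity proxy (assumed): traffic that crosses cluster boundaries.
    wire = sum(t for (a, b), t in traffic.items() if assign[a] != assign[b])
    # Area deviation: distance of each cluster from an even share of total area.
    target = sum(areas) / n_clusters
    cluster_area = [0.0] * n_clusters
    for core, cluster in enumerate(assign):
        cluster_area[cluster] += areas[core]
    deviation = sum(abs(a - target) for a in cluster_area)
    return w_wire * wire + w_area * deviation


def anneal(traffic, areas, n_clusters, steps=5000, t0=10.0, alpha=0.999, seed=0):
    """Simulated annealing over core-to-cluster assignments (needs n_clusters >= 2)."""
    rng = random.Random(seed)
    n = len(areas)
    assign = [rng.randrange(n_clusters) for _ in range(n)]  # Initialize
    current = cost(assign, traffic, areas, n_clusters)
    best, best_cost = assign[:], current
    temp = t0
    for _ in range(steps):
        # Move: remove one core from its cluster and add it to another.
        core = rng.randrange(n)
        old = assign[core]
        assign[core] = rng.choice([c for c in range(n_clusters) if c != old])
        candidate = cost(assign, traffic, areas, n_clusters)
        delta = candidate - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate          # accept the move
            if current < best_cost:
                best, best_cost = assign[:], current
        else:
            assign[core] = old           # reject the move and restore
        temp *= alpha                    # geometric cooling
    return best, best_cost


# Hypothetical example: two groups of three cores that talk mostly to each other.
traffic = {(0, 1): 5, (1, 2): 5, (0, 2): 5, (3, 4): 5, (4, 5): 5, (3, 5): 5, (2, 3): 1}
areas = [1.0] * 6
best, best_cost = anneal(traffic, areas, n_clusters=2)
print(best, best_cost)
```

A fuller cost function would add the latency, data-rate, and bandwidth terms listed on the slide, each with its own weight.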
18
Simulated Annealing Convergence
19
Analysis Models [2]
20
Architecture Template
21
Analysis Models (Cont.)
Hop Latency
22
Results from Clustering
Clock Power decrease
Interconnect Power decrease
Interconnect Area increase
23
Results from Clustering
Causes: hop latencies and lack of user constraints
24
Synthesis and Verification Methodology
25
Topics to be discussed: Motivations for a new communication template; New Methodology for Synthesis; Clustering - Analysis phase; Network Architecture; NoCSIM - Simulation Phase; Core Network Interface; Gate Level Implementation of NoCs; Future Work and Conclusion
26
Network Architecture
Packet Switched - Lower latency for less correlated communications
Virtual Channel Flow-control - Bandwidth guarantees and higher saturation throughput
Source based Routing - Simple and Fast
Credit based traffic flow control - Reliable delivery
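Credit-based flow control, the last item above, can be sketched in a few lines. The idea is that the sender holds one credit per free slot in the downstream buffer, spends a credit for every flit it sends, and gets the credit back when the receiver frees the slot, so a flit is never sent into a full buffer. The class name `CreditLink` and the immediate return of credits are simplifying assumptions of this sketch; a real link would return credits with some delay.

```python
from collections import deque


class CreditLink:
    """Sender-side credit counter together with the receiver's input buffer."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # one credit per free downstream slot
        self.buffer = deque()         # receiver's input buffer

    def send(self, flit):
        """Send a flit if a credit is available; otherwise the sender stalls."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self):
        """Receiver forwards one flit and returns a credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
results = [link.send(f) for f in ("a", "b", "c")]  # third send stalls
forwarded = link.drain()                           # frees one slot
retry = link.send("c")                             # now succeeds
print(results, forwarded, retry)
```

Because the sender stalls instead of dropping flits, delivery over the link is reliable as long as credits are never lost.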
27
Architecture Template
28
Packet Format
Type: Head, Data, Tail and Complete
VCID: Virtual Channel Identifier
Route: ‘N’ bit route field, with the last 2 bits specifying the route to be used in the next controller
–00 - Left
–01 - Right
–10 - Straight
–11 - Extract
Data: Actual Data field
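The route field encoding above can be illustrated with a short sketch of source routing. It assumes that each controller reads the last 2 bits and then shifts the route field right by 2 so the next controller sees its own entry; the shift is an assumption of this sketch (a common way to implement source routing), and the helper names are hypothetical.

```python
# 2-bit direction codes from the packet format slide.
DIRECTIONS = {0b00: "Left", 0b01: "Right", 0b10: "Straight", 0b11: "Extract"}


def next_hop(route):
    """Read the last 2 bits, then shift them out (the shift is an assumption)."""
    return DIRECTIONS[route & 0b11], route >> 2


def encode_route(directions):
    """Pack a list of directions so that the first hop sits in the last 2 bits."""
    codes = {name: code for code, name in DIRECTIONS.items()}
    route = 0
    for name in reversed(directions):
        route = (route << 2) | codes[name]
    return route


def walk(route, max_hops=32):
    """Follow a route field hop by hop until the packet is extracted."""
    path = []
    for _ in range(max_hops):
        direction, route = next_hop(route)
        path.append(direction)
        if direction == "Extract":
            return path
    raise ValueError("route does not end with Extract")


route = encode_route(["Straight", "Left", "Extract"])
print(bin(route), walk(route))
```

Since the whole path is computed at the source, each controller only needs a 2-bit lookup and a shift, which is why source-based routing is described as simple and fast.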
29
Routing Example
30
Network Working - Output Controller
31
Network Working - Input Controller
32
Topics to be discussed: Motivations for a new communication template; New Methodology for Synthesis; Clustering - Analysis phase; Network Architecture and Working; NoCSIM - Simulation Phase; Core Network Interface; Gate Level Implementation of NoCs; Future Work and Conclusion
33
NoCSIM - SystemC based Network Simulator
SystemC
–SystemC Ports, Signals, Clocks, Processes and Modules
1000X faster than RTL
System Level Exploration for Architectural Synthesis
34
Simulator Features
Flow Control – Dynamic & Static
Routing – Source-based, Dynamic & Multicast
Switching – Packet switched
Topology – K-ary-n-cube & Arbitrary topological extensions
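For readers unfamiliar with the k-ary n-cube family mentioned under Topology, the sketch below enumerates the neighbors of a node. A k-ary n-cube has n dimensions with k nodes per dimension and wrap-around links, so a 4-ary 2-cube is a 4x4 torus and a 2-ary n-cube is a hypercube. The function name and the tuple representation of node addresses are assumptions of this sketch and are not taken from NoCSIM.

```python
def neighbors(node, k):
    """Neighbors of a node in a k-ary n-cube; node is a tuple of n coordinates."""
    result = set()
    for dim, coord in enumerate(node):
        for step in (-1, 1):
            nxt = list(node)
            nxt[dim] = (coord + step) % k   # wrap around in this dimension
            result.add(tuple(nxt))
    result.discard(tuple(node))             # only matters for the degenerate k == 1
    return result


print(sorted(neighbors((0, 0), 4)))       # 4-ary 2-cube (4x4 torus)
print(sorted(neighbors((0, 0, 0), 2)))    # 2-ary 3-cube (hypercube)
```

Using a set handles the k = 2 case, where stepping up and stepping down in a dimension reach the same neighbor.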
35
Class Hierarchy
36
Simulation Results
37
Simulation Results (cont.)
38
Simulation Results (cont.)
39
Topics to be discussed: Motivations for a new communication template; New Methodology for Synthesis; Clustering - Analysis phase; Network Architecture and Working; NoCSIM - A Simulation Phase; Core Network Interface; Gate Level Implementation of NoCs; Future Work and Conclusion
40
Core Network Interface
Two implementations
41
Interface Estimations
RTL (HW) implementation (on Xtensa core)
–Area: Additional register and logic to packetize
–Latency: 2 cycles
–Complexity: Additional registers and logic and an increase in instruction set
–Flexibility: Requires programmable cores or development of modified cores
Wrapper RTL (HW) implementation (off core)
–Area: Additional control, registers and logic to packetize
–Latency: 3 cycles expected
–Complexity: Additional control, registers and logic; ability to understand core operation
–Flexibility: Can use existing cores; modify wrappers for plug-and-play into different networks
42
Topics to be discussed: Motivations for a new communication template; New Methodology for Synthesis; Clustering - An analysis phase; Network Architecture and Working; NoCSIM - A Simulation Phase; Core Network Interface; Gate Level Implementation of NoCs; Future Work and Conclusion
43
Gate Level Implementation of NoCs
Synthesis Library Used
–0.18 µm library
–Vdd = 1.62 V
–Synopsys Design Compiler, Power Compiler
–No dynamic power annotations
44
Gate Level Implementation of NoCs (cont.)
45
Gate Level Implementation of NoCs (cont.)
Module              Area (128)   Area (256)
Output Controller   78800        154100
Input Controller    470000       610000

VCs, Buffers   Area estimate in square microns
1, 8           338000
2, 4           354000
2, 8           431000
4, 4           470000
46
Performance Evaluation
A configuration of 6 buffers and 4 virtual channels, with 16 such tiles (network logic clock = 400 MHz), can
–Support aggregate data rates of 600 Gbps
–Occupy around 5% of the chip area, assuming a 3 cm × 3 cm chip
–Dissipate around 1.6 W
47
A Roadmap
Area cost will decrease with process
Latency will be dominated by wire delay
Power reductions will become more important
MOVE WILL BE TOWARDS MORE CLUSTERS AND CONCURRENCY
48
Topics to be discussed: Motivations for a new communication template; New Methodology for Synthesis; Clustering - Analysis phase; Network Architecture and Working; NoCSIM - Simulation Phase; Core Network Interface; Gate Level Implementation of NoCs; Future Work
49
Future Work
Network Architecture
–Application Layer: Wrapper design, protocols and services
–Logical Network Layer: Flow-control, routing, switching and topologies
–Physical Network Layer: Signaling, wire characterization and prediction
System Level Design and Design Flow
–Power Management of Network
–Integration to Codesign Environment: Network aware partitioning and mapping; integration with Clustering phase
50
References
[1] N. Swaminathan, MS Thesis, Texas A&M Univ., Summer 2002.
[1] K. Lahiri et al., “Fast performance Analysis of bus based on-chip communication architecture”, in Proc. of ICCAD, 1999.
[2] A. Hemani et al., “Lowering Power Consumption in clock by using Globally Asynchronous Design Style”, in Proc. of DAC, USA, 1999, pp. 873-879.