Published by Cameron Leonard. Modified over 9 years ago.
1
High-Level Interconnect Architectures for FPGAs
Nick Barrow-Williams
2
Introduction
- Continued shrinking of device dimensions introduces new design challenges
- Moving data around a chip can now be the limiting factor of performance
- Existing interconnection solutions do not scale well
3
Why do existing solutions not scale?
- Global connections are longer
- Wire depth increased to counter width decrease
- Parasitic capacitive effects increase and cause slow signal propagation
4
Why do existing solutions not scale?
- Existing system-level connection uses buses
- Buses increase resource efficiency and decrease wiring congestion
- Not suitable for a large number of modules
- A network-based alternative would offer higher aggregate bandwidth
5
Why design for FPGA systems?
- FPGA silicon area already dominated by wiring
- Global wires are limited in number
- Increasing gate count only increases wiring congestion
6
The Solution: Network-on-Chip
- Use technologies from network systems
- Replace inefficient global wiring with a high-level interconnection network
- Create scalable systems to handle large numbers of modules
7
Existing Solutions
- Most existing systems are for ASIC designs
  - Stanford Interconnect
  - RAW
  - SCALE
  - SPIN
- PNoC: a solution for FPGAs
  - Complex
  - High hardware cost
- Other simulated solutions exist but few are implemented
8
Proposal: Two network systems
- Existing solutions use either packet switching or circuit switching techniques
- Design, implement, test and synthesise one of each to compare performance and hardware cost
- Map solutions to an FPGA platform to evaluate hardware cost in current generation systems
9
Network Architecture Design
- Topology requirements
  - Simple
  - Scalable
  - 2-dimensional
- Solution: 2D mesh topology
10
Network Architecture Design
- Routing algorithm
  - Deterministic
    - Data always follows the same path through the network
    - Simple hardware
    - Sensitive to congestion
  - Adaptive
    - Paths through the network can change according to load
    - Complex hardware
    - Avoids congestion
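As an illustration of deterministic routing on a 2D mesh, here is a minimal sketch of XY (dimension-order) routing. The slides do not name the deterministic algorithm that was used, so XY routing is an assumption chosen because it is a common scheme of this kind; the names `xy_route` and `hop_count` are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return every node visited from src to dst, moving in X first, then Y.

    Assumption: XY dimension-order routing; the slides only say the
    routing is deterministic.
    """
    x, y = src
    path = [(x, y)]
    # Step along the X dimension until the column matches the destination.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then step along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src: Coord, dst: Coord) -> int:
    """Number of router-to-router links crossed on the XY path."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    print(xy_route((0, 3), (1, 0)))
    print(hop_count((0, 3), (1, 0)))
```

Because the path depends only on source and destination, the same pair always yields the same route, which is the property the slide describes. For source (0,3) and destination (1,0) the sketch gives 4 hops, the same hop count that the testbench output reports for that pair.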
11
Network Architecture Design
- When choosing routing algorithms, must avoid:
  - Deadlock
    - Solution: use unidirectional wiring and allow each node to make two connections
  - Livelock
    - Solution: use deterministic routing
12
Network Architecture Design
- Flow control methods
  - Circuit switched
    - Circuit request propagates through network
    - Path reserved to destination
    - Grant signal propagates back
    - Data sent, then circuit deallocated
  - Packet switched
    - Use header, body and tail
    - Wormhole routing
    - Forward header and body without waiting for tail
    - Need buffers to store stalled packets
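The packet switched scheme can be sketched in software as follows. This is a simplified model, not the VHDL design from the slides: the flit layout, the one-flit-per-cycle timing and the names `packetize` and `wormhole_forward` are all assumptions made for illustration.

```python
from collections import deque
from typing import Any, List, Set, Tuple

Flit = Tuple[str, Any]  # (kind, value); kind is "HEAD", "BODY" or "TAIL"


def packetize(dest: Tuple[int, int], payload: List[int]) -> List[Flit]:
    """Split a message into a header flit, body flits and a tail flit.

    Hypothetical layout: the header carries the destination, each body
    flit carries one payload word, and the tail only marks the end.
    """
    flits: List[Flit] = [("HEAD", dest)]
    flits += [("BODY", word) for word in payload]
    flits.append(("TAIL", None))
    return flits


def wormhole_forward(incoming: List[Flit],
                     stall_cycles: Set[int]) -> List[Tuple[int, Flit]]:
    """Forward flits as they arrive, without waiting for the tail.

    One flit arrives per cycle. In any cycle listed in stall_cycles the
    output is blocked, so arriving flits wait in a FIFO buffer.
    Returns (cycle, flit) pairs in the order the flits were sent.
    """
    pending = deque(incoming)
    fifo: deque = deque()
    sent: List[Tuple[int, Flit]] = []
    cycle = 0
    while pending or fifo:
        if pending:
            fifo.append(pending.popleft())  # flit arrives this cycle
        if cycle not in stall_cycles and fifo:
            sent.append((cycle, fifo.popleft()))  # oldest flit leaves
        cycle += 1
    return sent


if __name__ == "__main__":
    flits = packetize((1, 0), [10, 20])
    for cycle, flit in wormhole_forward(flits, stall_cycles={1}):
        print(cycle, flit)
```

In this model the header leaves the router in cycle 0, before the tail has even arrived, and the stall in cycle 1 is absorbed by the FIFO. That is the reason the packet switched router needs buffers while the circuit switched one does not.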
13
Router Design
- Each router contains a number of modules
  - FIFOs (only present in packet switched router)
  - Address to port-request decoder
  - Arbiter
  - Control finite state machines
  - Crossbar
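To make the arbiter's role concrete, here is a minimal sketch of one possible arbitration policy. The slides do not say which policy the arbiter uses, so round-robin is an assumption, and the default of five ports is inferred from the "5 Queue Modules" label on the packet switched router diagram. The class name `RoundRobinArbiter` is hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grant one requesting port per call, rotating priority for fairness.

    Assumption: round-robin policy; the original design may differ.
    """

    def __init__(self, n_ports: int = 5) -> None:
        self.n_ports = n_ports
        # Start so that port 0 has the highest priority on the first call.
        self.last = n_ports - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted port, or None if nobody requests."""
        # Search from the port after the last winner, wrapping around.
        for offset in range(1, self.n_ports + 1):
            port = (self.last + offset) % self.n_ports
            if requests[port]:
                self.last = port
                return port
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter()
    requests = [True, False, True, False, False]
    print([arbiter.grant(requests) for _ in range(3)])
```

With ports 0 and 2 requesting continuously, the grants alternate between them, so neither port can starve the other.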
14
Circuit Switched Router Structure
[Block diagram. Blocks: In & Out Ports, Crossbar, FSM, Arbiter, Address to Port Decoder. Signals: Request In, Request Out, Grant In, Grant Out, Data In, Data Out.]
15
Packet Switched Router Structure
[Block diagram. Blocks: 5 Queue Modules (FIFO, FSM), In & Out Ports, Crossbar, Control, Arbiter, Address to Port Decoder. Router signals: Request From FIFOs, Request In, Write Out, Full In, Grant Out, Data From FIFOs, Data Out. Queue module signals: Data In, Full, Write, Grant, Req, Data.]
16
Router Implementation and Testing
- Both routers were coded using VHDL
- Simulation and testing used a combination of ModelSim and Xilinx ISE 9.1
- Ad-hoc tests used for individual modules
- VHDL testbench used for system verification
17
Testbench Structure
[Diagram. Testbench blocks: Mesh Network, Source, Sink, Read Input, Input Tables, Test Table, Output Table, Compare, Clock Gen, Reset Gen, Cycle Count. Files: Command File, Output File.]

Output File:
Success: ID: 1  Source: (0,3)  Dest: (1,0)  Hops: 4  Latency: 34
Success: ID: 2  Source: (0,2)  Dest: (1,0)  Hops: 3  Latency: 27
Success: ID: 3  Source: (3,2)  Dest: (1,1)  Hops: 3  Latency: 22
Success: ID: 4  Source: (1,3)  Dest: (0,1)  Hops: 3  Latency: 22
Success: ID: 5  Source: (3,0)  Dest: (3,1)  Hops: 1  Latency: 12

Command File:
# START  SOURCE  DEST  SIZE  ID
# ------------------------------------------------------
  2      3 0     0 1   8     1
  3      2 0     0 1   2     2
  3      2 3     1 1   2     3
  4      3 1     1 0   8     4
  5      0 3     1 3   7     5
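The command file shown on this slide can be read with a few lines of software. The sketch below is not the VHDL testbench itself; it assumes each row holds seven integers (start cycle, two source coordinates, two destination coordinates, size, ID), which is an interpretation of the column headings rather than something the slides state. The names `Command`, `parse_commands` and `manhattan_hops` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Command(NamedTuple):
    start: int               # cycle at which the transfer is injected
    source: Tuple[int, int]  # coordinate pair as written in the file
    dest: Tuple[int, int]    # coordinate pair as written in the file
    size: int
    ident: int


def parse_commands(text: str) -> List[Command]:
    """Parse command rows, skipping blank lines and '#' comment lines."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        start, s0, s1, d0, d1, size, ident = map(int, line.split())
        commands.append(Command(start, (s0, s1), (d0, d1), size, ident))
    return commands


def manhattan_hops(cmd: Command) -> int:
    """Minimal hop count on a 2D mesh between source and destination."""
    return (abs(cmd.source[0] - cmd.dest[0])
            + abs(cmd.source[1] - cmd.dest[1]))


COMMAND_FILE = """\
# START SOURCE DEST SIZE ID
# ------------------------------------------------------
2 3 0 0 1 8 1
3 2 0 0 1 2 2
3 2 3 1 1 2 3
4 3 1 1 0 8 4
5 0 3 1 3 7 5
"""

if __name__ == "__main__":
    for cmd in parse_commands(COMMAND_FILE):
        print(cmd.ident, cmd.source, cmd.dest, manhattan_hops(cmd))
```

The computed hop counts (4, 3, 3, 3, 1) agree with the Hops values in the sample output for IDs 1 to 5. The coordinate pairs in the command file appear in the opposite order to those printed in the output; the slides do not explain the ordering, and the hop count is the same either way.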
18
Synthesis
- Each router was synthesised for a Virtex-4 LX platform
- Post-synthesis verification
- Resource usage
- Timing
19
Circuit Switched Resource Usage
- LUTs: total of 586 4-input LUTs (~0.1% of a Virtex 5)
- Flip-flops: total of 202 flip-flops
20
Packet Switched Resource Usage
- LUTs: total of 786 4-input LUTs (+34% compared to circuit switched)
- Flip-flops: total of 237 flip-flops
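The +34% figure can be checked directly from the two LUT totals, and the same arithmetic gives the flip-flop overhead, which the slides do not state. A small sketch, with `relative_increase` as a hypothetical helper name:

```python
def relative_increase(baseline: int, value: int) -> float:
    """Percentage increase of value over baseline."""
    return (value - baseline) / baseline * 100.0


# Totals taken from the circuit switched and packet switched slides.
lut_overhead = relative_increase(586, 786)
ff_overhead = relative_increase(202, 237)

if __name__ == "__main__":
    print(f"LUTs:       +{lut_overhead:.1f}%")   # +34.1%
    print(f"Flip-flops: +{ff_overhead:.1f}%")    # +17.3%
```

So the packet switched router costs about a third more LUTs but only about a sixth more flip-flops than the circuit switched router.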
21
Timing Results
                Circuit Switched    Packet Switched
Max Freq        126.330 MHz         144.533 MHz
Setup time      5.308 ns            6.125 ns
Hold time       0.272 ns            0.272 ns
Critical path is through the arbiter in both designs
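The maximum frequencies translate into minimum clock periods through period = 1 / frequency. A short sketch of that conversion for the two reported values, with `period_ns` as a hypothetical helper name:

```python
def period_ns(freq_mhz: float) -> float:
    """Minimum clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    for freq in (126.330, 144.533):
        print(f"{freq:.3f} MHz -> {period_ns(freq):.3f} ns")
```

This gives roughly 7.916 ns and 6.919 ns. Since the critical path is through the arbiter in both designs, these periods bound the delay of that path in each router.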
22
Project Appraisal
- Maintaining an accurate software simulation proved difficult
- A great deal was learnt during the implementation of the circuit switched network
- HDL implementations are only prototypes
- Testbench provides a good framework but more time is needed to gather performance data
23
Conclusions
- Possible to make low-complexity network-on-chip systems suitable for FPGAs
- Latency has to be traded for throughput
- Hard to collect performance data without application-driven benchmarks
- Both networks are viable, so why not use both?
24
Future Work
- Cycle-accurate software simulations
- Application-driven benchmarking
- Serial transmission
- Power efficiency
- Industry standard solution