1
Hardwired networks on chip for FPGAs and their applications
Kees Goossens (TU Delft, NXP), Muhammad Aqeel Wahlah (TU Delft)
2
overview
- applications
- network on chip
- FPGA
- key ideas
  - hardwired NOC
  - unified interconnect
  - data coercion / type casting
- application: dynamic partial reconfiguration
  - multiple concurrent applications
  - multiplex sub-applications (“hardware tasks”)
- example
- conclusions
3
applications
- task / function
  - mapped on IP
  - includes local storage / buffering
- application: set of communicating IPs / tasks / ...
  - data, control, code
  - communication via connections
- use case: set of concurrent applications
[Figure: example applications; labels T1, T2, T3, C1, C2, C3, A1, A2, BA, BAC]
notes:
- a1 and b3 mapped on hard IP that is configurable (IO processor)
- a2 and b1, b2 mapped on soft IP
- state mapped on hard RAM (not necessarily)
- could have more intermediate states
- t1 mapped on soft IP
- t2 is DUT (soft IP)
- t3 mapped on CPU
4
network on chip (NOC)
- connects ports on hardware blocks (IP)
  - data, control
- connections: virtual wires
  - real-time / quality of service
- programmable at run-time
  - set up & remove connections by programming control registers in the NOC
- styles of communication
  - address-based / memory-mapped
  - streaming
[Figure: IPs (T1, T2, T3, A1, A2, BA, BAC) attached through network interfaces (NI) to NOC routers (R)]
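The run-time programmability described on this slide can be pictured with a small sketch. This is only an illustrative model, not the actual NOC interface: the class `NocModel`, its method names and the `slots` parameter are hypothetical, and a Python dictionary stands in for the control registers.

```python
class NocModel:
    """Toy model: a connection exists only while its control register is set."""

    def __init__(self):
        # Hypothetical register file: (source port, destination port) -> reserved slots.
        self.control_registers = {}

    def set_up_connection(self, src_port, dst_port, slots):
        # Programming a control register creates the virtual wire.
        self.control_registers[(src_port, dst_port)] = slots

    def remove_connection(self, src_port, dst_port):
        # Clearing the register removes the virtual wire again.
        self.control_registers.pop((src_port, dst_port), None)

    def send(self, src_port, dst_port, word):
        # Data can only travel over a connection that has been set up.
        if (src_port, dst_port) not in self.control_registers:
            raise RuntimeError(f"no connection from {src_port} to {dst_port}")
        return (dst_port, word)


noc = NocModel()
noc.set_up_connection("A1", "A2", slots=2)
delivered = noc.send("A1", "A2", 0xCAFE)
noc.remove_connection("A1", "A2")
```

After `remove_connection`, a further `send` between the same ports raises an error, which mirrors the idea that connections are set up and removed purely by register programming.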
5
FPGA fabric
- soft IP are configured
  - in configurable elements (LUT) and switch boxes (not shown)
  - with a given configuration granularity (frame)
  - using the configuration interconnect (ICAP)
- hard IP
  - CPU
  - on-chip memories (BRAM, ...)
  - off-chip memory interfaces
  - decryption IP
  - etc.
- configuration: bitstream loading
- programming / control: set MMIO registers
- Xilinx terminology (frames, ICAP, etc.)
[Figure: FPGA with IO processor, LUT regions, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, ICAP]
6
application on FPGA
- design an application as for ASIC
  - IPs, interconnect, storage, sw
- but map on soft & hard IP resources
- traditionally have separate soft data and control interconnects
  - could also use soft NOC for both
[Figure: A1, A2, BA, BAC mapped on frames, with soft data interconnect and soft control interconnect; IO processor, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, ICAP]
7
multiple applications on FPGA
- interconnects and IPs of different applications share reconfiguration regions (frames)
- dynamic reconfiguration is global, not partial
[Figure: T1, T2, T3, A1, A2, BA, BAC mapped on shared LUT regions, with soft data interconnect and soft control interconnect; IO processor, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, ICAP]
8
overview
- applications
- network on chip
- FPGA
- key ideas
  - hardwired NOC: improved performance : cost
  - unified interconnect: flexibility
  - data coercion / type casting: cool (and useful) applications
- application: dynamic partial reconfiguration
  - multiple concurrent applications
  - multiplex sub-applications (“hardware tasks”)
- example
- conclusions
9
1. hardwired interconnect
- replace soft interconnect(s) by hard interconnect(s)
- connect reconfigurable regions of LUTs (CFR)
- bit-level reconfigurability (CFR)
  - switch boxes
- transaction-level reconfigurability (NOC)
  - routers, NIs
  - memory mapped / streaming
[Hecht FPL’05]
[Figure: CFRs holding A1, A2, BA, BAC, T1, T2, T3, connected by hard interconnect(s); IO processor, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, ICAP]
10
1. hardwired interconnect
- ~35 X smaller area
- ~3.5 X higher speed
- ~150 X better perf:cost ratio (bits/sec/area)
- ~200 X smaller configuration footprint (program MMIO, no bitstream)
  - ~200 X faster soft IP load & boot
  - dynamic partial reconfiguration
- no constraints on soft IP placement due to communication
- loss of flexibility
  - fewer LUTs
  - CFR = frame
  - 7% hard NOC
[based on Virtex4 & Aethereal NOC, Goossens NOCS’08]
[Figure: CFRs holding C1, C2, C3, T1, T2, T3, BAC, connected by hard interconnect(s); IO processor, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, ICAP]
11
performance & cost
essentially, it all depends on
- area soft:hard ≈ 35:1
- speed hard:soft ≈ 3.5:1
- configuration footprint of soft NOC (bitstream) : programming footprint of hard NOC (MMIO registers) ≈ 214:1
  - resulting in boot time soft:hard ≈ 200:1
- functional performance:cost (bit/sec:area) soft:hard ≈ 1:147
12
performance & cost
- configuration speed
  - 1.9 Gb/s for dedicated configuration interconnect (ICAP)
  - 8 Gb/s for hard NOC programming
- speed
  - 118 MHz soft NOC
  - 500 MHz hard NOC
- configuration footprint for soft NOC
  - 1.8 Mb (8300 LUTs per router+NI)
- programming footprint for hard NOC
  - 2100 bit per connection
- thus to configure & program an NI
  - 1 msec for soft NOC
  - 10.6 μsec for hard NOC
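As a rough sanity check of these figures, the raw transfer times can be recomputed from footprint and throughput. The sketch below assumes 1 Mb = 10^6 bits and 1 Gb/s = 10^9 bits/s, and all variable names are made up for illustration. It reproduces the roughly 1 msec soft NOC figure. The raw transfer of a single 2100-bit connection comes out well below the quoted 10.6 μsec, so that figure presumably covers more than one raw transfer; the slides do not say how many connections or what overheads it includes.

```python
# Figures taken from the slide; unit interpretation (decimal prefixes) is an assumption.
ICAP_BITS_PER_S = 1.9e9           # dedicated configuration interconnect (ICAP)
HARD_NOC_BITS_PER_S = 8e9         # hard NOC
SOFT_FOOTPRINT_BITS = 1.8e6       # bitstream for a soft router + NI
HARD_BITS_PER_CONNECTION = 2100   # MMIO programming per connection


def transfer_time_s(bits, bits_per_s):
    """Raw transfer time, ignoring any protocol or setup overhead."""
    return bits / bits_per_s


soft_load_s = transfer_time_s(SOFT_FOOTPRINT_BITS, ICAP_BITS_PER_S)
hard_connection_s = transfer_time_s(HARD_BITS_PER_CONNECTION, HARD_NOC_BITS_PER_S)
footprint_ratio_per_connection = SOFT_FOOTPRINT_BITS / HARD_BITS_PER_CONNECTION

print(f"soft NOC bitstream load: {soft_load_s * 1e3:.2f} ms")
print(f"hard NOC, one connection: {hard_connection_s * 1e6:.4f} us")
print(f"footprint ratio per connection: {footprint_ratio_per_connection:.0f}:1")
```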
13
2. unified interconnect
- one interconnect (e.g. NOC) for
  - data, for functional mode
  - control, for programming
  - bitstreams, for configuration
- dynamic partitioning of different interconnects
[Figure: CFRs holding A1, A2, BA, BAC, T1, T2, T3, connected by a single hard interconnect; IO processor, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, ICAP]
14
3. data coercion
- data = control = bitstream = test = …
- connect a data port to a configuration port
  - decrypt bitstreams
- data coercion happens in connection or in IP
[Figure: bitstream travelling over the single hard interconnect; IO processor, CPU, CFRs, de/encrypt accelerator, off-chip memory interface, on-chip memories]
15
3. data coercion
- data = control = bitstream = test = …
- connect a data port to a configuration port
  - decrypt bitstreams
  - relocate bitstreams
  - run-time compute / optimise bitstreams
    - JIT, peephole
[Figure: bitstream travelling over the single hard interconnect; IO processor, CPU, PH, CFRs, IP, de/encrypt accelerator, off-chip memory interface, on-chip memories]
16
3. data coercion
- data = control = bitstream = test = …
- connect a data port to a configuration port
  - decrypt bitstreams
  - relocate bitstreams
  - run-time compute / optimise bitstreams
    - JIT, peephole
- data port to test port (NOC as TAM)
  - on-line (structural) testing
  - on-chip test-vector generation
[Figure: bitstream travelling over the single hard interconnect; IO processor, CPU, PH, CFRs, IP, de/encrypt accelerator, off-chip memory interface, on-chip memories]
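The idea of connecting a data port to a configuration port can be sketched as a pipeline in which the same words are plain data to one IP and a bitstream to the next. Everything here is a hypothetical model: `ConfigPort`, `connect` and `toy_decrypt` are invented names, and the XOR step is only a stand-in for the de/encrypt accelerator, not a real cipher.

```python
KEY = 0x5A  # toy key, for illustration only


def read_memory(words):
    """Models a memory's data port streaming out stored words."""
    yield from words


def toy_decrypt(stream, key):
    """Stand-in for the de/encrypt accelerator: it just sees a stream of data words."""
    for word in stream:
        yield word ^ key


class ConfigPort:
    """Models a configuration port: the words it receives are treated as a bitstream."""

    def __init__(self):
        self.frames = []

    def write(self, word):
        self.frames.append(word)


def connect(source, sink):
    """Models a connection: every word from the source is delivered to the sink."""
    for word in source:
        sink.write(word)


plain_bitstream = [0x01, 0x02, 0x03, 0xFF]
encrypted_in_memory = [w ^ KEY for w in plain_bitstream]

port = ConfigPort()
connect(toy_decrypt(read_memory(encrypted_in_memory), KEY), port)
```

The interconnect itself never needs to know whether the words are data or bitstream; the interpretation changes only at the ports, which is the sense in which the slides speak of coercion or type casting.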
17
overview
- applications
- network on chip
- FPGA
- key ideas
  - hardwired NOC
  - unified interconnect
  - data coercion / type casting
- application: dynamic partial reconfiguration
  - multiple concurrent applications
  - multiplex sub-applications (“hardware tasks”)
- example
- conclusions
18
dynamic partial reconfiguration: idea
- “hardware operating system” implements run-time scheduling of
  - multiple concurrent applications
    - independent applications on own virtual platform
    - no communication, no interference
    - “performance virtualisation”
    - activation given by user, environment, etc.
[Figure: timeline of app T and app D; labels T1, T2, T3, A1, A2, BA, BAC, C1, C2, C3]
19
dynamic partial reconfiguration: idea
- “hardware operating system” implements run-time scheduling of
  - multiple concurrent applications
  - parts of single applications (soft IP, “hardware tasks”)
    - multiplex parts of a single application on same resources
[Figure: timeline of app T and app D, with sub-app A or sub-app C active; labels A1, A2, BA, C1, C2, C3]
20
dynamic partial reconfiguration: idea
- “hardware operating system” implements run-time scheduling of
  - multiple concurrent applications
  - parts of single applications (soft IP, “hardware tasks”)
    - multiplex parts of a single application on same resources
    - internal state
[Figure: timeline of app T and app D with sub-apps A and C and their state; labels A1, A2, BA, BAC, C1, C2, C3]
21
dynamic partial reconfiguration: implementation
- system manager
  - resource management (CFR, NOC, memory, …)
  - inter-application
  - virtual platforms
[Figure: timeline showing the system manager and the application managers; labels T, A, C, BAC]
22
dynamic partial reconfiguration: implementation
- system manager
  - resource management (CFR, NOC, memory, …)
  - inter-application
  - virtual platforms
  - intra-application phases
  - NOC programming
  - soft IP / (sub)-application configuration (incl. clock, reset)
  - bottleneck?
[Figure: timeline showing the system manager and the application manager; labels A, C, BAC]
23
dynamic partial reconfiguration: implementation
- system manager
- application manager
  - application programming
[Figure: timeline showing the system manager and the application managers; labels T, A, C, BAC]
24
dynamic partial reconfiguration: implementation
- system manager
- application manager
  - application programming
  - intra-application
  - persistent data management
[Figure: timeline showing the system manager, the application manager and state; labels A1, A2, BA, BAC, C1, C2, C3]
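One way to picture intra-application multiplexing with persistent data management is the toy model below. It is a sketch under stated assumptions, not the managers from the talk: the class `ApplicationManagerModel`, its methods and the integer "state" are all hypothetical. It only shows that state saved outside the shared region survives while another sub-application occupies that region.

```python
class ApplicationManagerModel:
    """Toy model: two sub-applications take turns on one shared region."""

    def __init__(self):
        self.loaded = None      # (name, state) of the sub-application in the shared region
        self.saved_state = {}   # models persistent state kept outside the shared region

    def switch_to(self, name, initial_state=0):
        # Save the state of whatever currently occupies the region.
        if self.loaded is not None:
            previous_name, previous_state = self.loaded
            self.saved_state[previous_name] = previous_state
        # Load the requested sub-application, restoring its state if it ran before.
        self.loaded = (name, self.saved_state.get(name, initial_state))

    def step(self):
        # Stand-in for useful work: the sub-application advances its internal state.
        if self.loaded is None:
            raise RuntimeError("no sub-application loaded")
        name, state = self.loaded
        self.loaded = (name, state + 1)
        return self.loaded


manager = ApplicationManagerModel()
manager.switch_to("A")
manager.step()
manager.step()
manager.switch_to("C")   # A's state is saved, C takes over the region
manager.step()
manager.switch_to("A")   # A resumes with the state it had before
```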
25
overview
- applications
- FPGA
- network on chip
- key ideas
  - hardwired NOC
  - unified interconnect
  - data coercion / type casting
- application: dynamic partial reconfiguration
  - multiple concurrent applications
  - multiplex sub-applications (“hardware tasks”)
- example
- conclusions
26
modelling
- SystemC
  - bit & cycle accurate NOC model
  - behavioural CFR models
    - accurate bitstream structure
  - behavioural hard IP models
- model
  - starting / stopping of applications
    - dynamic, based on user input
  - starting / stopping of sub-applications
    - dynamic, based on flow of data
  - configuration: loading of bitstreams for soft IP; clock & reset
  - programming: of NOC, system & sub-application managers
  - management of persistent state
27
example
- system manager
  - program NOC for configuration
[Figure: FPGA floorplan with a single hard interconnect, system manager, application manager, and CFRs for A1, A2, BA, BAC]
28
example
- system manager
  - program NOC for configuration
  - configure: load bitstreams
    - including bitstream syntax, etc.
[Figure: FPGA floorplan with a single hard interconnect carrying bitstream, programming and data traffic; system manager, application manager, and CFRs for A1, A2, BA, BAC]
29
example
- system manager
  - program NOC for configuration
  - configure: load bitstreams
  - program NOC for (sub)-application A
[Figure: FPGA floorplan with a single hard interconnect carrying bitstream, programming and data traffic; system manager, application manager, and CFRs for A1, A2, BA, BAC]
30
example
- system manager
  - program NOC for configuration
  - configure: load bitstreams
  - program NOC for (sub)-application A
  - program & start application manager
    - including clocking & reset
[Figure: FPGA floorplan with a single hard interconnect carrying bitstream, programming and data traffic; system manager, application manager, and CFRs for A1, A2, BA, BAC]
31
example
- system manager
  - program NOC for configuration
  - configure: load bitstreams
  - program NOC for (sub)-application A
  - program & start application manager
- application manager programs & starts sub-app A
  - soft IP fn is modelled by CFR
[Figure: FPGA floorplan with a single hard interconnect carrying bitstream, programming and data traffic; system manager, application manager, and CFRs for A1, A2, BA, BAC]
32
example
- system manager
  - program NOC for configuration
  - configure: load bitstreams
  - program NOC for (sub)-application A
  - program & start application manager
- application manager programs & starts sub-app A
- sub-application A runs
[Figure: FPGA floorplan with a single hard interconnect carrying bitstream, programming and data traffic; system manager, application manager, and CFRs for A1, A2, BA, BAC]
33
example
- system manager
  - program NOC for configuration
  - configure: load bitstreams
  - program NOC for (sub)-application A
  - program & start application manager
- application manager programs & starts sub-app A
- sub-application A runs
[Goossens NOCS’08, Wahlah RAW’09]
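The start-up sequence of this example can be written down as an ordered trace. This is a minimal sketch: `PlatformModel`, `system_manager` and `application_manager` are hypothetical names, and the model only records which manager performs which step, in the order given on the slide.

```python
class PlatformModel:
    """Hypothetical stand-in for the FPGA platform; it only records the steps taken."""

    def __init__(self):
        self.log = []

    def do(self, actor, action):
        self.log.append(f"{actor}: {action}")


def application_manager(platform, sub_app):
    # The application manager handles the intra-application steps.
    platform.do("application manager", f"program & start sub-app {sub_app}")
    platform.do(f"sub-app {sub_app}", "run")


def system_manager(platform, sub_app):
    # The system manager prepares the platform, then hands over.
    platform.do("system manager", "program NOC for configuration")
    platform.do("system manager", "configure: load bitstreams")
    platform.do("system manager", f"program NOC for (sub)-application {sub_app}")
    platform.do("system manager", "program & start application manager")
    application_manager(platform, sub_app)


platform = PlatformModel()
system_manager(platform, "A")
for entry in platform.log:
    print(entry)
```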
34
conclusions
- ideas:
  - hardwired NOC: performance:cost
  - unified interconnects: hardware multi-tasking
  - data coercion / type casting: cool & useful
- very detailed model
  - many simplifications & restrictions
- many open issues
  - design flow: soft IP placement, binding, relocation, etc. [Madsen?]
  - application model: extend use-case model with intra-application dynamism
    - more general notions of persistent state
  - implementation: separation of system & application managers