Published by Irene Terry. Modified over 8 years ago.
1 Kadlec MAPLD05/P148
Reconfigurable Floating Point Co-Processor for Atmel FPSLIC
Jiri Kadlec
Institute of Information Theory and Automation (UTIA), Academy of Sciences of the Czech Republic, Prague, CZ
Tel: 00420 2 6605 2216
Email: kadlec@utia.cas.cz
2 Presentation outline
Principle of partial dynamic reconfiguration on Atmel FPSLIC
Support infrastructure: SW view and SW/HW co-design view
Dynamically re-configurable scalable floating-point unit
  • Where it comes from
  • Parallel operations: ADD, MUL, FX2FP, FP2FX
  • Sequential DIV, SQRT
Case study 1: 32-bit pipelined FP multiplier sliced into 2 contexts
Case study 2: 24-bit FP adder and 24-bit FP multiplier as 2 D_macros
Lessons learned and thanks
3 Principle of partial dynamic reconfiguration on Atmel FPSLIC
[Diagram labels: HW IP 1; HW IP 2; Software Application; SW; Data; 8-bit RISC MCU; write 32 bits: X[7:0], Y[7:0], Z[7:0], D[7:0]; RSA; Internal SRAM; Internal processor; Small internal FPGA; Internal modification of LUTs]
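One possible reading of the diagram is that the MCU reconfigures the FPGA by issuing 32-bit writes made of four 8-bit fields, with X, Y and Z selecting a location and D carrying the new configuration byte. The following is a minimal software sketch of that reading, not FPSLIC code: the class `ConfigSram`, its methods and the function `load_context` are hypothetical names introduced only for illustration.

```python
class ConfigSram:
    """Toy model of the FPGA configuration memory as seen from the MCU.

    Assumption: one write transaction = address bytes X, Y, Z plus data byte D.
    """

    def __init__(self):
        self.cells = {}   # (x, y, z) -> configuration byte
        self.writes = 0   # number of 32-bit write transactions so far

    def write(self, x, y, z, d):
        # Each of the four fields must fit in 8 bits, as in X[7:0] ... D[7:0].
        for name, value in (("x", x), ("y", y), ("z", z), ("d", d)):
            if not 0 <= value <= 0xFF:
                raise ValueError(f"{name} must fit in 8 bits, got {value}")
        self.cells[(x, y, z)] = d
        self.writes += 1

    def read(self, x, y, z):
        # Locations never written are modelled as 0 (an assumption).
        return self.cells.get((x, y, z), 0)


def load_context(sram, records):
    """Write a list of (x, y, z, d) records and return the total write count."""
    for x, y, z, d in records:
        sram.write(x, y, z, d)
    return sram.writes


sram = ConfigSram()
load_context(sram, [(1, 2, 3, 0xAA), (1, 2, 4, 0x55)])
print(hex(sram.read(1, 2, 3)), sram.writes)
```

In this model, swapping HW IP 1 for HW IP 2 amounts to replaying a different record list against the same memory.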
4 Partially re-configurable scalable floating-point unit
Source code has been derived from the Celoxica floating-point library. From DK3.1 we use:
  • RTL simulator
  • Generic VHDL
  • C++ bit-exact models, which can be exported to Matlab/Simulink test benches
VHDL code is recompiled for Atmel. The back end is the free Figaro P&R with extensions developed in the EU RECONF project.
Our test bench: the blue block is a bit-exact representation of the floating-point adder. An identical model supports several widths of mantissa and exponent.
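To illustrate what a model with selectable mantissa and exponent widths can look like, here is a minimal Python sketch. It is not the Celoxica model: it assumes an IEEE-754-like layout (hidden leading one, exponent bias of 2^(e-1) - 1, truncation instead of rounding) and handles no NaN, infinity or denormal values. The function names `encode` and `decode` and the 7/16 field split used in the example are assumptions.

```python
import math


def encode(value, exp_bits, man_bits):
    """Pack a number into (sign, biased exponent, mantissa fraction) fields."""
    if value == 0.0:
        return (0, 0, 0)                      # all-zero fields encode zero
    sign = 1 if value < 0 else 0
    frac, exp = math.frexp(abs(value))        # abs(value) = frac * 2**exp, 0.5 <= frac < 1
    bias = (1 << (exp_bits - 1)) - 1
    biased = (exp - 1) + bias                 # renormalise to 1.f * 2**(exp - 1)
    if not 0 < biased < (1 << exp_bits) - 1:
        raise OverflowError("exponent out of range for this format")
    mantissa = int((frac * 2 - 1) * (1 << man_bits))   # truncate extra bits
    return (sign, biased, mantissa)


def decode(fields, exp_bits, man_bits):
    """Inverse of encode(): rebuild the value from its three fields."""
    sign, biased, mantissa = fields
    if biased == 0 and mantissa == 0:
        return 0.0
    bias = (1 << (exp_bits - 1)) - 1
    significand = 1 + mantissa / (1 << man_bits)
    magnitude = math.ldexp(significand, biased - bias)
    return -magnitude if sign else magnitude


# The same two functions serve a 32-bit (8/23) and a narrower (7/16) format.
for exp_bits, man_bits in ((8, 23), (7, 16)):
    fields = encode(0.1, exp_bits, man_bits)
    print(exp_bits, man_bits, fields, decode(fields, exp_bits, man_bits))
```

Passing the widths as parameters is what lets one model cover several formats; a narrower mantissa simply shows up as a larger truncation error.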
5 Support infrastructure: SW view for AVR programmer
6 SW/HW view
[Diagram labels: SW/HW cores; FLASH data; FLASH pgm.; FLASH bst.; PGM overlay; API for macros; Bit-stream formatter (PC); Guidelines for Macros & Top]
7 Case study 1: A 32-bit pipelined multiplier sliced manually into 2 smaller dynamically reconfigurable contexts to fit in
D_reconfigurable super-macro with 2 contexts and dual-ported SRAM in the top static design. Target AT40/94
8 Sliced 32-bit pipelined multiplier (2)
Top-level placement of the I/O 32-bit registers (ra, rb, rz) (left) and the 40-bit, 32-word dual-port SRAM (right) Os reflecting the “cut” of the floating-point macro.
Top-level placement of the super-macro, aligned with the dual-port SRAM.
Nets of the static part with registers ra, rb, rz, and the 40-bit, 32-word dual-port Atmel FREE RAM.
9 Sliced 32-bit pipelined multiplier (3)
Left: Pipelined 32-bit floating-point multiplier. AT94K40. Macro: 1581 logic cells.
Right: Top with nets for single FP macro. Maximal clock 16.8 MHz. Latency 7. Dense nets result in lower maximal clock frequency. 1722 logic cells (75%).
Left: Stage 1 of sliced 32-bit FP multiplier. AT94K40. Macro: 1083 logic cells.
Right: Top with nets for Stage 1 context. Maximal clock 18.6 MHz. Latency 5. The 40-bit cut bus from S1 to S2 uses 32 words of 40-bit DP RAM. 1286 logic cells (55%).
Left: Stage 2 of sliced 32-bit FP multiplier. AT94K40. Macro: 512 logic cells.
Right: Top with Stage 2 context. Maximal clock 20.5 MHz. Latency 4. The BE flow reserves an identical subset of cells for both contexts. 1286 logic cells (55%).
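The figures on this slide can be compared with a few lines of arithmetic. The sketch below assumes that the percentages refer to a total of 2304 core cells in the AT94K40 (a 48 × 48 array) and that "latency" counts clock cycles; both are assumptions, although the first one is consistent with the percentages quoted on the slide.

```python
TOTAL_CELLS = 2304  # assumption: 48 x 48 core cells in the AT94K40 FPGA part

# Cell counts, maximal clocks and latencies as quoted on the slide.
designs = {
    "single FP macro": {"cells": 1722, "clock_mhz": 16.8, "latency": 7},
    "stage 1 context": {"cells": 1286, "clock_mhz": 18.6, "latency": 5},
    "stage 2 context": {"cells": 1286, "clock_mhz": 20.5, "latency": 4},
}


def utilisation_percent(cells, total=TOTAL_CELLS):
    """Share of the FPGA occupied by the top-level design, in percent."""
    return 100.0 * cells / total


def pipeline_fill_ns(latency_cycles, clock_mhz):
    """Time until the first result leaves the pipeline, in nanoseconds."""
    return latency_cycles * 1000.0 / clock_mhz


for name, d in designs.items():
    print(f"{name}: {utilisation_percent(d['cells']):.1f}% of cells, "
          f"first result after {pipeline_fill_ns(d['latency'], d['clock_mhz']):.0f} ns")
```

Taking the cell counts at face value, slicing reduces the occupied area from 1722 to 1286 cells, which frees 436 cells for the rest of the design.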
10 Sliced 32-bit pipelined multiplier (4)
Interface of the sliced multiplier super-macro in the FPSLIC test bench. It helps to define one-to-one connections from the static to the dynamic part of the design.
11 Sliced 32-bit pipelined multiplier (5)
Placement of the super-macro in AT94K40. 8-bit registers replaced the DP SRAM. Locking the whole area forced the automatic placement of the rest of the top-level design into the unlocked “south”. The top design with nets takes 506 logic cells (22%).
Top with Stage 1 context (left). Top with Stage 2 context (right).
Reconfiguration time:
  Stage 1 -> Stage 2: 16 ms
  Stage 2 -> Stage 1: 16 ms
Reconfiguration times are for the AVR @ 18 MHz, Mode 4. Reduced-size bit-streams (differences of bit-streams) are stored in FLASH and downloaded from it by the AVR processor.
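The idea of storing only the differences between bit-streams can be sketched as follows. This is an illustration, not the actual bit-stream formatter: a configuration is modelled as a dictionary from address to byte, unset addresses are assumed to hold 0, and the names `diff_bitstream` and `apply_diff` are hypothetical.

```python
def diff_bitstream(current, target):
    """Return the (address, byte) records that turn `current` into `target`."""
    addresses = sorted(set(current) | set(target))
    return [
        (addr, target.get(addr, 0))
        for addr in addresses
        if current.get(addr, 0) != target.get(addr, 0)
    ]


def apply_diff(config, records):
    """Apply difference records to a configuration and return the new one."""
    updated = dict(config)
    for addr, byte in records:
        updated[addr] = byte
    return updated


# Two made-up contexts that share most of their configuration bytes.
stage1 = {0: 0x11, 1: 0x22, 2: 0x33}
stage2 = {0: 0x11, 1: 0x99, 3: 0x44}

s1_to_s2 = diff_bitstream(stage1, stage2)
s2_to_s1 = diff_bitstream(stage2, stage1)
print(len(s1_to_s2), len(s2_to_s1))
```

Both directions change the same set of addresses, so the two difference lists have the same length; that would fit the equal reconfiguration times reported for both directions, although the slide does not state this as the reason.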
12 Case study 2: Reconfigurable 24-bit floating-point ADD/MUL cores
[Figures: 24-bit FP adder, placement and routing; 24-bit FP multiplier, placement and routing]
Reconfiguration time: 50 ms for 4 MHz AVR clock
Partial bit-stream size: 20k 32-bit configuration words
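The two numbers on this slide give a rough cost per configuration word. The sketch below reads "20k" as 20,000 words, which is an assumption (20,480 is another possible reading), and the helper names are hypothetical.

```python
def reconfig_cycles(time_ms, clock_khz):
    """AVR clock cycles that elapse during one reconfiguration (ms * kHz)."""
    return time_ms * clock_khz


def cycles_per_word(time_ms, clock_khz, words):
    """Average AVR cycles spent per configuration word."""
    return reconfig_cycles(time_ms, clock_khz) / words


def bitstream_bytes(words, bits_per_word=32):
    """Size of the partial bit-stream in bytes."""
    return words * bits_per_word // 8


WORDS = 20_000  # assumption: "20k" read as 20,000 configuration words

print(reconfig_cycles(50, 4000))         # cycles in 50 ms at 4 MHz
print(cycles_per_word(50, 4000, WORDS))  # average cycles per 32-bit word
print(bitstream_bytes(WORDS))            # bytes to fetch from FLASH
```

Under that reading, 50 ms at 4 MHz is 200,000 cycles, or about 10 AVR cycles per 32-bit configuration word.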
13 Lessons learned and thanks
++ Best for data streaming operations
++ Low-cost external FLASH can store tens of HW and SW overlays
++ Partial dynamic reconfiguration on FPSLIC can result in a low-cost and low-power solution
++ It is a valid path for small groups who cannot go for an ASIC
-- High complexity needs support in predefined SW/HW infrastructure
-- Performance and size of the FPGA part are limited
This work has been partially supported by the EU project RECONF http://www.reconf.org