
Performance and Overhead in a Hybrid Reconfigurable Computer
O. D. Fidanci (1), D. Poznanovic (2), K. Gaj (3), T. El-Ghazawi (1), N. Alexandridis (1)
(1) George Washington University, (2) SRC Computers Inc., (3) George Mason University
http://cpe02.gmu.edu/rcm/

Features of General-Purpose Reconfigurable Computers
- composed of traditional microprocessors and Field Programmable Gate Arrays (FPGAs) closely integrated with each other
- programming does not require knowledge of hardware design
- permit run-time reconfiguration of FPGAs

Hardware Architecture and Programming Model of SRC-6E

SRC Hardware Architecture
[Figure: block diagram. Labels: 2 Intel® microprocessors with SNAP (shown twice), MAP processor (shown twice), chain ports, 800 MB/s, MAP module.]

SRC Hardware Architecture – cont.

SRC Programming Model
[Figure: a main program, written in C or Fortran, calls Function_1(a, d, e) and Function_2(d, e, f). Function_1 uses Macro_1(a, b, c), Macro_2(b, d), and Macro_2(c, e); Function_2 uses Macro_3(s, t), Macro_1(n, b), and Macro_4(t, k). An inset shows the FPGA contents after the Function_1 call: Macro_1 and Macro_2 instances connected through a, b, c, d, and e.]

Compilation Process of SRC-6E
[Figure: compilation flow. Inputs: application sources (.c or .f files) and macro sources (HDL sources, .vhd or .v files). Stages: µP compiler, MAP compiler, logic synthesis, place & route, linker. Intermediate and output files: .v files, netlists (.ngo files), object files (.o files), configuration bitstreams (.bin files), application executable. Tool vendors shown: Intel, Synplicity, Xilinx.]

Application Case Study 1: High-throughput Triple DES encryption

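High-throughput encryption
[Figure: a 3DES unit keyed with K_0 encrypts a stream of message blocks M_i, M_{i+1}, M_{i+2}, ... into ciphertext blocks C_i, C_{i+1}, C_{i+2}, ...]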

Fully pipelined architecture of Triple DES
[Figure: chained DES macros with pipeline stages numbered 1, 2, ..., 17; 18, 19, ..., 34; 35, 36, ..., 51.]
- 51 pipeline stages
- New input and new output every clock cycle
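To make the effect of full pipelining concrete, the sketch below counts clock cycles for a stream of blocks through an ideal 51-stage pipeline that accepts one 64-bit block per cycle. The function names are hypothetical, the pipeline is assumed to never stall, and the 100 MHz clock in the usage note is an illustrative value, not a figure from the slides.

```python
PIPELINE_STAGES = 51  # from the slide: stages numbered 1 through 51
BLOCK_BITS = 64       # DES and Triple DES operate on 64-bit blocks


def cycles_for_blocks(n_blocks, stages=PIPELINE_STAGES):
    """Cycles until the last of n_blocks leaves an ideal, stall-free pipeline.

    The first block needs `stages` cycles to emerge; every later block
    follows one cycle behind its predecessor.
    """
    if n_blocks <= 0:
        return 0
    return stages + n_blocks - 1


def throughput_bits_per_s(n_blocks, clock_hz, stages=PIPELINE_STAGES):
    """Average throughput over the whole stream, including pipeline fill."""
    cycles = cycles_for_blocks(n_blocks, stages)
    if cycles == 0:
        return 0.0
    return n_blocks * BLOCK_BITS * clock_hz / cycles
```

For long streams the fill latency becomes negligible: with an assumed 100 MHz clock, `throughput_bits_per_s(10**6, 100e6)` comes out just under the 6.4 Gbit/s ceiling of 64 bits per cycle.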

Overhead of the data transfer
[Figure: SRC-6E block diagram showing the data path. Labels: µP boards (Xeon µPs with L2 caches, MIOC, private memory, PCI slot, SNAP) and MAP board (control chips, on-board memory (24 MB), user chips, links marked "(6x)").]

Timing Measurements
A three-level timing measurement scheme has been employed:
1. End-to-end execution time (wall clock time, HLL level): includes the configuration, data transfer, and data processing times
2. Time without configuration (wall clock time, HLL level): excludes the configuration time but includes the data transfer and data processing times
3. MAP time (clock counter, hardware level): includes only the data processing time
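Because each measurement level contains the one below it, the individual overheads follow by subtraction. The sketch below shows that arithmetic; the class and function names are hypothetical, and the numbers in the usage note are made up for illustration, not measurements from the slides.

```python
from dataclasses import dataclass


@dataclass
class TimingBreakdown:
    configuration: float
    data_transfer: float
    computation: float


def decompose(end_to_end, without_config, map_time):
    """Split the three nested measurements into separate components.

    end_to_end     = configuration + data transfer + computation
    without_config = data transfer + computation
    map_time       = computation only
    """
    if not (end_to_end >= without_config >= map_time >= 0):
        raise ValueError("measurements must be nested and non-negative")
    return TimingBreakdown(
        configuration=end_to_end - without_config,
        data_transfer=without_config - map_time,
        computation=map_time,
    )
```

For example, hypothetical readings of 150 ms, 50 ms, and 10 ms give `decompose(150.0, 50.0, 10.0)`, which yields 100 ms of configuration, 40 ms of data transfer, and 10 ms of computation.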

Triple DES Encryption
[Figure: chart of execution time [ms] (axis 0 to 160) versus number of encrypted blocks (1024; 10,000; 25,000; 50,000; 100,000; 250,000; 500,000), broken down into configuration, data transfer, and computation.]

Problems
Execution time is dominated by
- configuration of the MAP FPGA and
- data transfer between the System Common Memory and On-Board Memory
Configuration time hiding techniques
- preloading the configuration before execution
- flip-flopping FPGAs during reconfiguration

Data transfer hiding techniques
Data transfer can be hidden by overlapping DMA time with the data processing time.
[Figure: timeline in which the input DMA, encryption, and output DMA phases of successive data chunks overlap.]
Possible speed-up: up to 33%
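The benefit of overlapping can be estimated with a simple timing model. The sketch below assumes that input and output DMA share one channel and therefore serialize, while computation runs concurrently with them, and it ignores pipeline fill and drain. These are modelling assumptions, not documented properties of the SRC-6E DMA engine, and the function names are hypothetical. Under this model, equal phase durations save one third of the execution time, which is one possible reading of the 33% figure.

```python
def sequential_time(n_chunks, t_in, t_comp, t_out):
    """No overlap: each chunk is transferred in, processed, transferred out."""
    return n_chunks * (t_in + t_comp + t_out)


def overlapped_time(n_chunks, t_in, t_comp, t_out):
    """Idealized steady-state estimate with computation hidden behind DMA.

    Assumption: input and output DMA serialize on a single channel, while
    computation proceeds in parallel. Fill and drain time are ignored.
    """
    dma_total = n_chunks * (t_in + t_out)
    compute_total = n_chunks * t_comp
    return max(dma_total, compute_total)


def time_saved_fraction(n_chunks, t_in, t_comp, t_out):
    """Fraction of the sequential time removed by overlapping."""
    seq = sequential_time(n_chunks, t_in, t_comp, t_out)
    if seq == 0:
        return 0.0
    return (seq - overlapped_time(n_chunks, t_in, t_comp, t_out)) / seq
```

With three equal phases, `time_saved_fraction(100, 1.0, 1.0, 1.0)` is one third: the sequential schedule costs 300 time units and the overlapped one 200.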

Reference software implementations
Platform: Pentium 4, 1.8 GHz, 512 kB cache, 1 GB RAM
Software:
- Non-optimized: public domain code, C only, Intel C++ with -O3 optimization
- Optimized for encryption (but not for cipher breaking): Phil Karn's DES code, C and assembly language with look-up table precomputations, GNU gcc v. 2.96 with -O4 optimization

Total execution time of Triple DES for Pentium 4 using optimized and non-optimized code
[Figure: chart with two series, optimized P4 code and non-optimized P4 code, annotated with a factor of about 4.]

Throughput results for SRC-6E and Pentium 4

SRC-6E vs. Pentium 4 speed-up

Application Case Study 2: DES cipher breaking

Secret-key breaking
[Figure: the DES breaker generates candidate keys K_1, K_2, K_3, ..., K_N, and DES is applied under each key to the message block M_0 and ciphertext block C_0.]
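The key search can be sketched as an exhaustive loop over candidate keys, each tested against one message/ciphertext pair. The sketch below uses a hypothetical 16-bit toy cipher in place of DES so that it runs in well under a second; real DES has a 56-bit key, which is what makes a hardware key generator attractive. Neither `toy_encrypt` nor `break_key` comes from the slides.

```python
def toy_encrypt(block, key):
    """Hypothetical 16-bit toy cipher standing in for DES. Not secure."""
    x = (block ^ key) & 0xFFFF
    x = ((x << 5) | (x >> 11)) & 0xFFFF  # rotate left by 5 bits
    return (x + key) & 0xFFFF


def break_key(plaintext, ciphertext, key_space=1 << 16):
    """Try every key and return all that map plaintext to ciphertext.

    Only the single (plaintext, ciphertext) pair enters the search; the
    candidate keys are generated locally, so input/output stays minimal.
    """
    return [k for k in range(key_space)
            if toy_encrypt(plaintext, k) == ciphertext]
```

A single pair may match more than one key, so the function returns every candidate; a second known pair would normally be used to filter out false positives.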

Keys generated in the User FPGA
[Figure: SRC-6E block diagram. Labels: µP boards (Xeon µPs with L2 caches, MIOC, private memory, PCI slot, SNAP) and MAP board (control chips, on-board memory (24 MB), user chips, links marked "(6x)").]

DES breaking machine
[Figure: chart of execution time [ms] (axis 0 to 1,200) versus number of tested keys (128,000; 1,000,000; 100,000,000), broken down into configuration, data transfer, and computation.]

SRC-6E vs. Pentium 4 speed-up

Conclusions
Two different classes of applications developed and tested for SRC-6E and Pentium 4 PC
- Triple DES encryption: real-time data streaming
- DES breaking: minimal input/output

Conclusions – cont.
Wall-clock speed-ups
- 3DES encryption: 3.4 vs. P4 C code, 12.5 vs. P4 assembly code
- DES breaking: 894 vs. P4 C code (larger for real-time input sizes)
Speed-ups without reconfiguration
- 3DES encryption: 11 vs. P4 C code, 41 vs. P4 assembly code
- DES breaking: 1583 vs. P4 C code

Informal speed/cost comparison
- Cost of the SRC machine / Cost of PC ≈ 100
- Speed of the SRC machine / Speed of PC ≈ 1600 *
- 16x improved speed/cost ratio
* with only one out of four FPGAs used in computations

Conclusions: Overheads
Reconfiguration time
- Most affected applications: short execution time, large resource requirements, frequent reconfiguration
- Minimization techniques: preloading configuration, flip-flopping among multiple FPGAs
Data transfer time
- Most affected applications: high speed real-time input/output
- Minimization techniques: overlapping data transfer with computations
