Gaj1MAPLD 2005/1016 Development and Maintenance of User Libraries for SRC Reconfigurable Computers Kris Gaj 1, Tarek El-Ghazawi 2, Paul Gage 3, Dan Poznanovic.

Presentation transcript:

Gaj1MAPLD 2005/1016 Development and Maintenance of User Libraries for SRC Reconfigurable Computers
Kris Gaj (1), Tarek El-Ghazawi (2), Paul Gage (3), Dan Poznanovic (3), Chang Shu (1), Deapesh Misra (1), Miaoqing Huang (2), Esam El-Araby (2), Mohamed Taher (2)
(1) George Mason University, (2) The George Washington University, (3) SRC Computers, Inc.

Gaj2MAPLD 2005/1016 Reconfigurable Computers

Gaj3MAPLD 2005/1016 What is a reconfigurable computer?
[Diagram: a microprocessor system (several μPs, each with its own μP memory, plus I/O and an interface) connected through interfaces to an FPGA system (several FPGAs, each with its own FPGA memory, plus I/O and an interface).]

Gaj4MAPLD 2005/1016 Examples of High-End Reconfigurable Computers
- SRC-6E and SRC Hi-Bar Based Systems from SRC Computers, Inc.
- Cray XD1 (formerly OctigaBay 12K) from Cray Inc.
- SGI Altix 3000 from Silicon Graphics
- Star Bridge Hypercomputer from Star Bridge Systems

Gaj5MAPLD 2005/1016 SRC MAP™ Reconfigurable Processor Source: [SRC, MAPLD04]

Gaj6MAPLD 2005/1016 SRC-6E Hardware Architecture

Gaj7MAPLD 2005/1016 SRC Hi-Bar Based Systems
- Hi-Bar sustains 1.4 GB/s per port with 180 ns latency per tier
- Up to 256 input and 256 output ports
- Common Memory (CM) has a controller with DMA capability
- Up to 8 GB DDR SDRAM supported per CM node
[Diagram labels: SRC Hi-Bar Switch; SRC-6 MAP® nodes; Common Memory nodes; microprocessor nodes (μPs, Memory, SNAP™) attached over PCI-X; Chaining GPIO; Gig Ethernet etc.; customers' existing networks (Storage Area Network, Local Area Network, Wide Area Network, Disk).]
Source: [SRC, MAPLD04]

Gaj8MAPLD 2005/1016 SRC Programming
[Diagram labels: Application Programmer, HLL (C); Library Developer, HDL (VHDL); SRC μP system; FPGA system.]

Gaj9MAPLD 2005/1016 SRC Program Partitioning
- C function for μP (HLL, μP system)
- C function for FPGAs (HLL, FPGA system)
- VHDL macro for FPGAs (HDL, FPGA system)

Gaj10MAPLD 2005/1016 Run Time Reconfiguration in SRC
[Diagram: a program in C or Fortran. The main program calls Function_1(a, d, e) and Function_2(d, e, f). Function_1 uses Macro_1(a, b, c), Macro_2(b, d) and Macro_2(c, e); Function_2 uses Macro_3(s, t), Macro_1(n, b) and Macro_4(t, k). FPGA contents after the Function_1 call: Macro_1 and two instances of Macro_2, connected through signals a, b, c, d, e.]

Gaj11MAPLD 2005/1016 SRC Development Environment
[Diagram: application sources (.c or .f files) pass through the μP Compiler and the MAP Compiler. HDL sources (.vhd or .v files), including user macro sources, pass through logic synthesis (.edf files) and place & route (.bin files), which yield the configuration bitstreams. The linker combines the object files (.o files) into the application executable.]

Gaj12MAPLD 2005/1016 Advantages of reconfigurable computers
- can be programmed by mathematicians themselves using traditional programming languages or GUI environments
- encourage innovation and experimentation
- general-purpose: cost distributed among multiple users with different needs
- behave like hardware:
  - parallel processing
  - distributed memory
  - specialized functional units, etc.

Gaj13MAPLD 2005/1016 Conditions necessary for the success of reconfigurable computers
- ease of use of library macros and functions
- existence of comprehensive libraries of user macros and functions capable of running on FPGAs
- significant speed-ups (on the order of 100x) of basic functions running on FPGAs compared to state-of-the-art microprocessors

Gaj14MAPLD 2005/1016 Development and Maintenance of SRC Libraries

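Gaj15MAPLD 2005/1016 Structure of the macro repository
[Tree diagram]
- top-level directories: common, rev_d, rev_e, rev_f
- each directory holds one subdirectory per macro: macro1, macro2, macro3
- each macro holds: hdlfile, InfoFile, BlkBoxFile, DebugCodeFile, DataSheet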

Gaj16MAPLD 2005/1016 Macro Types
- common: macros that have no connections to external pins and do not depend on any feature specific to one FPGA type. This type of macro can be used on any MAP.
- rev_d: macros that have a specific dependency on the dual MAP
- rev_e: macros that have a specific dependency on the single MAP
- rev_f: macros that have a specific dependency on the compact MAP

Gaj17MAPLD 2005/1016 Files describing the macro
Platform independent:
- HDL file: macro.v or macro.vhd. Verilog or VHDL code defining the macro.
- Debug Code File: macro.c. Provides the equivalent C functionality for the macro.
Platform dependent:
- Blk Box File: blackbox.v. Interface (black box) definition for the macro in Verilog.
- Data sheet file: datasheet. Contains the documentation for the macro.
- Info File: info. Info file entry for the given macro, containing macro type, latency, names of input/output/control signals, etc.

Gaj18MAPLD 2005/1016 CVS repository
To properly manage a distribution of macros, a CVS repository must be set up. This allows source code changes to be controlled and permits multiple developers to work on the code.

Gaj19MAPLD 2005/1016 The Installed Macro Library Structure
[Tree diagram]
- map 3 (built for the Xilinx Virtex-II) and map 4 (built for the Xilinx Virtex-II Pro)
- under each: common, rev_d, rev_e
- under each of those: ngo (with macro1, macro2, macro...), blkbox.v (single blackbox file), macros.info (single info file)
Obtained by running a special script developed by SRC.

Gaj20MAPLD 2005/1016 Library Script
Usage: build_libs [OPTION]
    [-b, --branch br]              Specify CVS branch
    [-c, --checkout]               Checkout only
    [-d, --CVSROOT cvsroot]        Specify CVSROOT
    [-M, --MAP maptype]            Build for MAP maptype
    [-m, --module mod]             Build mod only
    [-r, --restart mmddyy-hhmm]    Restart previous build
    [-s, --step target]            Run build step target
    [-v, --version N.n]            Package as version N.n
    [-V, --vendor vend]            Specify distribution vendor
    [-w, --workspace path]         Create workspace in path

Gaj21MAPLD 2005/1016 Building libraries
build_libs will check out the library and perform a build in /var/tmp/builds, in a folder with a time stamp (i.e ).
If there is an error, check the file called ‘output’ in /var/tmp/builds. Fix the error and restart the build with:
    build_libs --restart
You can also do a partial build, for example building only the library and not the CD:
    build_libs --step lib
To build only a particular subset of a library, use a command such as:
    build_libs --module crypto

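Gaj22MAPLD 2005/1016 Structure for the repository of MAP C functions
[Tree diagram]
- top-level directories: common, rev_d, rev_e, rev_f
- each directory holds one subdirectory per routine: routine1, routine2, routine3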

Gaj23MAPLD 2005/1016 Files describing the MAP C routine
- Source file: the .mc or .mf file defining the MAP routine.
- proto.h: provides a prototype of the MAP routine.
- Makefile: a standard Carte Makefile, with the exception that no BIN environment variable is provided.
- Docfile: provides man-page-format documentation of the MAP routine.

Gaj24MAPLD 2005/1016 The Installed MAP Routine Library Structure
[Tree diagram]
- map 3 and map 4
- under each: common, rev_d, rev_e
- under each of those: lib1.a, lib1.so, lib2.a, lib2.so, ...

Gaj25MAPLD 2005/1016 Known problems: No support for variable size of operands

Gaj26MAPLD 2005/1016 Problem statement
We would like to be able to create and maintain a library of generic components that work for various operand sizes.
Example: basic arithmetic operations (addition, subtraction, multiplication, division) of multiprecision (n-bit) integers.
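To make the problem statement concrete, here is a minimal software sketch of one such generic component: addition of two integers stored as n 64-bit words, where n is a run-time parameter. The name mp_add and the little-endian word order are assumptions made for this illustration; the slides do not show how the library represents operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two n-word integers a and b (word 0 is least significant) into sum
 * and return the final carry (0 or 1).
 * mp_add is a hypothetical name used only for this illustration. */
uint64_t mp_add(const uint64_t *a, const uint64_t *b, uint64_t *sum, size_t n)
{
    uint64_t carry = 0;

    assert(a != NULL && b != NULL && sum != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* wraps to 0 only if a[i] is all ones and carry is 1 */
        uint64_t c1 = (s < carry);   /* carry out of the first addition */
        s += b[i];
        uint64_t c2 = (s < b[i]);    /* carry out of the second addition */
        sum[i] = s;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

In software the operand size is just a loop bound. The difficulty described on the following slides is that a hardware macro has a fixed port list, so the same flexibility has to come either from passing words one at a time or from generating one macro per size.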

Gaj27MAPLD 2005/1016 Possible solutions
1. Fixed-size interface to a macro
   - using streams
   - without using streams
2. Variable-size interface to a macro cell

Gaj28MAPLD 2005/1016 [Diagram: a Process block with a 64-bit input and a 64-bit output.]

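Gaj29MAPLD 2005/1016 Passing variable-size operands without streams
    for (i = 0; i < 3*N+1; i++) {
        if (i < N) {
            /* first N iterations: feed one word of each operand */
            A_in = c[i];
            B_in = d[i];
        } else {
            /* afterwards: feed zeros while the result is read out */
            A_in = 0;
            B_in = 0;
        }
        mul (i, A_in, B_in, &C_out);
        if (i > N)
            e[i-N] = C_out;   /* collect one result word per iteration */
    }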
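The loop above can be exercised in plain C with a software stand-in for the macro. The sketch below is only a model under stated assumptions: it uses 32-bit words so that a 64-bit intermediate is portable (the slide's interface is 64 bits wide), it assumes a schedule of n load calls, one compute call, and 2n unload calls, and it stores the first result word at index 0. The names mul_model and mul_fixed_interface are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORDS 8  /* largest operand length the model accepts (assumed) */

/* Software stand-in for a macro with a fixed one-word-per-call interface.
 * Calls 0..n-1 load one word of each operand, call n computes the product,
 * and calls n+1..3n each return one product word, least significant first.
 * This schedule is an assumption of the sketch. */
void mul_model(int i, int n, uint32_t a_in, uint32_t b_in, uint32_t *c_out)
{
    static uint32_t a[MAX_WORDS], b[MAX_WORDS], p[2 * MAX_WORDS];

    assert(n >= 1 && n <= MAX_WORDS && i >= 0 && i <= 3 * n);
    *c_out = 0;
    if (i < n) {                          /* load phase */
        a[i] = a_in;
        b[i] = b_in;
    } else if (i == n) {                  /* schoolbook multiplication */
        for (int k = 0; k < 2 * n; k++)
            p[k] = 0;
        for (int j = 0; j < n; j++) {
            uint64_t carry = 0;
            for (int k = 0; k < n; k++) {
                uint64_t t = (uint64_t)a[j] * b[k] + p[j + k] + carry;
                p[j + k] = (uint32_t)t;
                carry = t >> 32;
            }
            p[j + n] = (uint32_t)carry;
        }
    } else {                              /* unload phase */
        *c_out = p[i - n - 1];
    }
}

/* Driver with the same loop shape as the slide: 3n+1 calls in total. */
void mul_fixed_interface(const uint32_t *c, const uint32_t *d, uint32_t *e, int n)
{
    for (int i = 0; i < 3 * n + 1; i++) {
        uint32_t a_in = (i < n) ? c[i] : 0;
        uint32_t b_in = (i < n) ? d[i] : 0;
        uint32_t c_out;

        mul_model(i, n, a_in, b_in, &c_out);
        if (i > n)
            e[i - n - 1] = c_out;
    }
}
```

The point of the model is the cost structure: the interface never changes with n, but every operand word and every result word costs one call, which is the input/output overhead listed later among the cons of this approach.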

Gaj30MAPLD 2005/1016 Passing variable size operands using streams
    #pragma src section
    {
        for (i=0; i<N; i++) {
            put_stream (&S0, A[i], 1);   // put A[i] to S0
            put_stream (&S1, B[i], 1);   // put B[i] to S1
        }
    }
    #pragma src section
    {
        mul (&S0, &S1, &S2);             // read from S0 and S1, write to S2
    }
    #pragma src section
    {
        for (i=0; i<2*N; i++)
            get_stream (&S2, &C[i]);     // take from S2 and write to C[i]
    }
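The producer, processing stage, and consumer structure of the stream version can be modelled in ordinary C with small FIFOs. This is a sketch under assumptions, not the Carte stream API: fifo_t, fifo_put, fifo_get, add_stage, and stream_add are hypothetical names, the three sections run one after another here rather than concurrently, and the stage performs addition instead of multiplication to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 16  /* capacity of each modelled stream (assumed) */

typedef struct {
    uint64_t data[FIFO_DEPTH];
    int head, tail, count;
} fifo_t;

void fifo_put(fifo_t *f, uint64_t v)
{
    assert(f->count < FIFO_DEPTH);        /* the model does not block when full */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
}

uint64_t fifo_get(fifo_t *f)
{
    uint64_t v;

    assert(f->count > 0);                 /* the model does not block when empty */
    v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* Processing stage: reads n word pairs from s0 and s1, then writes
 * n sum words followed by one carry word to s2. */
void add_stage(fifo_t *s0, fifo_t *s1, fifo_t *s2, int n)
{
    uint64_t carry = 0;

    for (int i = 0; i < n; i++) {
        uint64_t x = fifo_get(s0), y = fifo_get(s1);
        uint64_t s = x + carry;
        uint64_t c1 = (s < carry);
        s += y;
        carry = c1 | (uint64_t)(s < y);
        fifo_put(s2, s);
    }
    fifo_put(s2, carry);
}

/* Runs producer, stage, and consumer in sequence; C receives n+1 words. */
void stream_add(const uint64_t *A, const uint64_t *B, uint64_t *C, int n)
{
    fifo_t s0 = {{0}, 0, 0, 0}, s1 = {{0}, 0, 0, 0}, s2 = {{0}, 0, 0, 0};

    assert(n >= 1 && n < FIFO_DEPTH);
    for (int i = 0; i < n; i++) {         /* producer section */
        fifo_put(&s0, A[i]);
        fifo_put(&s1, B[i]);
    }
    add_stage(&s0, &s1, &s2, n);          /* processing section */
    for (int i = 0; i <= n; i++)          /* consumer section */
        C[i] = fifo_get(&s2);
}
```

As in the slide, the stage's interface is independent of the operand length: only the number of words pushed through the streams changes.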

Gaj31MAPLD 2005/1016 Process

Gaj32MAPLD 2005/1016 Multiprecision Integer Library Generator
[Diagram: the Multiprecision Integer Library Generator (C engine) takes the size of operands, N, as input and produces a C/VHDL wrapper, a black box file, an info file, and an in-line MAP C function.]

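Gaj33MAPLD 2005/1016 Inline MAP C function for N=2
    int mul (int64_t *A, int64_t *B, int64_t *C, int N)
    {
        int64_t A0, A1;
        int64_t B0, B1;
        int64_t C0, C1, C2, C3;

        /* unpack the operands into individual 64-bit words */
        A0 = A[0]; A1 = A[1];
        B0 = B[0]; B1 = B[1];

        /* size-specific macro: 128-bit by 128-bit multiplication */
        Mul_128(A0, A1, B0, B1, &C0, &C1, &C2, &C3);

        /* pack the 256-bit result */
        C[0] = C0; C[1] = C1; C[2] = C2; C[3] = C3;
        return 0;
    }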
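A generator of such wrappers can be little more than formatted printing driven by N. The sketch below is a hypothetical illustration, not the authors' generator: the function names gen_inline_mul and emit are invented here, the emitted wrapper is a simplified variant of the slide's function, and the macro naming rule Mul_<64*N> is inferred from the single Mul_128 example for N=2.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to buf, tracking the used length in *len.
 * Returns 0 on success and -1 if the text does not fit. */
static int emit(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int w;

    va_start(ap, fmt);
    w = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= size - *len)
        return -1;
    *len += (size_t)w;
    return 0;
}

/* Write the text of an in-line wrapper for n-word operands into buf.
 * Returns 0 on success and -1 if buf is too small. */
int gen_inline_mul(char *buf, size_t size, int n)
{
    size_t len = 0;
    int rc = 0;

    assert(buf != NULL && size > 0 && n >= 1);
    memset(buf, 0, size);
    rc |= emit(buf, size, &len, "void mul(int64_t *A, int64_t *B, int64_t *C)\n{\n");
    for (int i = 0; i < n; i++)           /* unpack operand words */
        rc |= emit(buf, size, &len, "    int64_t A%d = A[%d], B%d = B[%d];\n", i, i, i, i);
    for (int i = 0; i < 2 * n; i++)       /* declare result words */
        rc |= emit(buf, size, &len, "    int64_t C%d;\n", i);
    rc |= emit(buf, size, &len, "    Mul_%d(", 64 * n);
    for (int i = 0; i < n; i++)
        rc |= emit(buf, size, &len, "A%d, ", i);
    for (int i = 0; i < n; i++)
        rc |= emit(buf, size, &len, "B%d, ", i);
    for (int i = 0; i < 2 * n; i++)
        rc |= emit(buf, size, &len, "&C%d%s", i, (i + 1 < 2 * n) ? ", " : ");\n");
    for (int i = 0; i < 2 * n; i++)       /* pack result words */
        rc |= emit(buf, size, &len, "    C[%d] = C%d;\n", i, i);
    rc |= emit(buf, size, &len, "}\n");
    return rc ? -1 : 0;
}
```

Calling gen_inline_mul with n = 2 and a sufficiently large buffer produces a wrapper whose macro call has the same argument list as the Mul_128 call on the slide. A real generator would also have to emit the matching black box and info file entries named in the generator diagram.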

Gaj34MAPLD 2005/1016 Pros and cons of both methods
1. Fixed-size interface to a macro
   - Pros: interface independent of the operand size
   - Cons: input/output overhead
2. Variable-size interface to a macro cell
   - Pros: minimum overhead
   - Cons: need to generate several macro files automatically, need for changes in the compiler

Gaj35MAPLD 2005/1016 GMU/GWU Libraries

Gaj36MAPLD 2005/1016 Cryptographic Libraries
Secret Key Ciphers
- Secret key ciphers encryption and breaking: SecCiph
Public Key Ciphers
- Elliptic Curve Cryptosystems arithmetic: ECC
- Binary Galois Field GF(2^m) arithmetic in Polynomial Basis: GF2n_PB
- Binary Galois Field GF(2^m) arithmetic in Normal Basis: GF2n_NB
- Multiprecision integer arithmetic (in collaboration with University of South Carolina): Long_Int
- Operations supporting factorization of large integers using Number Field Sieve: NFS

Gaj37MAPLD 2005/1016 Digital Image Processing Libraries
Image Enhancement / Restoration
- Single-Resolution
  - Noise Reduction (Convolution Filtering): Smoothing (Lowpass), Gaussian (Lowpass), Blurring (Lowpass), Sharpening (Highpass)
  - Edge Detection (Derivative Filters): Prewitt, Sobel
- Multi-Resolution
  - Discrete Wavelet Transform (DWT)
  - Inverse Discrete Wavelet Transform (IDWT)
Similarity Measures
- Correlation

Gaj38MAPLD 2005/1016 Miscellaneous Libraries
- Sorting
- Stream-searching
- BMM: Bit Matrix Multiply
- DARPA benchmarks

Gaj39MAPLD 2005/1016 Performance of selected applications based on GMU/GWU libraries

Gaj40MAPLD 2005/1016 Classes of applications
1. input/output intensive applications
   - bulk data encryption (DES, IDEA, and RC5 encryption)
   - image processing (Sobel Edge Detection, Median Filter, Wavelet Hyperspectral Dimension Reduction)
2. computationally intensive applications
   - secret-key cipher breaking based on the exhaustive key search (DES, IDEA, RC5 breakers)
   - public-key cipher breaking based on factoring
3. latency-critical applications
   - cipher key agreement and signature (ECC schemes, RSA)

Gaj41MAPLD 2005/1016 Reference Platform
PC based on Pentium IV, 2.4 GHz clock, 512 MB of RAM, 512 KB of cache. Treated as a basic building block of a cluster of microprocessor boards.
Platform used in experiments
SRC-6E from SRC Computers, Inc.

Gaj42MAPLD 2005/1016 Timing Measurements
[Timeline diagram: the .c file calls the MAP function in the .mc file. Phases shown: FPGA Configure (configuration time), MAP Alloc. (MAP allocation time), DMA Data In, FPGA Computation, DMA Data Out, MAP Free (MAP release time). Two intervals are marked: End-to-End time (HW) and End-to-End time (SW).]
MAP: SRC Reconfigurable Processor based on two User FPGAs

Gaj43MAPLD 2005/1016 Input/Output Intensive Applications: P3 version of SRC-6E
Columns: Application | Computational Throughput (Mbits/s) | Data Transfer In Throughput (Mbits/s) | Data Transfer Out Throughput (Mbits/s) | End-to-End Throughput (Mbits/s), SRC 6E | End-to-End Throughput (Mbits/s), Pentium IV | Speed up
DES Encryption | 6,398 | 2,488 | 1,… | … | … | …
IDEA Encryption | 12,788 | 2,487 | 1,… | … | … | …
RC5 Encryption | 6,398 | 2,505 | 1,… | … | … | …
Sobel Edge Detection | 5,680 | 2,493 | 1,… | … | … | …
Median Filter | 5,681 | 2,484 | 1,… | … | … | …
Wavelet Hyperspectral Dimension Reduction | 6,395 | 2,573 | 1,… | … | … – 159 (5 levels – 1 level) | 5 – 12 (1 level – 5 levels)

Gaj44MAPLD 2005/1016 Wavelet Hyperspectral Dimension Reduction Time contributions P3 version of SRC-6E vs. Pentium IV PC

Gaj45MAPLD 2005/1016 Input/Output Intensive Applications: P4 version of SRC-6E
Columns: Application | Computational Throughput (Mbits/s) | Data Transfer In Throughput (Mbits/s) | Data Transfer Out Throughput (Mbits/s) | End-to-End Throughput (Mbits/s), SRC 6E | End-to-End Throughput (Mbits/s), Pentium IV | Speed up
IDEA Encryption | 12,790 | 10,627 | 10,583 | 3,… | … | …
RC5 Encryption | … | … | … | … | … | …
Sobel Edge Detection | 5,683 | 6,384 | 6,380 | 2,… | … | …
Median Filter | 5,684 | 6,384 | 6,383 | 2,… | … | …
Wavelet Hyperspectral Dimension Reduction | 6,394 | 6,394 | 6,349 | 3,185 1,… | … – 159 (5 levels – 1 level) | 10 – 24 (1 level – 5 levels)

Gaj46MAPLD 2005/1016 Wavelet Hyperspectral Dimension Reduction Time contributions P4 version of SRC-6E vs. Pentium IV PC

Gaj47MAPLD 2005/1016 Input/Output Intensive Applications: P4 version of SRC-6E without and with overlapping computations and data transfers
Columns: Application | Computational Throughput (Mbits/s) | Data Transfer In Throughput (Mbits/s) | Data Transfer Out Throughput (Mbits/s) | End-to-End Throughput (Mbits/s), SRC 6E | End-to-End Throughput (Mbits/s), Pentium IV | Speed up
IDEA Encryption (no overlapping) | 12,790 | 10,627 | 10,583 | 3,… | … | …
IDEA Encryption (with overlapping) | 10,857 | 9,792 | 10,564 | 4,… | … | …
RC5 Encryption (no overlapping) | … | … | … | … | … | …
RC5 Encryption (with overlapping) | 6,398 | 6,372 | 6,349 | 3,… | … | …

Gaj48MAPLD 2005/1016 Input/Output Intensive Applications: SRC Hi-Bar Based System
Columns: Application | Computational Throughput (Mbits/s) | Data Transfer In Throughput (Mbits/s) | Data Transfer Out Throughput (Mbits/s) | End-to-End Throughput (Mbits/s), SRC 6 | End-to-End Throughput (Mbits/s), Pentium IV | Speed up
DES Encryption (no overlapping) | 19,200 | 11,350 | 10,760 | 4,… | … | …
IDEA Encryption (no overlapping) | 19,200 | 11,350 | 10,760 | 4,… | … | …
RC5 Encryption (no overlapping) | 19,200 | 11,350 | 10,760 | 4,… | … | …
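One way to relate the per-phase throughputs to the end-to-end figure in the "no overlapping" rows is to assume that data transfer in, computation, and data transfer out run strictly one after another, and that configuration and MAP allocation times are excluded. Under that assumption, which the slides do not state explicitly, the time per bit is the sum of the three per-phase times per bit. The function name below is hypothetical.

```c
#include <assert.h>

/* End-to-end throughput when the three phases run back to back:
 * the total time per bit is the sum of the per-phase times per bit.
 * All arguments and the result use the same unit (for example Mbits/s). */
double end_to_end_no_overlap(double t_in, double t_comp, double t_out)
{
    assert(t_in > 0.0 && t_comp > 0.0 && t_out > 0.0);
    return 1.0 / (1.0 / t_in + 1.0 / t_comp + 1.0 / t_out);
}
```

With the Hi-Bar figures from the table (11,350 in, 19,200 computation, 10,760 out) this model gives roughly 4,290 Mbits/s, which is consistent with the leading "4," that survives in the end-to-end column. The rows measured with overlapping need a different model, since the phases then run partly in parallel.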

Gaj49MAPLD 2005/1016 Computationally Intensive Applications: P3 version of SRC-6E
Columns: Application | Computational Throughput | Data Transfer In Throughput | Data Transfer Out Throughput | End-to-End Throughput (mln keys/s), SRC 6E | End-to-End Throughput (mln keys/s), Pentium IV | Speed up
DES Breaker | 800 | N/A | … | … | … | …
IDEA Breaker | 1000 | N/A | … | … | … | …
RC5 Breaker | 100 | N/A | … | … | … | …
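To put the key-search rates in perspective, the sketch below converts a rate into a worst-case search time. It assumes the surviving figures are in millions of keys per second, as the column unit suggests, and that the whole key space must be scanned in the worst case; the function name is hypothetical.

```c
#include <assert.h>

/* Worst-case time, in days, to try every key of the given length
 * at a rate expressed in millions of keys per second. */
double worst_case_search_days(int key_bits, double mln_keys_per_s)
{
    double keys = 1.0;

    assert(key_bits > 0 && mln_keys_per_s > 0.0);
    for (int i = 0; i < key_bits; i++)    /* keys = 2^key_bits */
        keys *= 2.0;
    return keys / (mln_keys_per_s * 1e6) / 86400.0;
}
```

For DES, whose key is 56 bits long, a rate of 800 million keys per second corresponds to about 1,042 days for a full scan of the key space on one such unit, and half of that on average. Exhaustive search divides evenly across units, so the time shrinks in proportion to the number of units used.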

Gaj50MAPLD 2005/1016 Latency-Critical Applications
Columns: Application | Computational Latency (μs) | Data Transfer In Latency (μs) | Data Transfer Out Latency (μs) | End-to-End Latency (μs), SRC 6E | End-to-End Latency (μs), Pentium IV | Speed up
ECC DH Key Agreement over GF(2^233), Optimal Normal Basis | … | … | … | … | … | …
ECC DH Key Agreement over GF(2^233), Polynomial Basis | … | … | … | … | …,050 | 33

Gaj51MAPLD 2005/1016 RSA: SRC vs. OpenSSL Software Comparison Data Size SW Function Time (ms) SW Speedup vs. MAP SW x x x x x

Gaj52MAPLD 2005/1016 Sparse matrix by vector multiplication Reference Optimized SW Implementation: PC, Pentium IV, GHz, 1 GB RAM

Gaj53MAPLD 2005/1016 Summary & Conclusions

Gaj54MAPLD 2005/1016 Summary
Columns: Type of application | End-to-end speed-up of SRC vs. P4
Computationally intensive (cipher breaking) | …
Latency critical: RSA | …
Latency critical: ECC polynomial bases, general fields | 33
Latency critical: ECC polynomial bases, special fields | …
Latency critical: ECC optimal normal bases | 600
Input/output intensive (secret key encryption/decryption) | 3-30

Gaj55MAPLD 2005/1016 Summary & conclusions (1)
- General methodology for the design and maintenance of SRC user libraries developed and tested
- Existing libraries evaluated for three wide classes of applications in terms of:
  - performance
  - ease of use
  - flexibility
- Initial results very encouraging

Gaj56MAPLD 2005/1016 Summary & conclusions (2)
- Selected files from the SRC libraries can be used for development of comparable libraries for other reconfigurable computers
- Full compatibility with other reconfigurable computers is difficult to achieve because of technical differences and intellectual property constraints