TIE Extensions for Cryptographic Acceleration
Charles-Henri Gros, Alan Keefer, Ankur Singla



Agenda
- Introduction
- Survey of Existing Architectures
- Xtensa+ Crypto Processor
  - Rijndael Algorithm (AES final selection)
  - RC6, IDEA, and DES
- Performance
- Trade-off Analysis
- Conclusion

Introduction
- Commercial networking applications require flexible, high-throughput secure connectivity
- Encryption and decryption algorithms are computation-intensive
- Multi-session applications place a significant load on embedded processors
- Embedded systems need performance while optimizing power and area
- Our study: a survey of existing architectures, an analysis of Xtensa as an alternative, and performance analysis and trade-offs for embedded use

Survey of Existing Architectures
- Three categories
  - Specialized Crypto Processors
  - Reconfigurable Architectures
  - Full Hardware Implementation (ASICs/FPGAs)
- High variation in architecture complexity
- Performance vs. area tradeoff
- Suitability for embedded applications

Specialized Crypto Processors
- A few VLIW architectures, e.g. CryptoManiac
- Instruction combining: instruction words are combined to exploit ILP
- Crypto arithmetic unit(s): multiple XORs, GF multiplication/addition, lookup-table substitution, and permutation
- Coarse configurability of the datapath
- Mostly lacking SIMD support
- Performance is typically 2x to 6x that of general-purpose processors

Reconfigurable Architectures
- Numerous reconfigurable processor architectures: PipeRench, MorphoSys, COBRA, and GARP
- Functional units that provide all crypto arithmetic: multiple XORs, GF multiplication/addition, modulo multiplication
- Reconfigurable interconnection network that changes functional-unit connectivity dynamically
  - VLIW instructions
  - Reconfiguration registers
- Suitable for block ciphers
- High variability in performance gain relative to general-purpose processors

Full Hardware Implementation
- High-performance implementations targeted to ASICs/FPGAs
  - DES: 12 Gbps on Virtex-E XCV300E
  - AES: 18 Gbps on an ASIC using the TSMC 0.18 µm process
- Lacking flexibility and crypto modes
- Memory and area efficient
- Typical latency is only in the DMA of data to the hardware unit
- Need an additional processor for the control path

Xtensa+ Crypto Architecture
- Custom extensions to the Xtensa processor using the TIE framework
- Addition of a generic key-schedule register file and instructions to support all crypto algorithms studied
- Addition of multiple on-chip SRAMs (in addition to 4 data RAMs) to the Xtensa processor
  - Currently implemented using the Table construct in TIE
  - Modified the Verilog generated by the TIE compiler to instantiate multiple RAM models (implemented as multi-dimensional arrays) for viability analysis
- Addition of 4 state registers and 4 next-state registers, generic to all algorithms studied
- Possible future extensions include multi-session key storage and fast retrieval support

AES Overview
- AES (Advanced Encryption Standard) is the standard set to replace DES for both government and private-sector encryption
- Uses a fixed block size of 128 bits, with key sizes of 128, 192, or 256 bits
- Designed to be efficient in both hardware and software across a variety of platforms
- 10, 12, or 14 rounds depending on key size
- A 128-bit round key is used for each round
  - Can be pre-computed and cached for future encryptions
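The relationship between key size, round count, and cached round-key storage can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides: the function names `aes_rounds` and `round_key_bytes` are hypothetical, and the rule Nr = Nk + 6 (Nk being the key length in 32-bit words) is the one used by the AES specification.

```python
def aes_rounds(key_bits: int) -> int:
    """Return the number of AES rounds for a 128-, 192-, or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    # Nr = Nk + 6, where Nk is the key length in 32-bit words.
    return key_bits // 32 + 6


def round_key_bytes(key_bits: int) -> int:
    """Bytes needed to cache the expanded key schedule.

    One 128-bit (16-byte) round key per round, plus one for the
    initial key XOR that precedes the first round.
    """
    return (aes_rounds(key_bits) + 1) * 16


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, aes_rounds(bits), round_key_bytes(bits))
```

Because the schedule depends only on the key, it can be expanded once per session and reused for every block.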

AES Implementation Abstraction
- Each round consists of a lookup, a byte-level permutation, a finite-field multiplication, and a key XOR
- The lookup and multiplication can be combined into four separate 8x32 lookup tables (8-bit index, 32-bit entry), so each round is 16 lookups and 16 XORs
- Decryption is essentially the same, but with different tables and a different key schedule

TIE Implementation
- Our implementation does all 16 lookups in parallel, requiring 16 SRAMs
- x0, x1, x2, x3 represent the round state (32 bits each); k0, k1, k2, k3 are the current round key; Tij are the T-boxes, where i is a duplication index and j is the T-box index
- Each round is then as follows (each table is indexed by the low 8 bits of its argument; primed values are the next state, computed from the current state):
  x0' = T00[x0] ^ T01[x1>>8] ^ T02[x2>>16] ^ T03[x3>>24] ^ k0
  x1' = T10[x1] ^ T11[x2>>8] ^ T12[x3>>16] ^ T13[x0>>24] ^ k1
  x2' = T20[x2] ^ T21[x3>>8] ^ T22[x0>>16] ^ T23[x1>>24] ^ k2
  x3' = T30[x3] ^ T31[x0>>8] ^ T32[x1>>16] ^ T33[x2>>24] ^ k3
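The round equations can be checked in software. The Python sketch below is not the authors' TIE code: it builds the four T-boxes from the AES S-box, applies one round with table lookups, and compares the result against a step-by-step SubBytes, ShiftRows, MixColumns, and key-XOR round. It assumes row 0 of each state column sits in the low byte of the word, which matches the shift amounts on the slide. All function names are hypothetical. In software the four duplicated copies of each table are identical, so a single set `T[0..3]` stands in for all sixteen SRAMs.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11b."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def rotl(w, n, bits):
    """Rotate a bits-wide word left by n positions."""
    return ((w << n) | (w >> (bits - n))) & ((1 << bits) - 1)


def make_sbox():
    """AES S-box: field inverse (0 maps to 0) followed by the affine map."""
    sbox = []
    for a in range(256):
        inv = next((b for b in range(1, 256) if gmul(a, b) == 1), 0)
        s = inv
        for n in range(1, 5):
            s ^= rotl(inv, n, 8)
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()
# T0 folds SubBytes and the first MixColumns column (2, 1, 1, 3) into one
# 8-bit-in, 32-bit-out table; T1..T3 are byte rotations of T0.
T0 = [gmul(2, s) | (s << 8) | (s << 16) | (gmul(3, s) << 24) for s in SBOX]
T = [[rotl(t, 8 * j, 32) for t in T0] for j in range(4)]


def ttable_round(x, k):
    """One AES round as 16 table lookups and 16 XORs."""
    return [
        T[0][x[c] & 0xFF]
        ^ T[1][(x[(c + 1) % 4] >> 8) & 0xFF]
        ^ T[2][(x[(c + 2) % 4] >> 16) & 0xFF]
        ^ T[3][(x[(c + 3) % 4] >> 24) & 0xFF]
        ^ k[c]
        for c in range(4)
    ]


def reference_round(x, k):
    """The same round done step by step, for comparison."""
    a = [[(x[c] >> (8 * r)) & 0xFF for c in range(4)] for r in range(4)]
    # SubBytes and ShiftRows: row r is rotated left by r columns.
    s = [[SBOX[a[r][(c + r) % 4]] for c in range(4)] for r in range(4)]
    out = []
    for c in range(4):
        col = [s[r][c] for r in range(4)]
        mixed = [
            gmul(2, col[r]) ^ gmul(3, col[(r + 1) % 4])
            ^ col[(r + 2) % 4] ^ col[(r + 3) % 4]
            for r in range(4)
        ]
        word = 0
        for r in range(4):
            word |= mixed[r] << (8 * r)
        out.append(word ^ k[c])  # AddRoundKey
    return out
```

The sketch covers a regular round only; the final AES round omits MixColumns and would need a separate table or a plain S-box lookup.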

Other Ciphers Implemented
- DES (Data Encryption Standard)
  - 64-bit block, 56-bit key, 16 rounds, Feistel network
  - Eight 6x4 S-boxes, XORs, and bit-level permutations
  - Can't really be done efficiently in software
  - TIE implementation required 1 instruction per round
- IDEA (International Data Encryption Algorithm)
  - 64-bit block, 128-bit key, 8 rounds, iterated, operates on 16-bit numbers
  - 4 multiplications mod 2^16 + 1, 4 additions mod 2^16, 6 XORs
  - Each round is highly sequential, so difficult to parallelize
  - TIE implementation required 7 instructions per round
- RC6
  - Same block and key sizes as AES, 20 rounds, iterated
  - Multiplication mod 2^32, XORs, rotations, addition mod 2^32
  - TIE implementation required 2 instructions per round
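The arithmetic primitives named above are compact enough to sketch in Python. The code follows the published definitions of IDEA and RC6 rather than the authors' TIE instructions, and the function names are hypothetical. `idea_mul` is IDEA's multiplication modulo 2^16 + 1, where the 16-bit value 0 stands for 2^16. `rc6_round` is one RC6 round on four 32-bit words, with `s0` and `s1` standing in for that round's two subkeys.

```python
MASK32 = 0xFFFFFFFF


def idea_mul(a, b):
    """IDEA multiplication modulo 2^16 + 1 on 16-bit operands.

    The all-zero operand represents 2^16, so every input is a nonzero
    residue modulo the prime 65537.
    """
    a = a or 0x10000
    b = b or 0x10000
    return (a * b % 0x10001) & 0xFFFF  # a result of 2^16 is encoded as 0


def rotl32(w, n):
    """Rotate a 32-bit word left by n (only the low 5 bits of n are used)."""
    n &= 31
    return ((w << n) | (w >> (32 - n))) & MASK32


def rc6_round(a, b, c, d, s0, s1):
    """One RC6 round: two multiplications mod 2^32, XORs, data-dependent
    rotations, and two additions mod 2^32, then a word rotation."""
    t = rotl32((b * (2 * b + 1)) & MASK32, 5)
    u = rotl32((d * (2 * d + 1)) & MASK32, 5)
    a = (rotl32(a ^ t, u) + s0) & MASK32
    c = (rotl32(c ^ u, t) + s1) & MASK32
    return b, c, d, a
```

In `rc6_round` the two halves of the round (computing `t` and `u`) are independent, while each IDEA round chains its multiplications and additions one after another, which is consistent with the slide's remark that IDEA is hard to parallelize.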

AES Performance in Xtensa+
- Performance of the TIE extensions approaches that of non-pipelined ASICs
- Total of 31 run-time instructions per data block
  - Initial XOR instruction
  - 1 instruction per round computation (10 total)
  - 20 cycles for load and store of 128-bit data blocks
- Generally an order of magnitude better than pure software
- Also faster than reconfigurable hardware or a specialized VLIW processor
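The per-block count translates into throughput with simple arithmetic. The Python sketch below is illustrative only: it assumes one instruction issues per cycle, so that the 31 instructions cost 31 cycles, and the 200 MHz clock in the usage example is a made-up figure, not one taken from the slides. The function names are hypothetical.

```python
def cycles_per_block(rounds=10, load_store_cycles=20):
    """Initial key XOR + one instruction per round + load/store overhead."""
    return 1 + rounds + load_store_cycles


def throughput_mbps(clock_mhz, block_bits=128, cycles=None):
    """Throughput in Mbps for one block every `cycles` cycles.

    Assumes one instruction per cycle and no stalls (an assumption,
    not a measured property of the processor).
    """
    if cycles is None:
        cycles = cycles_per_block()
    return block_bits * clock_mhz / cycles


if __name__ == "__main__":
    # 200 MHz is a hypothetical clock chosen only for illustration.
    print(cycles_per_block(), round(throughput_mbps(200), 1))
```

Under these assumptions the load and store cycles account for roughly two thirds of the per-block cost, so data movement rather than round computation bounds throughput.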

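[Chart: Throughput in Mbps for AES, DES, IDEA, and RC6 on the Base, VLIW, TIE, ASIC, and Reconfig. implementations; data values not preserved in the transcript]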

[Chart: Cycles per block for AES, DES, IDEA, and RC6 on the Base, VLIW, TIE, and ASIC implementations; data values not preserved in the transcript]

Design Tradeoffs
- Flexibility
  - Algorithm changes
  - New algorithms
  - New encryption modes
  - Implementation bugs
- Time to market
  - Closer to software development time
  - Can choose which parts to accelerate

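[Chart: Power vs. performance in Mbps/mW for AES, DES, IDEA, and RC6 on the Base, VLIW, TIE, ASIC, and Rec. implementations; data values not preserved in the transcript]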

Conclusion
- Xtensa instructions provide flexibility, performance, and Mbps/mW that all fall somewhere between an ASIC and a VLIW or software-based solution
- Suitable for most embedded applications like i, etc.
- Using Xtensa for cryptography is a good choice if:
  - You don't need absolute throughput
  - You don't need absolute flexibility
  - You need a control processor anyway
  - The algorithms needed are known ahead of time