Master’s Thesis: Fast Flexible Architectures for Secure Communication


1 Master’s Thesis: Fast Flexible Architectures for Secure Communication
Lisa Wu
University of Michigan, Advanced Computer Architecture Laboratory
Advisor: Professor Todd Austin

2 Project Overview
Cipher Kernel Analyses: throughput analysis, bottleneck analysis, relative run time cost, kernel characterization
Architectural Extensions
CryptoManiac Architecture: instruction architecture, system architecture, processing element architecture, physical design characteristics
Super Optimizer: validation and parameter studies
Performance Analysis: encryption rate studies
9/18/2018 Advanced Computer Architecture Lab

3 My Research Contribution
Design and implementation of the CryptoManiac co-processor
  Hardware models of CryptoManiac 8WC, 4WC, 3WC, 2WC, and 4WNC
  ISA and scheduling of kernels
  Timing, area, power, and performance analyses of the CryptoManiac co-processor
Design and implementation of the super optimizer
  Instruction combination study
  Automatic generation of varied width schedules
Publication: ISCA 2001

4 Cryptography
Definitions:
  encryption vs. decryption
  public-key cipher vs. secret-key cipher
Public-secret key ciphers are the most commonly used
[Diagram, public-key cipher: plaintext -> f(x) with Public Key -> ciphertext -> g(x) with Private Key -> plaintext]
[Diagram, secret-key cipher: plaintext -> g(x) with Private Key -> ciphertext -> g(x) with Private Key -> plaintext]

5 SSL Session Breakdown Focus: Secret-Key Ciphers
[Diagram: SSL session between client and server. Labels: authenticate (public key), private key, https get, https recv, ..., private, close]

6 Benchmark Suite
Cipher    Key Size  Blk Size  Rnds/Blk  Author        Application
3DES      …         …         …         CryptSoft     SSL, SSH
Blowfish  …         …         …         CryptSoft     Norton Utilities
IDEA      …         …         …         Ascom         PGP, SSH
Mars      …         …         …         IBM           AES Candidate
RC4       …         …         …         CryptSoft     SSL
RC6       …         …         …         RSA Security  AES Candidate
Rijndael  …         …         …         Rijmen        AES Standard
Twofish   …         …         …         Counterpane   AES Candidate

7 Cipher Throughput Analysis
Alpha vs. 4W
  All except Mars and Twofish were within 10% of the actual machine tests
  Mars 11%, Twofish 15%
Alpha vs. DF
  Blowfish, IDEA, and RC6 are running within 20% of DF performance
  Mars 29%, Twofish 76%
  RC4 and Rijndael are outliers

8 Cipher Bottleneck Analysis
Alias: impact of stalling loads in the pipeline until all earlier store addresses have been resolved
Branch: effects of mispredictions
Issue: impact of reducing issue width
Mem: impact of introducing a realistic memory system
Res: impact of limited functional unit resources
Window: impact of a limited-size instruction window

9 Cipher Relative Run Time Cost Focus: Kernel Loop
3DES and IDEA are small even for 16-byte sessions
Mars, RC4, RC6, Rijndael, and Twofish drop well below 10% for 4k+ byte sessions
Blowfish is an outlier; it drops below 10% only for 64k+ byte sessions

10 Cipher Kernel Characterization
SBOX: substitutions
XBOX: permutations
IDEA, Mars, RC4, and RC6 rely on arithmetic computations; they benefit from more resources (multiplies) and from faster operations (rotates)
Blowfish, 3DES, Rijndael, and Twofish rely on substitutions; they benefit from increased memory bandwidth and accesses

11 Architectural Extensions
All instructions are limited to two register input operands and one register output
ROL and ROR (rotates) for 64 and 32-bit data types
ROLX and RORX support a constant rotate of a register input, followed by an XOR with another register input
MULMOD computes the modular multiplication of two register values modulo the value 0x10001
SBOX speeds the accessing of substitution tables with 256-entry tables and 32-bit contents
SBOXSYNC synchronizes the SBOX table with memory
XBOX implements a portion of a full 64-bit permutation
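The rotate, rotate-XOR, and modular-multiply semantics listed on this slide can be sketched in Python. This is only an illustration under stated assumptions: operands are 32 bits wide, the function names (`rol32`, `ror32`, `rolx32`, `rorx32`, `mulmod`) are hypothetical, and `mulmod` is a plain product modulo 0x10001 that ignores IDEA's convention of treating a zero operand as 2^16, which the slide does not discuss.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit data type (the slide also lists 64-bit rotates)


def rol32(x, n):
    """Rotate the 32-bit value x left by n bit positions."""
    x &= MASK32
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror32(x, n):
    """Rotate the 32-bit value x right by n bit positions."""
    return rol32(x, (32 - n % 32) % 32)


def rolx32(a, b, k):
    """ROLX as described on the slide: rotate a left by constant k, then XOR with b."""
    return rol32(a, k) ^ (b & MASK32)


def rorx32(a, b, k):
    """RORX as described on the slide: rotate a right by constant k, then XOR with b."""
    return ror32(a, k) ^ (b & MASK32)


def mulmod(a, b):
    """Modular multiplication of two values modulo 0x10001 (2**16 + 1)."""
    return (a * b) % 0x10001
```

Fusing the rotate and the XOR into one operation, as ROLX and RORX do, keeps the two-input, one-output register format because the rotate amount is a constant rather than a third register.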

12 SBOX Instruction Semantics
[Diagram: the SBOX address is the SBOX table base concatenated with the 8-bit table index and two zero bits (00); labeled bit positions: 8, 16, 24 in the operand, 10 and 63 in the address]
SBOX instruction eliminates address generation
  All SBOX tables are aligned to a 1k byte boundary
  Address generation becomes zero-latency bit concatenation
Stores to SBOX storage are not visible to later SBOX instructions until:
  An SBOXSYNC is executed
  An alias bit is set
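The bit-concatenation idea on this slide can be sketched as follows. The sketch assumes a table of 256 four-byte entries (1k bytes, matching the alignment on the slide) and an index taken from one byte of the operand; the function name `sbox_address` and the `byte_sel` parameter are hypothetical and are not taken from the thesis.

```python
def sbox_address(table_base, operand, byte_sel):
    """Form the address of an SBOX entry without an adder.

    table_base: address of a 1k-byte-aligned table (low 10 bits are zero)
    operand:    register value holding the index byte
    byte_sel:   which byte of the operand to use as the index (0..3, assumed)
    """
    if table_base % 1024 != 0:
        raise ValueError("SBOX tables must be aligned to a 1k byte boundary")
    index = (operand >> (8 * byte_sel)) & 0xFF
    # The low 10 bits of the base are zero, so OR-ing in (index << 2) is
    # equivalent to base + 4 * index but needs no carry propagation.
    return table_base | (index << 2)
```

Because the aligned base and the shifted index never overlap, the OR gives the same result as an add, which is why the slide can describe address generation as zero-latency.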

13 Performance of ISA Extensions

14 The CryptoManiac Processor
A 4-wide 32-bit VLIW machine with no cache and a simple branch predictor
Supports a triadic (three input operands) ISA that permits combining of most cryptographic operation pairs for better clock cycle utilization
Can be combined into chip multiprocessor configurations for improved performance on workloads with inter-session and inter-packet parallelism

15 CryptoManiac ISA
bundle := <inst><inst><inst><inst>
inst := <operation pair><dest><operand 1><operand 2><operand 3>
operation pair := <short><tiny> | <tiny><short> | <tiny><tiny> | <long><nop>
tiny := <xor> | <and> | <inc> | <signext> | <nop>
short := <add> | <sub> | <rot> | <sbox> | <nop>
long := <mul> | <mulmod>
Examples:
Instruction               Expression
Add-Xor R4, R1, R2, R3    R4 <- (R1+R2)⊕R3
And-Rot R4, R1, R2, R3    R4 <- (R1&R2)<<<R3
And-Xor R4, R1, R2, R3    R4 <- (R1&R2)⊕R3
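The three example instructions on this slide follow one pattern: the first operation combines operands 1 and 2, and the second combines that result with operand 3. A minimal Python sketch of that pattern is below. The `OPS` table and `execute_pair` helper are hypothetical names, only a few operations are modeled, and 32-bit wraparound plus a left rotate for `rot` are assumptions based on the `<<<` notation in the examples.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit datapath


def rol32(x, n):
    """Rotate the 32-bit value x left by n bit positions."""
    x &= MASK32
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


# Two-input operations that can appear in an operation pair (subset only).
OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "xor": lambda a, b: (a ^ b) & MASK32,
    "and": lambda a, b: (a & b) & MASK32,
    "rot": rol32,
}


def execute_pair(op1, op2, r1, r2, r3):
    """Compute (r1 op1 r2) op2 r3, the value written to the destination register."""
    return OPS[op2](OPS[op1](r1, r2), r3)
```

For instance, `execute_pair("add", "xor", r1, r2, r3)` models Add-Xor and `execute_pair("and", "rot", r1, r2, r3)` models And-Rot from the examples table.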

16 Scheduling Example: Blowfish
[Diagram: Blowfish kernel dataflow (SBOX, ADD, XOR, Sign Ext, Load) and its CryptoManiac schedule using SBOX, Add-XOR, Load, Add, XOR, and XOR-SignExt operations]
Takes a total of only 4 cycles to execute!
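For context on why the schedule mixes SBOX lookups with Add-XOR and Add, the standard Blowfish round function F is ((S1[a] + S2[b]) XOR S3[c]) + S4[d] modulo 2^32, where a..d are the four bytes of the 32-bit input, most significant first. The sketch below shows only that function; it is not the thesis's schedule. The name `blowfish_f` is hypothetical, and the caller supplies the four 256-entry tables, since the real Blowfish S-boxes are key-dependent and not reproduced here.

```python
MASK32 = 0xFFFFFFFF


def blowfish_f(x, sboxes):
    """Blowfish round function F on a 32-bit input.

    sboxes: sequence of four 256-entry tables of 32-bit values
            (caller-supplied placeholders in this sketch).
    """
    a = (x >> 24) & 0xFF  # most significant byte indexes the first table
    b = (x >> 16) & 0xFF
    c = (x >> 8) & 0xFF
    d = x & 0xFF
    t = (sboxes[0][a] + sboxes[1][b]) & MASK32  # two lookups feeding an add
    t ^= sboxes[2][c]                           # third lookup, XORed in
    return (t + sboxes[3][d]) & MASK32          # fourth lookup, final add
```

The add followed by the XOR in the middle of F is the kind of dependent pair that a combined Add-XOR instruction can cover in one operation.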

17 High-Level Schematic of a Single Functional Unit
[Schematic labels: Pipelined 32-Bit MUL; 1K Byte SBOX Cache; Adder; Rotator; XOR; AND; Logical Unit]

18 CryptoManiac Architecture

19 CryptoManiac System Architecture
[Diagram labels: requests, In Q, Req Scheduled, CM Proc, Keystore, Out Q, results]
Request Format and Result Format fields: id, session, action, data…, result…

20 Timing and Area Results

21 Encryption Performance
[Chart labels: HDTV, OC-3, OC-12]

22 Special Case Studies: 3DES and Rijndael

23 The Super Optimizer
Validates hand-scheduled kernel results
Automates generation of optimized kernels for the various CryptoManiac architectures studied
Instruction combination studies give insight into which hardware may be unnecessary and could be eliminated

24 Instruction Combination Study

25 Instruction Combining Characteristics

26 Conclusion
Two hardware/software-design techniques to improve the performance of secret-key cipher algorithms:
Add instruction support for fast substitutions, general permutations, rotates, and modular arithmetic
  SBOX eliminates address generation
  Overall speedup of 59% over baseline machine w/ rotates
Design an efficient 4-wide VLIW cryptographic co-processor called the CryptoManiac
  Instruction combining: efficient utilization of clock cycle
  Rijndael runs 2.25 times faster with 1/100th area and power of a 600MHz Alpha processor

27 Future Work
Assess the cost of programmability in the CryptoManiac by comparing design and performance of:
  A dedicated hardware Rijndael implementation (no programmability)
  An FPGA Rijndael implementation (hardware programmability)
  CryptoManiac (software programmability)
Other application-specific processors, such as audio processing, speech recognition, and soft-radio

28 Acknowledgement
Credit for much of the work described in this thesis belongs to my advisor, Professor Todd Austin, for his insight, guidance, and patience. He provided an excellent research environment, left me enough freedom to do things the way I thought they should be done, and was always available to discuss ideas and problems. I would also like to thank my committee members, Professor Steve Reinhardt and Professor Gary Tyson, for reviewing this document and serving on the defense committee. Other people who have contributed to the CryptoManiac project include Chris Weaver, for hardware design and synthesis support, and Jerome Burke and John McDonald, for earlier versions of the ISA extensions code modifications.

