Published by Yrjö Laakso. Modified over 6 years ago.
1
Dynamic High-Performance Multi-Mode Architectures for AES Encryption
Eric Swankoski, Naval Research Lab
Vijay Narayanan, Penn State University
Swankoski MAPLD 2005 / B103
2
Background & Motivation
Bandwidth and throughput capabilities of modern optical networks are skyrocketing
Protecting transmitted data is becoming more and more critical
Current encryption architectures generally aren’t capable of keeping up with high-speed environments
SEU effects are rarely, if ever, considered
3
Plan of Attack: FPGA Encryption
Algorithm: Advanced Encryption Standard (AES)
  Supports multiple key lengths
  Supports multiple encryption modes
  Supports multiple levels of pipelining
Target Architecture: Xilinx FPGAs
  Can be adapted to ASIC devices
  Virtex-II, Virtex-4
Target Performance: 60+ gigabits per second
  Requires both inner-round and outer-round pipelining
4
The AES Algorithm
10 rounds of encryption for 128-bit operands
Four basic operations:
  SubBytes: 8-bit substitution (16 parallel operations per round)
  ShiftRows: byte reordering and rotation (4 parallel operations per round)
  MixColumns: polynomial multiplication (4 parallel operations per round)
  AddRoundKey: simple 128-bit XOR
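The three non-substitution steps can be sketched in a few lines of Python. This is a software illustration only, not the FPGA implementation from the talk. It assumes the 16-byte state is stored as a flat list in FIPS-197 column-major order, and the function names are chosen here for illustration.

```python
# Minimal sketch of ShiftRows, MixColumns and AddRoundKey on a 16-byte state.
# Assumption: byte (row r, column c) of the state lives at index r + 4*c.

def shift_rows(state):
    """Rotate row r left by r positions (4 independent row operations)."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def mix_column(col):
    """Polynomial multiplication of one 4-byte column (3*a = xtime(a) ^ a)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

def mix_columns(state):
    """Apply mix_column to each of the 4 columns independently."""
    out = []
    for c in range(4):
        out.extend(mix_column(state[4 * c:4 * c + 4]))
    return out

def add_round_key(state, round_key):
    """128-bit XOR, expressed as 16 byte-wise XORs."""
    return [s ^ k for s, k in zip(state, round_key)]
```

Each function touches rows, columns, or bytes independently, which is the parallelism the slide counts per round.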
5
Optimizing for Performance
Exploit all possible parallelism
Alternative byte substitution methods
  1 cycle for a lookup-based substitution
  5 cycles for a mathematical transformation
Utilize pipelining
  Outer-Round: 1 cycle per round
  Inner-Round:
    4 cycles per round (lookup-based byte substitution)
    8 cycles per round (pipelined byte substitution)
6
Combinatorial Byte Substitution
Actual mathematical transformation
Conventional implementation cannot be pipelined
  Simple (atomic) 8x8 lookup table
Smaller than lookup table
Faster than lookup table
Utilizes five-stage pipeline
  All internal operands are four bits wide
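The "mathematical transformation" behind the AES S-box is a multiplicative inverse in GF(2^8) followed by an affine map. The sketch below computes it directly on 8-bit values as a software reference. It is not the five-stage design with 4-bit operands described on the slide, and the helper names are illustrative.

```python
# Reference computation of the AES S-box without a lookup table.
# Assumption: plain GF(2^8) arithmetic is used here for clarity; a hardware
# design with 4-bit internal operands would decompose the inverse differently.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def gf_inv(a):
    """Multiplicative inverse computed as a^254; zero maps to zero."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0

def rotl8(b, n):
    """Rotate an 8-bit value left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def sbox(x):
    """Inverse in GF(2^8), then the AES affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
```

Calling `[sbox(x) for x in range(256)]` would regenerate the table that the lookup-based variant stores.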
7
Encryption Round Diagram
Atomic S-Box: 40 Pipeline Stages
Combinatorial S-Box: 76 Pipeline Stages
  Needs a constant stream to be effective
Parallel Key Scheduling
  No performance penalty
Offline Key Scheduling
  Precomputed keys can be stored in registers
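Why a deep pipeline needs a constant stream can be seen from a simple cycle count. The model below is an assumption made for illustration: one block enters the pipeline per clock, and a block leaves after as many cycles as there are stages.

```python
# Idealized cycle-count model of a block pipeline (illustrative assumption:
# one new block may enter per clock cycle, no stalls).

def pipelined_cycles(stages, blocks):
    """Cycles to finish `blocks` blocks when a new block enters every cycle."""
    return stages + blocks - 1 if blocks else 0

def serialized_cycles(stages, blocks):
    """Cycles when each block must finish before the next one may start."""
    return stages * blocks

def utilization(stages, blocks):
    """Fraction of cycles in which a finished block leaves the pipeline."""
    return blocks / pipelined_cycles(stages, blocks)
```

Under this model a single block through 76 stages keeps the output busy for about 1% of the time, while a long stream approaches one block per cycle.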
8
Counter (CTR) Mode
Effectively converts AES into a stream cipher
High security, similar to CBC
Supports inner-round and outer-round pipelining
No error propagation: errors are completely isolated
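A short sketch of the CTR structure shows both properties: every keystream block depends only on its counter value, and a corrupted ciphertext bit disturbs only the matching plaintext bit. Python's standard library has no AES, so `toy_block` below is a keyed-hash stand-in chosen for this example. It is not AES and not secure.

```python
import hashlib

BLOCK = 16  # bytes per block, matching the 128-bit AES block size

def toy_block(key, block):
    """Stand-in for encrypting one block (NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def keystream_block(key, nonce, index):
    """Keystream for block `index`; it depends on no other block."""
    counter = (nonce + index) % (1 << 128)
    return toy_block(key, counter.to_bytes(BLOCK, "big"))

def ctr_xor(key, nonce, data):
    """Encrypt or decrypt: XOR the data with the counter-derived keystream."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)
```

Because `keystream_block` takes only the counter as input, successive blocks can be in flight at the same time, which is what makes the mode compatible with both levels of pipelining.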
9
Cipher Block Chaining (CBC) Mode
Most secure: no patterns are observed
Cannot be pipelined
100% downstream corruption resulting from data loss or single-event upsets (SEUs) during encryption
Errors are isolated during decryption
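The chaining dependency can be sketched with a toy invertible block function on 128-bit integers. Everything here is an assumption made for illustration: `toy_encrypt` is an affine map rather than AES, and an upset is modeled as a bit flip in one block's cipher output that is also fed back into the chain.

```python
# Toy CBC on 128-bit integers. NOT AES and NOT secure; the block function is
# only required to be invertible so the chaining behavior can be observed.
MASK = (1 << 128) - 1
MULT = 0x9E3779B97F4A7C15F39CC0605CEDC835  # arbitrary odd constant (assumption)
MULT_INV = pow(MULT, -1, 1 << 128)         # modular inverse, Python 3.8+

def toy_encrypt(key, block):
    return (block * MULT + key) & MASK

def toy_decrypt(key, block):
    return ((block - key) * MULT_INV) & MASK

def cbc_encrypt(key, iv, blocks, upset=None):
    """upset=(index, mask) XORs mask into that block's cipher output."""
    out, prev = [], iv
    for i, p in enumerate(blocks):
        c = toy_encrypt(key, p ^ prev)  # needs the previous output first
        if upset and upset[0] == i:
            c ^= upset[1]
        out.append(c)
        prev = c
    return out

def cbc_decrypt(key, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(toy_decrypt(key, c) ^ prev)
        prev = c
    return out
```

In this model an upset at block 2 changes every later ciphertext block relative to the fault-free stream, whereas a bit flipped in a received block damages only that block and the next one on decryption.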
10
Electronic Codebook (ECB) Mode
Supports full pipelining
No error propagation: errors are completely isolated
Least secure: identical input gives identical output
Patterns observable in video and image data
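The pattern problem is easy to reproduce on a uniform region of an image. The sketch below counts distinct ciphertext blocks for ECB and, for contrast, CTR. As an assumption for the example, `toy_block` is a keyed-hash stand-in for the encryption direction only; it is not AES and cannot be decrypted.

```python
import hashlib

BLOCK = 16  # bytes per block

def toy_block(key, block):
    """Stand-in for encrypting one block (NOT AES, encryption direction only)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ecb_encrypt(key, data):
    """Each block is processed on its own: equal inputs give equal outputs."""
    return [toy_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def ctr_encrypt(key, nonce, data):
    """Each block is XORed with a keystream derived from a distinct counter."""
    out = []
    for n, i in enumerate(range(0, len(data), BLOCK)):
        ks = toy_block(key, (nonce + n).to_bytes(BLOCK, "big"))
        out.append(bytes(b ^ k for b, k in zip(data[i:i + BLOCK], ks)))
    return out

# A flat gray region: 32 identical blocks of pixel data.
flat_region = b"\x80" * BLOCK * 32
ecb_distinct = len(set(ecb_encrypt(b"k" * 16, flat_region)))
ctr_distinct = len(set(ctr_encrypt(b"k" * 16, 7, flat_region)))
```

Here `ecb_distinct` is 1, so the flat region stays visibly flat in the ciphertext, while `ctr_distinct` is 32.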
11
Staggered CBC Mode
Pipelined with output feedback
Each encrypted block n depends on itself and on block (n - x), where x is the latency of the pipeline
Maintains security while mitigating some error propagation problems
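One way to read the staggered scheme is as x interleaved CBC chains, with block n chained to the ciphertext of block n - x. The sketch below follows that reading. The toy block function, the use of one initialization value per chain, and the upset model (a bit flip in one block's cipher output that is also fed back) are all assumptions made for illustration, not details taken from the talk.

```python
# Toy staggered CBC on 128-bit integers. NOT AES and NOT secure.
MASK = (1 << 128) - 1
MULT = 0x9E3779B97F4A7C15F39CC0605CEDC835  # arbitrary odd constant (assumption)
MULT_INV = pow(MULT, -1, 1 << 128)         # modular inverse, Python 3.8+

def toy_encrypt(key, block):
    return (block * MULT + key) & MASK

def toy_decrypt(key, block):
    return ((block - key) * MULT_INV) & MASK

def scbc_encrypt(key, ivs, blocks, upset=None):
    """c[n] = E(p[n] XOR c[n - x]) with x = len(ivs); first x blocks use ivs."""
    x = len(ivs)
    out = []
    for n, p in enumerate(blocks):
        prev = out[n - x] if n >= x else ivs[n]
        c = toy_encrypt(key, p ^ prev)
        if upset and upset[0] == n:
            c ^= upset[1]
        out.append(c)
    return out

def scbc_decrypt(key, ivs, blocks):
    x = len(ivs)
    return [toy_decrypt(key, c) ^ (blocks[n - x] if n >= x else ivs[n])
            for n, c in enumerate(blocks)]
```

With x = 4, an upset at block 1 changes ciphertext blocks 1, 5, 9 and so on, and leaves the other three chains untouched, which is one sense in which the error propagation is mitigated.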
12
More Challenges
Error-Tolerant Encryption
Maintaining High Security
Maintaining High Performance
13
Error-Tolerant Encryption
Are errors acceptable?
  Possibly, but better to assume not
How do the multiple modes of encryption deal with upsets?
Is there a benefit to triple modular redundancy (TMR)?
  Is it what we expect?
14
Error-Tolerant Encryption
CTR and ECB encryption isolate errors
  Transmission integrity largely preserved even without SEU mitigation
  TMR can ensure 100% transmission integrity
TMR REQUIRED for CBC encryption
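The voting step at the heart of TMR can be sketched as a bitwise majority over three copies of a result. The `tmr` wrapper and its upset masks are hypothetical helpers introduced here to inject faults; they do not describe the voter used in the design.

```python
# Bitwise majority voting over three redundant copies (illustrative sketch).

def majority_vote(a, b, c):
    """Each output bit takes the value held by at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)

def tmr(compute, value, upsets=(0, 0, 0)):
    """Run `compute` three times, XOR an upset mask into each copy, then vote."""
    copies = [compute(value) ^ u for u in upsets]
    return majority_vote(*copies)
```

A single upset copy is outvoted, for example `tmr(lambda v: v * 3 & 0xFF, 5, (0, 0x10, 0))` still yields 15. The same bit upset in two copies at once would win the vote instead.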
15
Error-Tolerant Encryption
Image 1: Error-Free Plaintext Image
  Before encryption / after decryption
  CTR, ECB, or CBC with mitigation
Image 2: Decrypted Plaintext Image
  One corrupted block
  CTR or ECB without mitigation
Image 3: Decrypted Plaintext Image
  One block corrupted during encryption
  CBC without mitigation
16
Maintaining High Security
How do the multiple modes of encryption affect security?
Is physical protection of the key necessary?
  Depends on the environment
How is throughput affected by increased security?
  Hopefully, not at all…
17
Maintaining High Security
ECB-encrypted image has observable patterns
CTR/CBC/SCBC encryption looks like random noise
18
Maintaining High Security
Physical Key Protection
  Not required in aerospace applications
Power Analysis / Soft Attacks
  Countermeasures not mode specific
Throughput Effects
  ECB & CTR far outperform CBC
  Why is CBC an official mode?
19
System-Level Diagram
Supports ECB, CTR, CBC, and SCBC modes
Supports two types of TMR
  System: triplicates all control, key hardware, and mode logic
  Encryption: triplicates only encryption and key scheduling hardware
20
Performance Results – Virtex-4
Byte Substitution  Key Scheduling  Area   Frequency  Throughput (CTR, ECB, SCBC)  Throughput (CBC)
ROM                Online          3588   339.5 MHz  43.5 Gbps                    1.088 Gbps
ROM                Offline         2827   446.8 MHz  57.2 Gbps                    1.430 Gbps
Combinatorial                      13651  519.2 MHz  66.5 Gbps                    700.0 Mbps
Combinatorial                      10912

Key Scheduling
  Offline uses precomputed and stored keys (compile or design time)
  Online uses dynamically computed keys (run time)
Significant performance improvement for combinatorial byte substitution in pipelined mode
Virtex-II Pro performs better with ROM implementation (56.42 & Gbps)
Better CBC performance achieved through other architectures
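The pipelined throughput figures are consistent with one 128-bit block completing per clock cycle, and the ROM-based CBC figures with one block per pass through a 40-stage pipeline. Both relationships are assumptions inferred from the numbers rather than formulas stated in the talk. The sketch below applies them to the ROM rows only.

```python
# Back-of-the-envelope throughput model (assumptions: one 128-bit block per
# clock in pipelined modes; one block per full pipeline pass in CBC).
BLOCK_BITS = 128

def pipelined_gbps(freq_mhz):
    """Throughput in Gbps when a block completes every clock cycle."""
    return freq_mhz * BLOCK_BITS / 1000.0

def feedback_gbps(freq_mhz, stages):
    """Throughput in Gbps when only one block is in flight at a time."""
    return pipelined_gbps(freq_mhz) / stages

rom_online = (pipelined_gbps(339.5), feedback_gbps(339.5, 40))
rom_offline = (pipelined_gbps(446.8), feedback_gbps(446.8, 40))
```

Under these assumptions the ROM rows come out near 43.5 and 57.2 Gbps pipelined, and roughly 1.09 and 1.43 Gbps in CBC.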
21
Lessons Learned
Don’t try to over-optimize FPGA code
  Returns diminish quickly
  Sometimes less is more
Know your synthesis tool
  Now why did it do THAT?
Check your system’s memory
  RAM does fail at inopportune times…
  ESPECIALLY if it has a lifetime warranty
22
Lessons Learned: Over-optimization
In a highly pipelined FPGA design, routing plays a MAJOR role in the clock frequency
  70%-80% of the total delay
What would work in an ASIC (or in theory, or on paper…) might actually make things worse
Manual floorplanning and P&R might help, but usually provide minimal (if any) improvement
Moral? Try reducing the pipeline depth as well as increasing it; it just might help!