1
High Throughput AES
Alireza Hodjat, IVGroup
2
The AES Algorithm
- Key Addition
- Substitution
- Shift Row
- Mix Column
[Figure: key schedule from round key ki to kn through the stages Key Sch_Sub, Key Sch_rt, Key Sch_xor.]
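The three key-schedule stages named on this slide can be sketched in Python as follows. This is a minimal sketch, not the presenter's hardware design: it assumes the stages correspond to the standard AES-128 key expansion (byte substitution, word rotation, XOR with the round constant and the previous round key), and the names `gf_mul`, `sbox_byte`, `next_round_key`, and `expand_key` are hypothetical. The S-box is computed rather than typed in as a table to keep the block self-contained.

```python
# Sketch of the AES-128 key schedule, assuming the slide's stages mean:
# Key Sch_Sub = S-box substitution, Key Sch_rt = word rotation, Key Sch_xor = XOR.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_byte(x):
    """AES S-box value: field inverse followed by the affine transform."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
            inv = gf_mul(inv, x)
    out = 0x63
    for shift in range(5):  # inv XOR its left rotations by 1..4 bits
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out


SBOX = [sbox_byte(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def sub_word(w):
    """Apply the S-box to each byte of a 32-bit word (Key Sch_Sub)."""
    return (
        (SBOX[(w >> 24) & 0xFF] << 24)
        | (SBOX[(w >> 16) & 0xFF] << 16)
        | (SBOX[(w >> 8) & 0xFF] << 8)
        | SBOX[w & 0xFF]
    )


def rot_word(w):
    """Rotate a 32-bit word left by one byte (Key Sch_rt)."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def next_round_key(prev, round_index):
    """Derive round key `round_index` (1..10) from the previous four words."""
    t = rot_word(sub_word(prev[3])) ^ (RCON[round_index - 1] << 24)
    out = []
    for word in prev:  # Key Sch_xor: chain the XOR across the four words
        t ^= word
        out.append(t)
    return out


def expand_key(key_words):
    """Return the 11 round keys of AES-128 as lists of four 32-bit words."""
    round_keys = [list(key_words)]
    for r in range(1, 11):
        round_keys.append(next_round_key(round_keys[-1], r))
    return round_keys
```

Substitution and rotation act on whole bytes, so applying them in either order gives the same word; the sketch follows the order shown on the slide.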
3
Outer-round Pipelining
4
Inner- and Outer-round Pipelining
5
The Highest Possible Throughput
- The choice of 128-bit key only
- Completely unrolled loop
- Pipelined:
  - Between each round (outer-round)
  - Inside each round (inner-round)
- This causes huge area consumption.
6
Area Optimization
Area optimization inside each round
Two different techniques:
- Resource sharing
- Re-timing
Break the critical path and perform the algorithm in multiple clock cycles.
Critical path: Substitution
Area-delay trade-off
7
Sbox Area-Delay Trade-off
Sbox area-delay trade-off for ASIC

Design    Type                          Critical path  Area          Re-timing
Direct    No-Pipeline                   1.19 ns        2.086 Kgates  No
Indirect                                3.67 ns        1.167 Kgates
          One stage pipeline            0.78 ns        3.51 Kgates   Yes, 2 pipe stages
          Three stage pipeline          1.11 ns        1.65 Kgates   3 pipe stages

Sbox area-delay trade-off for FPGA

Design    Type                          Critical path  Area          Re-timing
Direct    No-Pipeline                   4.05 ns        136 LUTs      No
Indirect                                10.41 ns       94 LUTs
          One stage pipeline            3.91 ns                      Yes, 2 pipe stages
          Three stage pipeline          5.95 ns        90 LUTs       3 pipe stages
          No-pipeline, using Block RAM  4.87 ns        0 LUTs

Direct implementation: look-up table
Indirect implementation: GF(2^4), Wolkerstorfer design, Patrick's codes
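The difference between the direct and indirect Sbox styles can be illustrated in software. The sketch below is an assumption-laden illustration, not the design measured on the slide: the "indirect" version computes each output by inversion in GF(2^8) followed by the AES affine transform, whereas the slide attributes its indirect variant to GF(2^4) arithmetic, which this code does not reproduce. The "direct" version is a 256-entry look-up table. All function names are hypothetical.

```python
# Direct (look-up table) versus indirect (computed) AES S-box, as a sketch.
# Assumption: plain GF(2^8) inversion stands in for the slide's GF(2^4) design.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), with 0 mapped to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_indirect(x):
    """Compute the S-box output arithmetically: inverse, then affine map."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# The direct style stores every output once and only indexes afterwards.
SBOX_TABLE = [sbox_indirect(x) for x in range(256)]


def sbox_direct(x):
    """Look the S-box output up in the precomputed table."""
    return SBOX_TABLE[x]
```

In hardware terms the table corresponds to the larger, faster direct implementation, while the computed form corresponds to the smaller, slower indirect one, which is the trade-off the two tables above quantify.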
8
AES Encrypt Datapath 4 3 2 1 S M +
9
Key Scheduling Datapath
[Figure: key scheduling datapath with lanes 4, 3, 2, 1 and the blocks + and S.]
10
Design 1: Straight Forward
[Figure: lanes 4, 3, 2, 1 passing through S, M, and +. Labels: 1 Cycle, 1 Round.]
11
Design 2: Use re-timing for Sbox
[Figure: lanes 4, 3, 2, 1 passing through S, M, and +. Labels: 1 Cycle, 1 Round.]
12
Design 3: Use resource sharing
[Figure: lanes 4, 3, 2, 1 with blocks M and + and the Sbox passes S-A, S-B, S-C, S-D. Labels: 4 Cycle, 1 Round.]
13
Design 4: Use resource sharing and re-timing for Sbox
[Figure: lanes 3, 2, 1 with blocks M and + and the two-stage Sbox passes S-A-1, S-A-2, S-B-1, S-B-2, S-C-1, S-C-2, S-D-1, S-D-2. Labels: 5 Cycle, 1 Round.]
14
Design 5: Resource sharing and pipelining and re-timing for Sbox
[Figure: lanes 4, 3, 2, 1 with blocks Mix Column and + and the two-stage Sbox passes S-A-1, S-A-2, S-B-1, S-B-2, S-C-1, S-C-2, S-D-1, S-D-2. Labels: 1 Cycle, 1 Round.]
15
Inner-Round Pipeline for Design 5
[Figure: inner-round pipeline timing diagram. Stage labels M, S2, A; data parts 1, 2, 3, 4 repeat in Round 1, Round 2, ... along the time axis.]
16
Performance Estimation
Design                #1                   #2                   #3                   #4                   #5
Estimated area        Less than 500 Kgates Less than 900 Kgates Less than 150 Kgates Less than 300 Kgates Less than 250 Kgates
ASIC throughput       (128*650)            (128*1)              (128*650/4)          (128*1/5)            (128*1/4)
                      83.2 Gbit/s          128 Gbit/s           20.8 Gbit/s          25.6 Gbit/s          32 Gbit/s

Remaining rows (surviving values in source order; column assignment not preserved):
Clock per sample: 1, 4, 5
Pipe stages per round: [...] stages, [...] stages
Total pipe stages: 4 x 10 stages, 3 x 10 stages
Latency: 4 x 10 cycles, 4 x 3 x 10 cycles, 5 x 3 x 10 cycles, (4 x 10) + 4 cycles
FPGA throughput (200 MHz): [...] Gbit/s, [...] Gbit/s
ASIC critical path: 1.5 ns (650 MHz), 1 ns (1 GHz)
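The ASIC throughput figures follow from one formula: 128 bits per block, times the clock frequency, divided by the number of clock cycles per block. The short Python sketch below reproduces that arithmetic. The clock and cycle values are read off the formulas on the slide (650 MHz or 1 GHz; divisors 1, 1, 4, 5, 4); the function name `throughput_gbps` and the `DESIGNS` table are hypothetical helpers introduced only for illustration.

```python
# Reproduce the slide's ASIC throughput estimates.
# Assumption: each design accepts one 128-bit block every `cycles_per_block` clocks.

BLOCK_BITS = 128

# design number -> (clock in MHz, clock cycles per 128-bit block)
DESIGNS = {
    1: (650, 1),
    2: (1000, 1),
    3: (650, 4),
    4: (1000, 5),
    5: (1000, 4),
}


def throughput_gbps(clock_mhz, cycles_per_block, block_bits=BLOCK_BITS):
    """Throughput in Gbit/s for a pipeline fed one block per `cycles_per_block`."""
    mbit_per_s = block_bits * clock_mhz / cycles_per_block
    return mbit_per_s / 1000.0


if __name__ == "__main__":
    for number, (clock_mhz, cycles) in DESIGNS.items():
        rate = throughput_gbps(clock_mhz, cycles)
        print(f"Design {number}: {rate:.1f} Gbit/s")
```

The same function applies to the FPGA row by passing the 200 MHz clock stated on the slide.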