Cryptanalysis through Cache Address Leakage Eran Tromer Adi Shamir Dag Arne Osvik
2 Cache attacks Pure software No special privileges No interaction with the cryptographic code (some variants) Very efficient (e.g., full AES key extraction from Linux encrypted partition in 65 milliseconds) Compromise otherwise well-secured systems (e.g., see NIST AES process) “Commoditize” side-channel attacks: easily deployed software breaks many common systems
3 CPU core 60% (until recently) Main memory 7-9% Why cache analysis? cache Annual speed increase: Typical latency: ns0.3ns → timing gap
4 Address leakage The cache is a shared resource: cache state affects, and is affected by, all processes, leading to crosstalk between processes. The cached data is subject to memory protection… But the metadata leaks information about memory access patterns: which addresses are being accessed.
5 Related works on cache attacks [Hu ‘92] – Convert channels [Page ’02] – Theoretical attacks via power trace [Tsunoo Tsujihara Minematsu Miyuachi ’02], [Tsunoo Saito Suzaki Shigeri Miyauchi ’03] – Timing attacks via internal collisions [Bernstein ’04] – Model-less timing attacks [Percival ’05] – RSA Different setting, further assumptions, less practical, and/or attack a different ciphers. This the first work to fully demonstrate an efficient cache attack on a modern block cipher in real-life settings.
6 1.Eavesdropping 2.Cryptanalysis 3.Experimental results 4.Implications
7 Associative memory cache DRAM cache memory block (64 bytes) cache line (64 bytes) cache set (4 cache lines)
8 S-box tables in memory DRAM cache S-box table
9 Detecting access to AES tables (basic idea) DRAM cache Attacker memory S-box table
10 Measurement technique Inter-process crosstalk can be exploited in two ways: Effect of the cache on the encryption (timing) Effect of the encryption on the cache
11 DRAM cache T0 Attacker memory 1. Make sure the tables are cached 2. Evict one cache set 3. Time an encryption and see if it’s slow Measuring effect of cache on encryption
12 Measurement technique Inter-process crosstalk can be exploited in two ways: Effect of the cache on the encryption (timing) Effect of the encryption on the cache
13 Measuring effect of encryption on cache DRAM cache Attacker memory 1. Completely evict tables from cache S-box table
14 Measuring effect of encryption on cache DRAM cache Attacker memory 1. Completely evict tables from cache 2. Trigger a single encryption S-box table
15 Measuring effect of encryption on cache DRAM cache Attacker memory 1. Completely evict tables from cache 2. Trigger a single encryption 3. Access attacker memory again and see which cache sets are slow S-box table
16 Advantages of second method Yields more information ( 64) from a single encryption Not a timing attack! Insensitive to timing variance in enryption code path (crucial for effective attacks on complicated systems). No real need to trigger the encryption – can wait until it happens by itself…
17 1.Eavesdropping 2.Cryptanalysis 3.Experimental results 4.Implications
18 char p[16], k[16]; // plaintext and key int32 T0[256],T1[256],T2[256],T3[256]; // lookup tables int32 Col[4]; // intermediate state... /* Round 1 */ Col[0] T0[p[ 0] © k[ 0]] T1[p[ 5] © k[ 5]] T2[p[10] © k[10]] T3[p[15] © k[15]]; Col[1] T0[p[ 4] © k[ 4]] T1[p[ 9] © k[ 9]] T2[p[14] © k[14]] T3[p[ 3] © k[ 3]]; Col[2] T0[p[ 8] © k[ 8]] T1[p[13] © k[13]] T2[p[ 2] © k[ 2]] T3[p[ 7] © k[ 7]]; Col[3] T0[p[12] © k[12]] T1[p[ 1] © k[ 1]] T2[p[ 6] © k[ 6]] T3[p[11] © k[11]]; A typical software implementation of AES lookup index = plaintext key
19 Variant 1: Synchronous attack A software service performs AES encryption using a secret key. An attacker process runs on the same CPU. The attacker process can somehow invoke the service on known plaintext. Examples: — Encrypted disk partition + filesystem — IP/Sec, VPN → the attacker can discover the secret key. or decryption, or MAC or ciphertexts or trigger and time memory accesses remotely
20 Synchronous attack on AES: Overview Measure (possibly noisy) cache usage of many encryptions of known plaintexts. Guess the first key byte. For each hypothesis: — For each sampled plaintext, predict which cache line is accessed by “ T0[p[ 0] © k[ 0]] ” Identify the hypothesis which yields maximal correlation between predictions and measurements. Proceed for the rest of the key bytes. Practically, a few hundred samples suffice. Got 64 bits of the key (high nibble of each byte)! Use these partial results to mount attack further AES rounds, exploiting S-box nonlinearity. A few thousand samples for complete key recovery.
21 Synchronous 1R attack (idealized) Suppose we could answer queries of the form: “Does encryption of p under the unknown key k access the memory block of index y in table T l?” Denote this predicate Q(p,l,y) {0,1}. The approach: Obtain many samples of the predicate ”Guess” part of the key Deduce constraints on value of predicate Check if fulfilled in all samples
22 Synchronous 1R attack (cont.) Consider one of the memory accesses in the first round: “ T0[ p [0] ©k [0]] ” Given a candidate value k’[0] and samples of Q(p,l,y): — The useful samples are those that fulfill p [0] ©k’ [0] y — If k’[0] k[0] then for all useful samples: p [0] ©k [0] p [0] ©k’ [0] y so T0[ p [0] ©k [0]] accesses y hence Q(p,l,y)=1 — Otherwise: p [0] ©k [0] p [0] ©k’ [0] y hence Q(p,l,y)=0 (but there are 35 more “random” accesses to T0 …) with probability (1-1/16) 35 Hence, a few hundred random samples suffice to eliminate all bad candidates find high nibble of all key bytes
23 Full key extraction We got half the key (64 bits) Too much for brute force We extracted all possible information from 1 st -round accesses Move to 2 nd round, taking advantage of the non-linearity of the S- box
24 Synchronous 2R attack (idealized) After expansion of the AES key schedule and 1 st round, we see that the 2 nd round does the following lookup: “ T2[S[p[0] ©k [0]] 2 S[p[5] ©k [5]] 3 S[p[10] ©k [10]] S[p[15] ©k [15]] S[k[15]] k[2]] ” The only relevant unknowns are the low nibbles of k [0], k [5], k [10], k [15] (2 16 candidates). Can test a candidate as before: — Predict this lookup according to guess — Identify useful samples, i.e., those where y is in the same memory block as the prediction — Check whether Q(p,l,y)=1 for all useful samples. There are 3 more accesses of this special form, with disjoint sets of relevant low nibbles. Hence: full key recovery using ~2000 random samples.
25 Scenario 2: asynchronous attack Someone runs encryptions computations using a secret key. An attacker process runs on the same CPU at (roughly) the same time. The plaintext/ciphertext has a non-uniform (conditional) distribution: — English — Formatted data — Headers — Ciphertext gleaned from wire Examples: just about any use of crypto on a multi-user system → attacker can (partially?) discover the secret key
26 Asynchronous attack (basic idea) Compare two distributions: Measured memory accesses statistics. Predicted memory accesses statistics, under the given plaintext distribution and the key hypothesis. Find key that yields best correlation.
27 “Hyper Attack” Obtaining parallelism: — HyperThreading (simultaneous multithreading) — Multi-core, shared caches, cache coherence — Interrupts Attack model: — Encryption process is not communicating with anyone (no I/O, no IPC). — No special measurement equipment — No knowledge of either plaintext of ciphertext
28 1.Eavesdropping 2.Cryptanalysis 3.Experimental results 4.Implications
29 Experimental results Synchronous attack on OpenSLL AES encryption library call: Full key recovery by analyzing 300 encryptions (13ms) Synchronous attack on an AES encrypted filesystem (Linux dm-crypt): Full key recovery by analyzing 800 write operations (65ms) Asynchronous attack on AES (independent process doing batch encryption of text): Recovery of 45.7 key bits in one minute.
30 Experimental example: synchronous attack I Measuring a “black box” OpenSSL encryption on Athlon 64, using 10,000 samples. Horizontal axis: evicted cache set Vertical axis: p[0] (left), p[5] (right) Brightness: encryption time (normalized) k[0]=0x00k[5]=0x50
31 Experimental example: synchronous attack II Measuring a Linux dm-crypt encrypted filesystem with ECB AES on Athlon 64, using 30,000 samples. Horizontal axis: evicted cache set Vertical axis: p[0] Brightness: cache probing time (normalized) Left: raw. Right: after subtracting cache set average.
32 Finding the tables Measuring a Linux dm-crypt encrypted filesystem with ECB AES on Athlon 64, using 30,000 samples. Horizontal axis: candidate table offset Candidate key byte value: k’ [0] (left), k’ [5] (right) Brightness: score k[0]=0x00k[5]=0x50
33 Experimental example: asynchronous attack
34 1.Eavesdropping 2.Cryptanalysis 3.Experimental results 4.Implications
35 Implications and Extensions Multiuser systems Virtual machines — Examples: jail(), Xen, UML, VMware, Virtual PC — Breaks NSA US Patent 6,922,774 — No guarantees in AMD ”Secure Virtual Machine” or Intel “Virtualization Technology” (VMX) specs either DRM The trusted path is leaky (even if verified by TPM attestation, NGSCB, etc.). Untrusted code (e.g., ActiveX, Java applets, managed.NET, JavaScript) Remote network attacks
36 Countermeasures Note: With sufficient temporal resolution, the attacker gets a fair approximation of the full memory access transcript. Secure Efficient Generic ? [GO95] no cache [ZZLP04] [MR04] bitslice OS mode ignore [Intel05]
37 Conclusions New side channel attacks that are cheap easy to exploit hard to detect expensive to avoid applicable to many deployed systems