Published by Brooke Woods, modified over 8 years ago
1 Vulnerabilities on high-end processors
André Seznec, IRISA/INRIA, CAPS project-team
2 A paradox
Microarchitectures are more and more complex, yet timing side-channel attacks have been presented against implementations of AES (Bernstein) and RSA (Acıiçmez et al.).
3 Many hardware features exist only to improve performance
- Caches
- Pipelining
- Superscalar execution
- Branch prediction
- Thread parallelism
4 The execution time of a short instruction sequence is a complex function!
[Figure: the sequence goes through the branch predictor (correct/mispredict), ITLB, I-cache, execution core, D-cache, DTLB and L2 cache; each TLB and cache access either hits or misses.]
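A toy model (not from the slides) of why the number of possible timings explodes: each component in the figure takes either its fast or its slow path. The penalty values below are invented round numbers, not Pentium 4 latencies, and the model assumes penalties simply add up, which out-of-order execution does not guarantee.

```python
from itertools import product

# Hypothetical slow-path penalties in cycles (illustrative only, not measured values).
PENALTIES = {
    "branch predictor": 20,  # mispredict
    "ITLB": 30,              # miss
    "I-cache": 10,           # miss
    "D-cache": 10,           # miss
    "DTLB": 30,              # miss
    "L2 cache": 200,         # miss
}
BASE_CYCLES = 50  # hypothetical cost when every component takes its fast path


def sequence_time(slow_paths):
    """slow_paths maps each component to True when it takes its slow path."""
    return BASE_CYCLES + sum(PENALTIES[c] for c, slow in slow_paths.items() if slow)


def all_timings():
    """Timing of the sequence for every combination of component outcomes."""
    components = list(PENALTIES)
    return [
        sequence_time(dict(zip(components, flags)))
        for flags in product([False, True], repeat=len(components))
    ]


timings = all_timings()
print(len(timings), min(timings), max(timings))
```

With only six binary outcomes there are already 2^6 = 64 combinations for a single pass; a real sequence performs many such accesses, and overlapping latencies make the function non-additive on top of that.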
5 The execution time of a short instruction sequence is a complex function (2)
It depends on the precise state of every microarchitectural component:
- More than 100 speculative instructions are in flight at the same time on a Pentium 4
- Instructions are executed out of order
- Strange correlations appear that are almost unpredictable at compile time (even for the back-end compiler)
6 Understanding the AES cache-timing attack on a high-end microprocessor (following Bernstein 2005)
AES with lookup tables is a 10-round algorithm with the following “vulnerabilities”:
- The number, the types and the order of the instructions are independent of the key K and of the message M to be encrypted.
- The exact locations of the data words read and written by the first round depend only on K xor M:
  - The execution time of the first round depends on K xor M (at least statistically).
This CAN BE EXPLOITED.
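A minimal sketch of the second point: in a T-table implementation, the first-round lookup indices are the bytes of K xor M. The table layout (256 entries of 4 bytes each, 64-byte cache lines, hence 16 entries per line) is an assumption for illustration, and the assignment of bytes to the four tables is omitted.

```python
# Assumed geometry, not taken from the slides.
ENTRY_BYTES = 4
LINE_BYTES = 64
ENTRIES_PER_LINE = LINE_BYTES // ENTRY_BYTES  # 16


def first_round_indices(key: bytes, plaintext: bytes) -> list:
    """Table index used for byte i in round 1: key[i] xor plaintext[i]."""
    assert len(key) == len(plaintext) == 16
    return [k ^ m for k, m in zip(key, plaintext)]


def first_round_cache_lines(key: bytes, plaintext: bytes) -> list:
    """Cache line (within its table) touched by each round-1 lookup."""
    return [i // ENTRIES_PER_LINE for i in first_round_indices(key, plaintext)]


# Different keys and messages with the same K xor M give the same access pattern.
key_a, msg_a = bytes(range(16)), bytes(16)
key_b, msg_b = bytes(16), bytes(range(16))
print(first_round_indices(key_a, msg_a) == first_round_indices(key_b, msg_b))  # True
```

Under this layout only the upper 4 bits of each index select the cache line, which is one reason an attack may recover only some bits of each key byte.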
7 Bernstein 2005 (empty cache)
Plaintext attack
Unrealistic hypotheses:
- Access to cycle-accurate encryption timings
- The cache is flushed between two encryptions (not explicit in the paper, but see Lauradoux et al.)
The key is determined byte by byte, by statistically finding the maximum encryption time for each byte of K xor M.
Works only on the Pentium 3, not on the Pentium 4.
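A simplified sketch of the byte-by-byte statistical step, following the slide's description rather than Bernstein's exact correlation procedure. It assumes the attacker already knows which index value of K xor M is the slowest (`slow_index`, a hypothetical input, e.g. obtained by profiling with a known key), and it uses synthetic timings, not real measurements.

```python
import random
from collections import defaultdict


def recover_key_byte(samples, slow_index):
    """samples: (plaintext_byte, encryption_time) pairs for one byte position.

    The plaintext value with the highest mean time satisfies p ^ k == slow_index,
    so k = p ^ slow_index.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for p, t in samples:
        total[p] += t
        count[p] += 1
    means = {p: total[p] / count[p] for p in total}
    slowest_p = max(means, key=means.get)
    return slowest_p ^ slow_index


def simulate_timings(secret_byte, slow_index, reps=200, seed=1):
    """Synthetic data: 100 cycles + noise, plus 20 cycles when p ^ k hits slow_index."""
    rng = random.Random(seed)
    samples = []
    for p in range(256):
        for _ in range(reps):
            t = 100 + rng.uniform(0, 10)
            if p ^ secret_byte == slow_index:
                t += 20
            samples.append((p, t))
    return samples


print(hex(recover_key_byte(simulate_timings(0x3C, 0x5A), 0x5A)))
```

Averaging is what makes the attack work: a single timing is dominated by noise, but the mean for the slow value stands out once enough encryptions are timed.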
8 A loaded-cache attack (proof-of-concept code available)
Plaintext attack: timing of a large number of encryptions
An unrealistic hypothesis: access to cycle-accurate encryption timings
For each byte of K xor M, statistically determine the bit substrings that lead to the highest encryption time (plus a threshold to gain confidence)
Results depend on the microarchitecture:
- 0 to 80 bits of the key recovered by this method, depending on the model and stepping of the Pentium 4
- Suspected cause: cache banking is being exercised
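One possible reading of "bit substrings plus a threshold", sketched under assumptions: only the top `bits` bits of each index leak (for example the cache-line number), plaintext bytes are grouped by those bits, and a result is accepted only when the slowest group beats the runner-up by more than `threshold` cycles. `slow_index`, the threshold and the synthetic timings are all hypothetical.

```python
import random
from collections import defaultdict


def recover_high_bits(samples, bits, slow_index, threshold):
    """Return the top `bits` bits of a key byte, or None when not confident."""
    shift = 8 - bits
    total = defaultdict(float)
    count = defaultdict(int)
    for p, t in samples:
        total[p >> shift] += t
        count[p >> shift] += 1
    mean = {g: total[g] / count[g] for g in total}
    best, runner_up = sorted(mean, key=mean.get, reverse=True)[:2]
    if mean[best] - mean[runner_up] <= threshold:
        return None  # signal too weak: do not claim these bits
    # (p ^ k) >> shift == (p >> shift) ^ (k >> shift)
    return best ^ (slow_index >> shift)


def simulate_prefix_timings(secret_byte, slow_index, bits, signal, reps=50, seed=2):
    """Synthetic data: extra `signal` cycles when the leaking bits of p ^ k match."""
    rng = random.Random(seed)
    shift = 8 - bits
    samples = []
    for p in range(256):
        for _ in range(reps):
            t = 100 + rng.uniform(0, 10)
            if (p ^ secret_byte) >> shift == slow_index >> shift:
                t += signal
            samples.append((p, t))
    return samples


strong = simulate_prefix_timings(0xA7, 0x00, bits=4, signal=20)
print(recover_high_bits(strong, 4, 0x00, threshold=5))
```

The threshold is what turns "0 to 80 bits" into a meaningful figure: on a microarchitecture where the signal is too weak, the function returns None instead of a guess.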
9 First vulnerability
For a given sequence, timings are erratic: it is unlikely to get exactly the same timing twice.
But they are statistically correlated: effects such as cache banking and operation chaining appear in the average.
10 A possible countermeasure for AES
Periodically and randomly change the mapping of the lookup tables:
- About 9,000 cycles for this change
- XOR-based permutation (see Lauradoux et al.)
- HAVEGE can provide the random numbers
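A sketch of an XOR-based remapping of a 256-entry table: the table is stored permuted as T'[i xor r] = T[i], and every lookup uses T'[x xor r], so results are unchanged while the memory location touched for a given x depends on the random mask r. Here `os.urandom` stands in for HAVEGE, and the class and method names are hypothetical; in Python this only illustrates the mapping, not its effect on the cache.

```python
import os


def remap_table(table, r):
    """Return T' with T'[i ^ r] = T[i] (table length must be 256, 0 <= r < 256)."""
    remapped = [0] * len(table)
    for i, value in enumerate(table):
        remapped[i ^ r] = value
    return remapped


class RemappedTable:
    def __init__(self, table):
        self._original = list(table)
        self.rekey()

    def rekey(self):
        """Draw a new mask and rebuild the permuted table (done periodically)."""
        # os.urandom is a stand-in for a HAVEGE-style entropy source.
        self._mask = os.urandom(1)[0]
        self._table = remap_table(self._original, self._mask)

    def lookup(self, index):
        # Returns original[index], read from position index ^ mask.
        return self._table[index ^ self._mask]


sbox_like = list(range(1000, 1256))  # placeholder table contents
t = RemappedTable(sbox_like)
print(t.lookup(42) == sbox_like[42])  # True
```

Since the first-round address becomes K xor M xor r with r changing over time, statistics gathered on K xor M alone are blurred.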
11 Indirect timing measurements?
Hypotheses:
- The attacker has user-mode access to the system (legally or illegally)
- The attacker has no access to your data
- He/she can run a process concurrently with the encryption
On conventional systems, there is no access to the fine-grained timing of your application: time slices last millions of cycles.
12 Simultaneous Multithreading (SMT): parallel processing on a single processor
- Functional units are underused on superscalar processors
- SMT: the functional units of a superscalar processor are shared between several processes
- Advantages:
  - A single process can use all the resources
  - Dynamic sharing of all structures on parallel/multiprocess workloads
Second vulnerability
13 Issue slots: superscalar vs. SMT
[Figure: issue-slot usage of a superscalar processor compared with an SMT processor]
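A toy simulation of the issue-slot picture: a 4-wide core where each thread has a random number of ready instructions per cycle. The issue width and the 0 to 3 ready instructions per thread are hypothetical, chosen only to show why sharing the slots between two threads fills more of them.

```python
import random

ISSUE_WIDTH = 4  # hypothetical 4-wide superscalar core


def utilization(ready_per_cycle):
    """Fraction of issue slots filled.

    ready_per_cycle[c] lists, for cycle c, how many instructions each thread
    could issue; threads share the ISSUE_WIDTH slots of that cycle.
    """
    used = sum(min(ISSUE_WIDTH, sum(ready)) for ready in ready_per_cycle)
    return used / (ISSUE_WIDTH * len(ready_per_cycle))


rng = random.Random(0)
# Hypothetical ILP profile: each of two threads has 0-3 ready instructions per cycle.
cycles = [[rng.randint(0, 3), rng.randint(0, 3)] for _ in range(10_000)]
single = utilization([[a] for a, _ in cycles])  # superscalar, thread A alone
smt = utilization(cycles)                       # SMT, threads A and B together
print(f"superscalar: {single:.2f}, SMT: {smt:.2f}")
```

The same dynamic sharing that raises utilization is what lets one thread observe another: both compete for the same slots and structures.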
14 Indirect timing measurements on an SMT processor (principles)
SPY wants to get information on CRYPT:
1. SPY and CRYPT run in parallel.
2. SPY tracks a specific event in CRYPT, for instance the execution of a branch.
3. SPY saturates the hardware resources that CRYPT needs to execute this event quickly.
4. SPY records its own execution time (reading the hardware clock counter): an irregularity in its own execution time signals the event, i.e. CRYPT has tried to grab the hardware resource.
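A sketch of step 4: SPY times each iteration of its own loop and flags iterations that are abnormally slow. `spy_loop`, `flag_irregular` and the median-plus-margin rule are illustrative choices, not the author's code; a real spy would read the processor's cycle counter from native code, since `time.perf_counter_ns` in Python is far too coarse and noisy.

```python
import time
from statistics import median


def spy_loop(iterations, body):
    """Time each iteration of the spy's loop (body saturates the shared resource)."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        body()
        times.append(time.perf_counter_ns() - start)
    return times


def flag_irregular(iteration_times, margin):
    """Indices of iterations slower than the median by more than `margin`.

    Each flagged iteration suggests CRYPT competed for the saturated resource.
    """
    baseline = median(iteration_times)
    return [i for i, t in enumerate(iteration_times) if t - baseline > margin]


# Synthetic timings: iterations 3 and 7 were disturbed by the victim.
samples = [100] * 10
samples[3], samples[7] = 130, 125
print(flag_irregular(samples, margin=20))  # [3, 7]
```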
15 Indirect timing measurements on an SMT: proof of concept (derived from SBPA)
The skeleton of a naive RSA core:
    For I = 1 to N
        Sequence X          // 1,000s of cycles
        If Key[I] = 1       // spy this branch B
            Sequence Y      // 1,000s of cycles
    Endfor
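A runnable version of the skeleton, assuming the usual left-to-right square-and-multiply exponentiation: Sequence X is taken to be the modular squaring and Sequence Y the modular multiplication (the slide does not name them). The returned trace records which iterations are X-type and which are XY-type, which is exactly what the spy tries to observe.

```python
def modexp_with_trace(base, key_bits, modulus):
    """Left-to-right square-and-multiply, most significant key bit first."""
    result = 1
    trace = []
    for bit in key_bits:
        result = result * result % modulus        # Sequence X: every iteration
        if bit:                                   # branch B: depends on Key[I]
            result = result * base % modulus      # Sequence Y: only when Key[I] = 1
            trace.append("XY")
        else:
            trace.append("X")
    return result, trace


result, trace = modexp_with_trace(7, [1, 0, 1, 1], 101)  # exponent 0b1011 = 11
print(result == pow(7, 11, 101), trace)
```

The trace equals the key bit pattern one to one, so anyone who can tell X-type from XY-type iterations reads the key.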
16 Indirect timing measurements on an SMT: proof of concept (2)
Branch instructions are buffered in a BTB: on the Pentium 4, a branch that misses in the BTB pays a penalty of more than 20 cycles.
SPY: a nearly infinite loop that branches over a set of branches occupying all the possible BTB entries for B.
Track irregularities in the timing of the loop: when B is executed, one of SPY's branches is evicted from the BTB, which creates a timing irregularity:
- Each iteration is X-type or XY-type
I was able to reproduce this attack on a toy example.
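A model of the eviction mechanism for one BTB set, assuming set-associative organization with LRU replacement. The real Pentium 4 BTB geometry is undocumented (see the next slides), so `ways=4` and the branch names are hypothetical.

```python
from collections import OrderedDict


class BTBSet:
    """One set of a set-associative BTB with LRU replacement (hypothetical geometry)."""

    def __init__(self, ways):
        self.ways = ways
        self.entries = OrderedDict()  # least recently used entry first

    def access(self, branch):
        """Return True on a hit; on a miss, insert the branch and evict the LRU entry."""
        if branch in self.entries:
            self.entries.move_to_end(branch)
            return True
        if len(self.entries) >= self.ways:
            self.entries.popitem(last=False)
        self.entries[branch] = None
        return False


def spy_iteration(btb, spy_branches):
    """Number of BTB misses the spy suffers in one pass over its branches."""
    return sum(not btb.access(b) for b in spy_branches)


btb = BTBSet(ways=4)
spy = ["s0", "s1", "s2", "s3"]  # exactly fills the set that B maps to
misses = [spy_iteration(btb, spy), spy_iteration(btb, spy)]  # warm-up, steady state
btb.access("B")                    # CRYPT executes branch B
misses.append(spy_iteration(btb, spy))
misses.append(spy_iteration(btb, spy))
print(misses)  # [4, 0, 4, 0]
```

With LRU and a cyclic access order, one insertion of B makes every spy branch miss once, which amplifies the irregularity well beyond a single 20-cycle penalty in this model.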
17 Indirect timing measurements on an SMT
Feasible: through a single branch on a Pentium 4 with Hyper-Threading, information is leaking:
- I recovered all the bits of a 32-bit key in a single run (on a toy example)
The same kind of attack may apply to cache accesses: the memory access sequence could be discovered.
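A sketch of the final decoding step, under an assumed model: the spy detects B once per RSA iteration, right after Sequence X, so the gap between two consecutive detections is X alone for a 0 bit and X + Y for a 1 bit. The cycle counts and the threshold are hypothetical, and in this model N + 1 detections are needed to decode N bits.

```python
def simulate_events(bits, x_cycles=3000, y_cycles=1000):
    """Timestamps (in cycles) at which the spy detects branch B."""
    t, events = 0, []
    for bit in bits:
        t += x_cycles
        events.append(t)               # B executes after Sequence X
        t += y_cycles if bit else 0    # Sequence Y only for a 1 bit
    events.append(t + x_cycles)        # B in the following iteration closes the last gap
    return events


def decode_bits(event_times, threshold):
    """A gap longer than `threshold` means an XY-type iteration, i.e. a 1 bit."""
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    return [1 if g > threshold else 0 for g in gaps]


key = [1, 0, 1, 1, 0, 0, 1, 0]
print(decode_bits(simulate_events(key), threshold=3500) == key)  # True
```

Real detections are noisy and sometimes missed, so the threshold would be placed between the two observed gap populations rather than computed from known cycle counts.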
18 Feasible, but difficult
Technically very difficult:
- Lack of documentation on the BTB: strange indexing, unknown associativity, BTB hierarchy
- Requires relatively infrequent events (about one every 1,000s of cycles), since the measurement resolution is in the 100s of cycles
19 So what?
On a Pentium 4 with Hyper-Threading, if key bits control branches (or load addresses), they might be recovered by a spy thread.
20 Countermeasures
- Just deactivate Hyper-Threading. At present this is a global OS mode, selected at boot time.
- Rework the implementation: introduce randomness in the control path at execution time? This makes the attack much more complex.
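One hypothetical way to introduce randomness in the control path of the square-and-multiply core: for a 0 bit, randomly perform a dummy multiplication whose result is discarded, so an XY-type iteration no longer implies a 1 bit. This is an illustrative sketch, not the author's proposal; an observed X-type iteration still reveals a 0, and the branch on the key bit itself remains, so it only makes the attack harder.

```python
import secrets


def modexp_randomized(base, key_bits, modulus):
    """Square-and-multiply with random dummy multiplications on 0 bits."""
    result = 1
    trace = []
    for bit in key_bits:
        result = result * result % modulus           # Sequence X
        if bit:
            result = result * base % modulus         # real Sequence Y
            trace.append("XY")
        elif secrets.randbits(1):
            _dummy = result * base % modulus         # dummy Sequence Y, result discarded
            trace.append("XY")
        else:
            trace.append("X")
    return result, trace


result, trace = modexp_randomized(7, [1, 0, 1, 1], 101)
print(result == pow(7, 11, 101), trace)
```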