1 Vulnerabilities on high-end processors André Seznec IRISA/INRIA CAPS project-team.

1 Vulnerabilities on high-end processors
André Seznec, IRISA/INRIA CAPS project-team

2 A paradox
- Microarchitectures are becoming more and more complex.
- Timing side-channel attacks have been presented on versions of AES (Bernstein) and RSA (Acıiçmez et al.).

3 Many hardware features exist only to improve performance
- Caches
- Pipeline
- Superscalar execution
- Branch prediction
- Thread parallelism

4 Execution time of a short instruction sequence is a complex function!
[Figure: components that affect execution time, each with two outcomes: branch predictor (correct / mispredict); ITLB, I-cache, D-cache, DTLB and L2 cache (hit / miss); execution core]

5 Execution time of a short instruction sequence is a complex function (2)
- Depends on the precise state of every microarchitecture component:
  - More than 100 speculative instructions are in flight at the same time on a Pentium 4.
  - Instructions are executed out of order.
- Strange correlations, almost unpredictable at compile time (even in the compiler back end)

6 Understanding the AES cache timing attack on high-end microprocessors (follows Bernstein 2005)
- AES with lookup tables is a 10-round algorithm with the following "vulnerabilities":
  - The number, types and order of the instructions are independent of the key K and the message M to be encrypted.
  - The exact locations of the data words read and written by the first round depend only on K xor M:
    - The execution time of the first round depends on K xor M (at least statistically).
- CAN BE EXPLOITED
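
The dependence on K xor M can be made concrete with a small sketch. It assumes a first round that looks up a 256-entry table at index m XOR k for each byte, plus a 64-byte cache line and 4-byte table entries; those sizes and the function names are illustrative assumptions, not taken from the slides, and the sketch ignores which of the lookup tables each byte uses.

```python
# Sketch: first-round table indices in a table-based AES depend only on K xor M.
LINE_BYTES = 64    # assumed cache line size
ENTRY_BYTES = 4    # assumed size of one lookup-table entry


def first_round_indices(key: bytes, message: bytes) -> list[int]:
    """Table index used for each byte position: key byte XOR message byte."""
    return [k ^ m for k, m in zip(key, message)]


def cache_lines_touched(indices: list[int]) -> list[int]:
    """Cache line, relative to the start of the table, hit by each index."""
    entries_per_line = LINE_BYTES // ENTRY_BYTES
    return [i // entries_per_line for i in indices]


key = bytes(range(16))
message = bytes([0xFF] * 16)
indices = first_round_indices(key, message)
print(indices[:4])                       # [255, 254, 253, 252]
print(cache_lines_touched(indices)[:4])  # [15, 15, 15, 15]
```

Two (key, message) pairs with the same XOR touch the same table entries, so under this model they share the same first-round cache behaviour.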

7 Bernstein 2005 (empty cache)
- Plaintext attack
- Unrealistic hypotheses:
  - Access to cycle-accurate encryption timing
  - Cache is flushed between two encryptions (not explicit in the paper, but see Lauradoux et al.)
- Byte-by-byte determination of the key, based on statistically determining the maximum encryption time for each byte of K xor M
- Works only on Pentium 3, not on Pentium 4
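
The statistical step can be illustrated on a toy model. The sketch below is not Bernstein's code and does not run AES: it assumes a made-up timing function in which one value of K xor M (`SLOW_INDEX`, hypothetical) costs a few extra cycles, buried in Gaussian noise. The attacker first profiles with a known key byte to learn which XOR value is slow, then finds the slowest message byte for the secret key.

```python
import random
from collections import defaultdict

SLOW_INDEX = 0x3C  # hypothetical: the K xor M value that is slow on this machine


def timed_encrypt(key_byte, msg_byte, rng):
    """Toy timing model, NOT real AES: noise plus a penalty on one index."""
    penalty = 5.0 if (key_byte ^ msg_byte) == SLOW_INDEX else 0.0
    return 1000.0 + rng.gauss(0.0, 2.0) + penalty


def slowest_message_byte(key_byte, samples, rng):
    """Message byte value with the highest average encryption time."""
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(samples):
        m = rng.randrange(256)
        total[m] += timed_encrypt(key_byte, m, rng)
        count[m] += 1
    return max(count, key=lambda m: total[m] / count[m])


def recover_key_byte(secret_key_byte, samples=50_000, seed=1):
    rng = random.Random(seed)
    # Profiling with a known key byte (0) reveals which K xor M value is slow.
    slow_xor = slowest_message_byte(0, samples, rng) ^ 0
    # Attack: the slowest message byte m satisfies k xor m == slow_xor.
    return slowest_message_byte(secret_key_byte, samples, rng) ^ slow_xor


print(hex(recover_key_byte(0xA7)))
```

The sample count and noise level are chosen so that the penalty stands well clear of the noise in the per-value averages; on real hardware the signal is far weaker and many more encryptions are needed.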

8 A loaded-cache attack (proof-of-concept code available)
- Plaintext attack:
  - Timing of a large number of encryptions
- An unrealistic hypothesis:
  - Access to cycle-accurate encryption timings
- For each byte of K xor M, statistically determine the bit substrings that lead to the highest encryption time (plus a threshold to gain confidence)
- Depending on the microarchitecture:
  - 0 to 80 bits of the key are recovered by this method, depending on the model and stepping of the Pentium 4
  - Suspected cause: banking in the cache being exercised

9 First vulnerability
- For a given sequence:
  - Timings are erratic: it is unlikely to get exactly the same timing twice.
  - But they are statistically correlated: cache banking and operation chaining show up in the average.

10 A possible countermeasure for AES
- Periodically and randomly change the mapping of the lookup tables:
  - 9000 cycles for this change, using an XOR-based permutation (see Lauradoux et al.)
  - HAVEGE can provide the random numbers.
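
One possible reading of an XOR-based permutation is sketched below: logical entry x is stored at position x XOR mask, and the table is rewritten whenever the mask changes. This is an assumption about the general idea, not necessarily the exact scheme of Lauradoux et al.; the function names are hypothetical, and Python's `secrets` module stands in for HAVEGE as the random source.

```python
import secrets


def remap(table, old_mask, new_mask):
    """Rewrite the table so that logical entry x is stored at x ^ new_mask."""
    out = [0] * len(table)
    for position, value in enumerate(table):
        logical = position ^ old_mask        # which entry lives here today
        out[logical ^ new_mask] = value      # where it lives after the change
    return out


def lookup(table, mask, x):
    """Read logical entry x from a table permuted with the given mask."""
    return table[x ^ mask]


# A 256-entry toy table (not the real AES tables).
plain = [(7 * i + 3) % 256 for i in range(256)]

mask = secrets.randbelow(256)
permuted = remap(plain, 0, mask)
assert all(lookup(permuted, mask, x) == plain[x] for x in range(256))

# Periodic change: pick a fresh mask and rewrite the table again.
new_mask = secrets.randbelow(256)
permuted = remap(permuted, mask, new_mask)
assert all(lookup(permuted, new_mask, x) == plain[x] for x in range(256))
```

Because the table length is a power of two, XOR with a mask below that length is a bijection on the positions, so no entry is lost. The memory location of each logical entry changes with the mask, which is what decorrelates timing from K xor M across remappings.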

11 Indirect timing measurements?
- Hypotheses:
  - The attacker has access to user mode on the system (legally or illegally).
  - The attacker has no access to your data.
  - The attacker can run a process concurrently with the encryption.
- On conventional systems, there is no access to the microscopic timing of your application:
  - Time slices are in the millions of cycles.

12 Simultaneous Multithreading (SMT): parallel processing on a single processor
- Functional units are underused on superscalar processors.
- SMT:
  - Sharing the functional units of a superscalar processor between several processes
- Advantages:
  - A single process can use all the resources.
  - Dynamic sharing of all structures on parallel/multiprocess workloads
- Second vulnerability

13 [Figure: issue slots, superscalar vs. SMT]

14 Indirect timing measurements on an SMT processor (principles)
SPY wants to get information on CRYPT:
1. SPY and CRYPT run in parallel.
2. SPY tracks a specific event in CRYPT:
   - For instance, the execution of a branch
3. SPY saturates the hardware resources that CRYPT needs to execute this event fast.
4. SPY records its own execution time (reading the hardware clock counter):
   - An irregularity in its own execution time signals the event:
     - CRYPT has tried to grab the hardware resource.

15 Indirect timing measurements on an SMT: proof of concept (derived from SBPA)
The skeleton of a naive RSA core:
    For I = 1 to N
        Sequence X          // 1,000s of cycles
        If Key[I] = 1
            Sequence Y      // 1,000s of cycles
    Endfor
Spy on this branch B (the If on Key[I]).
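
A runnable version of this skeleton is classic left-to-right square-and-multiply. The sketch below assumes that Sequence X is the squaring and Sequence Y is the multiplication; the slide does not say so explicitly, and the function name and the `trace` list are illustrative additions.

```python
def naive_modexp(base, key_bits, modulus):
    """Left-to-right square-and-multiply with a key-dependent branch.

    key_bits lists the exponent bits, most significant bit first.
    Returns the result and a trace of iteration types ("X" or "XY").
    """
    result = 1
    trace = []
    for bit in key_bits:
        result = (result * result) % modulus      # Sequence X: always executed
        if bit == 1:                              # branch B: depends on the key
            result = (result * base) % modulus    # Sequence Y: only for 1 bits
            trace.append("XY")
        else:
            trace.append("X")
    return result, trace


value, trace = naive_modexp(7, [1, 1, 0, 1], 33)   # exponent 0b1101 = 13
assert value == pow(7, 13, 33)
print(trace)   # ['XY', 'XY', 'X', 'XY']
```

The trace is exactly the key: anyone who can tell X iterations from XY iterations reads the exponent bits directly.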

16 Indirect timing measurements on an SMT: proof of concept (2)
- Branch instructions are buffered in a BTB (branch target buffer):
  - On Pentium 4, when a branch misses in the BTB, the penalty is more than 20 cycles.
- SPY: a nearly infinite loop iterating over a set of branches that occupy the possible BTB entries for B
- Track irregularities in the timing of the loop: when B is executed, a branch of the SPY is evicted from the BTB, creating a timing irregularity:
  - The iteration is X-type or XY-type.
- Able to reproduce this attack on a toy example
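
How a spy might turn its loop timings into key bits can be sketched on synthetic data. The model below is an assumption, not the author's proof of concept: every execution of B shows up as one slow spy iteration, and the gap between two consecutive slow iterations is long when the victim ran X then Y, short when it ran X only. All cycle counts and thresholds are made up for the example.

```python
from itertools import accumulate


def spike_timestamps(loop_cycles, baseline, threshold):
    """Cumulative time at the end of each spy iteration that ran slow."""
    stamps = accumulate(loop_cycles)
    return [t for t, c in zip(stamps, loop_cycles) if c - baseline > threshold]


def bits_from_gaps(stamps, gap_threshold):
    """A long gap between two spikes is read as XY (bit 1), a short one as X (bit 0)."""
    return [1 if later - earlier > gap_threshold else 0
            for earlier, later in zip(stamps, stamps[1:])]


# Synthetic spy trace: 100-cycle iterations, 125 cycles when a BTB entry was lost.
trace = [125] + [100] * 19 + [125] + [100] * 9 + [125] + [100] * 19 + [125]

stamps = spike_timestamps(trace, baseline=100, threshold=20)
print(stamps)                                      # [125, 2150, 3175, 5200]
print(bits_from_gaps(stamps, gap_threshold=1500))  # [1, 0, 1]
```

On real hardware the trace is noisy, so the thresholds would have to be calibrated and the decoding repeated or averaged.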

17 Indirect timing measurements on an SMT
- Feasible:
  - On a branch on Pentium 4 HT, information leaks: I recovered all the bits of a 32-bit key in a single run (on a toy example).
- The same kind of attack may apply to cache accesses: the memory access sequence could be discovered.

18 Feasible, but difficult
- Technically, very difficult:
  - Lack of documentation on the BTB: strange indexing, unknown associativity, BTB hierarchy
  - Requires relatively infrequent events, 1,000s of cycles apart, because the measurement resolution is in the 100s of cycles

19 So what?
- On Pentium 4 HT:
  - If key bits control branches (or the addresses of loads), they might be recovered by a spy thread.

20 Countermeasures
- Just deactivate Hyper-Threading.
  - At present, that is a global OS mode (set at boot time).
- Rework the implementation:
  - Introduce randomness in the control path at execution? This makes the attack much more complex.
