Meltdown and Spectre
Landon Cox
January 17, 2018 and January 22, 2018
Understanding these attacks
  Building blocks
    Address spaces
    Speculative execution
    Cache side channels
  Attack targets
    Kernel memory
    Web browser sandboxes (e.g., JavaScript)
Speculative execution
  "Dynamic self-analysis"
    Allow a thread to run into the future on faked data, in parallel with retrieving the real data
    If the faked data turns out to be the same as the real data, you can use the speculated results
    If the faked data turns out to be different from the real data, you discard the speculated results and proceed as normal
  Why is this approach appealing?
    Doesn't require apps to be modified
    If you're good at guessing the faked data, there can be huge performance benefits
  However …
    If you're not good at guessing the faked data, it's a huge waste of effort
    Also, incorrect speculation must not produce visible side effects …
Speculative execution
  [Figure: sequential instructions followed by a branch instruction]
  Why are branch instructions relatively slow?
    Pipelined architectures utilize knowledge of the next instruction.
    On a branch, the next instruction may not be known.
Speculative execution
  [Figure: sequential instructions and a branch instruction, with a speculation path running alongside]
  At the branch, compare the speculation input to the actual value (=?).
  If speculation was wrong (!=), discard state.
  If speculation was correct (==), swap in state.
  When speculation is correct, we can get a speedup.
  A few things have to be true of the speculative execution in case it is wrong…
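The compare, discard, and swap-in steps can be sketched in a few lines. This is only a sequential toy model (real hardware does the speculative work in parallel with the slow fetch), and the names run_with_speculation, fetch_real, and compute are hypothetical, not taken from the slides.

```python
def run_with_speculation(guess, fetch_real, compute):
    """Compute on a guessed input, then keep or discard the result."""
    speculative_result = compute(guess)  # work done "in the future" on faked data
    real = fetch_real()                  # stands in for the slow retrieval of real data
    if real == guess:
        return speculative_result, True  # speculation correct: swap in state
    return compute(real), False          # speculation wrong: discard, proceed as normal


# Correct guess: the speculative result is used as-is.
result, hit = run_with_speculation(7, lambda: 7, lambda v: v * v)

# Wrong guess: the speculative result is thrown away and the work is redone.
miss_result, miss_hit = run_with_speculation(7, lambda: 3, lambda v: v * v)
```

Note that the toy model discards the wrong result cleanly. The attacks in this lecture rely on real processors not discarding everything.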
How good speculation went bad
  Speculation modifies the processor cache
    Changes to the cache remain visible even when speculation is wrong
  Speculation runs without normal protections
    For example, a speculative thread will operate on page mappings
    For Meltdown, accessing mappings may ignore protections
Exploiting branch misprediction
  Victim code:
    if (x < array1_size) {
      y = array2[array1[x] * 256];
    }
  [Figure: victim's virtual memory, containing secret (value k), array1, array1_size, and array2; "$" marks cache state]
  Starting state: secret is cached. array1_size and array2 are not cached.
  1. Attacker controls the value of x.
  2. Attacker trains the CPU to predict x < array1_size.
  3. Primary thread accesses array1_size, causing a cache miss.
  4. Attacker chooses x so that array1[x] lands on secret, so the speculated statement is effectively y = array2[k * 256].
  5. Speculative thread reads from address array2[k * 256], which causes a cache miss.
  6. Meanwhile, the main thread realizes that the prediction was wrong.
  7. But! The value of array2[k * 256] is now in the cache.
Exploiting branch misprediction
  Attacker can iterate through the entries of array2 to recover the secret, if it can access array2.
    for (i = 0; i < N; i++) {
      time1 = getTime();
      z = array2[i * 256];
      time2 = getTime();
      if (time2 - time1 < BOUND)   // fast access: this entry was already cached
        print "secret = " + i;
    }
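The whole walk-through can be replayed on a toy cache model. The sketch below is a deterministic simulation, not an exploit: ToyCache, victim, leak_byte, and the cost constants are all hypothetical, "memory" is a Python list in which the secret sits just past array1, and a boolean flag stands in for the trained branch predictor.

```python
CACHE_HIT, CACHE_MISS = 1, 100   # made-up access costs, not real cycle counts
STRIDE = 256                     # spacing between probe entries, as in the slides
BOUND = 50                       # hypothetical threshold between hit and miss


class ToyCache:
    """Remembers which addresses are cached and charges a cost per access."""

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        cost = CACHE_HIT if addr in self.lines else CACHE_MISS
        self.lines.add(addr)  # any access, speculative or not, fills the cache
        return cost

    def flush(self, addr):
        self.lines.discard(addr)


def victim(x, memory, array1_size, array2_base, cache, predict_taken):
    """Bounds-checked read whose body also runs when the branch is mispredicted."""
    if x < array1_size or predict_taken:
        k = memory[x]                           # array1[x]; may land on the secret
        cache.access(array2_base + k * STRIDE)  # footprint survives the discard


def leak_byte(memory, array1_size, secret_index, array2_base=1 << 20):
    cache = ToyCache()
    for x in range(array1_size):                # training with in-bounds x
        victim(x, memory, array1_size, array2_base, cache, predict_taken=True)
    for i in range(256):                        # make sure array2 is uncached
        cache.flush(array2_base + i * STRIDE)
    # Out-of-bounds x, executed only because the branch is predicted taken.
    victim(secret_index, memory, array1_size, array2_base, cache, predict_taken=True)
    for i in range(256):                        # timing probe over array2
        if cache.access(array2_base + i * STRIDE) < BOUND:
            return i
    return None


memory = [1, 2, 3, 4] + [0] * 6 + [ord("K")]    # array1 = memory[0:4], secret at 10
leaked = leak_byte(memory, array1_size=4, secret_index=10)
```

In this model the only probe entry that comes back fast is the one indexed by the secret byte, which is why array2 has to start out uncached.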
Exploiting branch misprediction
  Why was it critical for array2 to be uncached?
    Allowed the attacker to see which entry was cached after the misprediction.
  Why was it critical for array1_size to be uncached?
    The miss gave the speculative thread time to run ahead of the main thread.
  Why was it critical for secret to be cached?
    A miss on secret would be slow and would prevent the read of array2[k * 256] before the main thread finished.
  Does it matter whether array1 is cached or not?
    No, since the attack doesn't actually read from array1: the out-of-bounds x makes array1[x] refer to secret.
  Why does the attack read in chunks of 256 bytes?
    x86 cache lines are typically 64 bytes … 256 to be safe?
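A quick way to see what the stride buys is to count how many distinct cache lines the 256 possible probe offsets fall into. The helper distinct_lines below is hypothetical, and the line sizes tried (64 and 128 bytes) are assumptions for illustration.

```python
def distinct_lines(stride, line_size, values=256):
    """Count the cache lines touched by offsets v * stride for v in 0..values-1."""
    return len({(v * stride) // line_size for v in range(values)})


for line_size in (64, 128):
    packed = distinct_lines(1, line_size)    # secret values one byte apart
    spread = distinct_lines(256, line_size)  # secret values 256 bytes apart
    print(line_size, packed, spread)
```

With a stride of 1, many secret values share a line and cannot be told apart by timing. With a stride of 256, every value gets its own line under either assumed line size.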
How good speculation went bad
  Why would I want to attack my own address space?
    Lots of code runs in a managed runtime, e.g., JavaScript
    Assumption is that the code cannot break out
  What kind of secrets might malicious JavaScript read?
    Browser tabs have their own address space
    Malicious JavaScript could read state from other websites
    E.g., log in to Google, then browse to mal.org in the same tab
Example JavaScript
  [Figure: JavaScript code listing]
  simpleByteArray acts as array1
  probeTable acts as array2
  The probeTable index is like "k * 256"
Attacking kernel memory
  [Figure: virtual memory layout]
    3GB (0xc0000000) to 4GB: kernel data (same for all page tables)
    0GB to 3GB: user data (different for every process)
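As a small sketch of this layout, the function below classifies a 32-bit virtual address by the 3GB boundary. The names KERNEL_BASE and region are hypothetical, and the split is simply the one drawn on the slide.

```python
KERNEL_BASE = 0xC0000000       # 3GB boundary from the slide
ADDRESS_SPACE_END = 1 << 32    # 4GB: end of a 32-bit virtual address space


def region(addr):
    """Return which half of the address space a virtual address belongs to."""
    if not 0 <= addr < ADDRESS_SPACE_END:
        raise ValueError("not a 32-bit virtual address")
    return "kernel" if addr >= KERNEL_BASE else "user"
```

Because the kernel half is the same in every page table, a kernel address means the same thing in every process. Only the protection bits keep user code from reading it.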
Attacking kernel memory Why is this design extremely dangerous if a process could read kernel memory?
Attacking kernel memory In what settings might a malicious process want to read another process’s memory?
Attacking kernel memory What is an example exception?
Attacking kernel memory Explain why line 3 could still be executed.
Attacking kernel memory Load the byte value at the kernel address into AL, the least significant byte of the RAX register
Attacking kernel memory This will trigger an exception but it will also run in parallel with subsequent instructions
Attacking kernel memory No part of our probe array can be cached.
Attacking kernel memory Multiply the secret kernel value by the page size (4KB = 0x1000, i.e., shift left by 0xc bits)
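The arithmetic behind this step, as a sketch: 0xc is a shift count of 12 bits, and shifting left by 12 is the same as multiplying by 4096. The names PAGE_SHIFT, PAGE_SIZE, and probe_offset are hypothetical.

```python
PAGE_SHIFT = 0xC               # shift count: 12 bits
PAGE_SIZE = 1 << PAGE_SHIFT    # 4096 bytes = 0x1000 = 4KB


def probe_offset(secret_byte):
    """Offset into the probe array selected by one secret byte value."""
    return secret_byte << PAGE_SHIFT


# One page per possible byte value, so the probe array spans 256 pages.
probe_array_size = 256 * PAGE_SIZE
```

Each possible byte value therefore selects a different page of the probe array.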
Attacking kernel memory Retry if the multiplied value is zero (we’ll come back to this)
Attacking kernel memory If the multiplied value is non-zero, then index into our probe array with the multiplied value
Attacking kernel memory But how do we read this array after the process is killed? Map probe array into partner process. The probing process will die but the partner will survive.
Attacking kernel memory Why retry on zero? If exception is triggered while reading kernel memory, register value is zeroed out.
Attacking kernel memory Why retry on zero? Don’t want to falsely read zero when register holds it due to losing the race.
Attacking kernel memory How do we detect a true zero? If all of probe array remains uncached, true value was zero.
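The retry-on-zero logic and the true-zero rule can be sketched as a toy model. Everything here is hypothetical: each attempt is given a boolean saying whether the transient instructions won the race against the exception, and a set of touched offsets stands in for the cached parts of the probe array.

```python
PAGE_SHIFT = 0xC  # multiply by the 4KB page size


def speculative_read(kernel_byte, wins_race):
    """Value the transient instructions see on one attempt."""
    return kernel_byte if wins_race else 0  # losing the race: register is zeroed


def attempt(kernel_byte, race_outcomes):
    """Return the set of probe-array offsets touched across the retries."""
    touched = set()
    for wins in race_outcomes:
        value = speculative_read(kernel_byte, wins) << PAGE_SHIFT
        if value == 0:
            continue        # retry: zero may only mean the race was lost
        touched.add(value)  # transient access to the probe array at this offset
        break
    return touched


def decode(touched):
    """Recover the byte: an untouched probe array is read as a true zero."""
    if not touched:
        return 0
    (offset,) = touched
    return offset >> PAGE_SHIFT


leaked = decode(attempt(0x41, [False, False, True]))  # two lost races, then a win
true_zero = decode(attempt(0x00, [True, True, True]))  # probe array stays uncached
```

In this model a nonzero byte whose every attempt loses the race would also decode as zero, so the rule is only as good as the number of retries allowed.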
Attacking kernel memory How do we prevent this attack? Remove kernel mappings from address space except for exception handlers. (KAISER)
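The idea can be sketched with page tables modeled as sets of mapped pages. This is a simplified illustration under the assumption that an unmapped page cannot be read at all, even speculatively. The names build_views, can_translate, and the example page addresses are hypothetical.

```python
def build_views(user_pages, kernel_pages, handler_pages):
    """Return (user_mode_view, kernel_mode_view) as sets of mapped pages."""
    kernel_mode_view = set(user_pages) | set(kernel_pages)
    # While user code runs, only the exception-handler pages of the kernel stay mapped.
    user_mode_view = set(user_pages) | set(handler_pages)
    return user_mode_view, kernel_mode_view


def can_translate(view, page):
    """In this model, a page can be accessed only if the current view maps it."""
    return page in view


user = {0x1000, 0x2000}
kernel = {0xC0000000, 0xC0001000}  # 0xC0001000 stands in for a page holding secrets
handlers = {0xC0000000}            # exception-handler page that must stay mapped

user_view, kernel_view = build_views(user, kernel, handlers)
```

With the secret page absent from the user-mode view, the kernel load from the earlier slides has nothing to translate to.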
Next time
  A little history lesson
    THE by Edsger Dijkstra
  Send me your groups if you haven't already