Meltdown
Bruhadeshwar, bru.bezawada@colostate.edu



Motivation
Meltdown is a security attack on user data that is mapped into kernel memory.
Meltdown uses the out-of-order execution feature of microprocessors to extract kernel data.
The attack is independent of OS vulnerabilities.
Meltdown is possible through a combination of cache side-channel inference and out-of-order execution.
The attack applies to almost all processors manufactured after 2010.

Background: Instruction Pipelining
Pipelining is an optimization technique that keeps the processor busy.
The parts of the instruction cycle (fetch, decode, execute, read memory, write back) can be executed independently, so several instructions can be in different stages at the same time.
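The benefit of overlapping stages can be illustrated with a small cycle count. This is a minimal sketch under idealized assumptions that are not in the slides: every stage takes exactly one cycle and there are no hazards or stalls. The function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline; each later one finishes one cycle after it."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


STAGES = 5  # fetch, decode, execute, read memory, write back

for n in (1, 10, 1000):
    speedup = cycles_unpipelined(n, STAGES) / cycles_pipelined(n, STAGES)
    print(f"{n:5d} instructions: speedup {speedup:.2f}x")
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows; real pipelines fall short of that because of the hazards discussed in the following slides.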

Background: Speculative Execution
Speculative execution is the process of executing an instruction before it is known whether its result will be needed.
This provides instruction-level parallelism.
For instance, in a conditional statement, the instructions on both sides of the condition might be executed.
Depending on the outcome of the condition, only one of the executions is retired (committed).
The other instructions are flushed from the pipeline.
Tomasulo's algorithm describes the underlying scheduling details.

Background: Out-of-order Execution
Multiple instructions are executed regardless of their order in the program:
lw $3, 100($4)    in execution, cache miss
sub $5, $6, $7    can execute during the cache miss
add $2, $3, $4    waits until the miss is satisfied
Instructions are fetched, decoded, prepared for execution and pushed into a reservation station.
Instructions without data dependencies execute in no specific order.
The others stall until their dependencies are satisfied (resolving data hazards).
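The three-instruction example can be replayed with a toy scheduler. This is a simplified sketch, not a model of any real processor: it assumes unlimited functional units, a hypothetical 100-cycle cache miss, 1-cycle ALU operations, and register renaming (so only true data dependencies matter). The names `Instr` and `schedule` are made up for illustration.

```python
from dataclasses import dataclass

MISS_LATENCY = 100  # assumed cost of the cache miss, in cycles
ALU_LATENCY = 1     # assumed cost of an ALU operation, in cycles


@dataclass
class Instr:
    text: str
    dest: str
    srcs: tuple
    latency: int  # cycles from start until the result is available


def schedule(program):
    """Return {text: (start, finish)}; an instruction starts once its sources are ready."""
    ready_at = {}  # register -> cycle at which its in-flight value becomes available
    times = {}
    for ins in program:  # program order only decides which producer a source refers to
        start = max((ready_at.get(reg, 0) for reg in ins.srcs), default=0)
        finish = start + ins.latency
        ready_at[ins.dest] = finish
        times[ins.text] = (start, finish)
    return times


program = [
    Instr("lw $3, 100($4)", "$3", ("$4",), MISS_LATENCY),
    Instr("sub $5, $6, $7", "$5", ("$6", "$7"), ALU_LATENCY),
    Instr("add $2, $3, $4", "$2", ("$3", "$4"), ALU_LATENCY),
]

times = schedule(program)
for text, (start, finish) in sorted(times.items(), key=lambda kv: kv[1][1]):
    print(f"{text:16s} starts at cycle {start:3d}, finishes at cycle {finish:3d}")
```

In this model `sub` completes long before `lw`, while `add` cannot start until the loaded value of `$3` arrives, which mirrors the behaviour described in the slide.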

Background: Page Sharing
Sharing of pages across untrusted processes is common.
Computing systems use copy-on-write mechanisms to separate write access.
More importantly, when a shared page is accessed from memory, the accessed data is cached.
This allows side-channel attacks on caches.
A spy process can monitor accesses to a shared page and use timing information as a side channel.

Background: Memory Hierarchy
[Figure: Intel Core i5 memory architecture]

Meltdown Attack
A combination of two steps:
Cache side-channel attack: Flush+Reload
Out-of-order execution

Cache Side-channel Attack
A spy process evicts lines of the shared page from the cache using the clflush instruction.
The victim process is allowed to execute for some time.
The spy process then checks whether the victim loaded those lines again.
If an access is seen, the spy makes inferences about the victim's execution sequence.
Flush+Reload is the most efficient attack of this kind.

Flush+Reload
Three steps, plus a pre-processing step that maps the shared page with mmap.
First: flush the shared memory line from the cache using clflush.
Second: wait for some time.
Third: reload the memory line.
If the victim accesses the line during the waiting period, it is in the cache and the reload is fast.
If not, the line is loaded from main memory and the reload takes longer.
The spy notes the timing difference.
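The three steps can be sketched against a simulated cache. This is only an illustration of the control flow: a real attack uses clflush and a cycle counter on shared memory, whereas here the cache is a Python set and the latencies and threshold are assumed values, not measurements. All names are hypothetical.

```python
HIT_CYCLES = 40    # assumed latency when the line is cached
MISS_CYCLES = 300  # assumed latency when the line comes from main memory
THRESHOLD = 150    # assumed cut-off separating hits from misses


class SimulatedCache:
    """Stand-in for the shared cache; tracks which addresses are currently cached."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)  # plays the role of clflush

    def load(self, addr):
        latency = HIT_CYCLES if addr in self.lines else MISS_CYCLES
        self.lines.add(addr)      # any load leaves the line cached
        return latency


def flush_reload_round(cache, addr, victim):
    """Return True if the victim touched addr during the waiting period."""
    cache.flush(addr)            # first: flush
    victim(cache)                # second: wait while the victim runs
    latency = cache.load(addr)   # third: reload and time it
    return latency < THRESHOLD


SHARED = 0x1000  # hypothetical address of the monitored line
cache = SimulatedCache()

active_victim = lambda c: c.load(SHARED)  # victim uses the shared line
idle_victim = lambda c: None              # victim does not touch it

print("victim active:", flush_reload_round(cache, SHARED, active_victim))
print("victim idle:  ", flush_reload_round(cache, SHARED, idle_victim))
```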

Ex: Extracting RSA Private Keys
Attack intuition: the exponentiation algorithm is square-and-multiply.
The message is raised to the power of the private key.
The code that executes, and therefore the execution time, depends on the value of each key bit.
If the key bit is 1, the if condition is true and a multiply and reduce follow the square and reduce.
Inference: when the pages containing square, reduce, multiply, reduce are executed in sequence, the private key bit is 1; a square and reduce alone indicates a 0.
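The inference can be made concrete with a textbook left-to-right square-and-multiply routine that records which operations run. This is a sketch for illustration, not the code of any particular RSA library; the trace labels `SR` (square, reduce) and `MR` (multiply, reduce) and both function names are assumptions. In the real attack the trace would come from Flush+Reload observations rather than from the victim itself.

```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply; also returns the sequence of operations executed."""
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:          # most significant bit first
        result = (result * result) % modulus
        trace.append("SR")                 # square + reduce runs for every bit
        if bit == "1":
            result = (result * base) % modulus
            trace.append("MR")             # multiply + reduce runs only for 1 bits
    return result, trace


def bits_from_trace(trace):
    """What a spy can infer: SR followed by MR is a 1 bit, SR alone is a 0 bit."""
    bits = []
    for op in trace:
        if op == "SR":
            bits.append("0")
        else:
            bits[-1] = "1"  # the MR belongs to the preceding SR
    return "".join(bits)


value, trace = modexp_with_trace(7, 0b1011, 13)
print("trace:         ", " ".join(trace))
print("recovered bits:", bits_from_trace(trace))
```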

Flush+Reload Attack Code
The code measures the time taken to load the monitored memory line.
The clflush instruction keeps removing the monitored lines from the cache.
The other instructions ensure that this code executes in order.
The final output is the time taken to reload the corresponding line after a waiting period.

Access Time

Meltdown
User processes are separated from each other through page tables in virtual memory.
However, the entire physical memory is mapped in the kernel, because the kernel needs access to both kernel pages and user pages.
The offsets of kernel code are typically known; kernel ASLR (KASLR) exists but is not always in use.
Meltdown exploits this.
Instructions that execute out of order but are never committed are called transient instructions.

Meltdown: Sample Example
Speculative execution executes the second line of the sample.
As a result, the array location is accessed and brought into the cache.

Meltdown : Architecture

Meltdown : Core Sequence
Line 1 reads an inaccessible address.
This should throw an exception, but the exception is delayed due to out-of-order execution.
Line 4 copies the byte at the kernel address into al.
This should not happen, but it does!
The register al is the lowest byte of rax.
Line 5 multiplies the value read by 4096, the page size.
Line 8 accesses the location pointed to by rbx+rax.
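The effect of the multiply-and-access steps can be modelled in a few lines. This is a pure simulation of the address arithmetic, assuming a probe array of 256 pages of 4096 bytes each (one page per possible byte value); it performs no real memory access, and `transient_access` and `cached_pages` are hypothetical names.

```python
PAGE_SIZE = 4096   # one page per possible byte value
PROBE_PAGES = 256  # a byte can take 256 values


def transient_access(secret_byte: int, cached_pages: set) -> int:
    """Model of the transient steps: scale the byte to a page offset and touch that page."""
    offset = secret_byte << 12               # shifting by 12 equals multiplying by 4096
    cached_pages.add(offset // PAGE_SIZE)    # the access leaves this page's line in the cache
    return offset


cached_pages = set()
offset = transient_access(ord("K"), cached_pages)
print("offset into probe array:", offset)
print("pages now cached:", sorted(cached_pages))
```

Because each byte value maps to its own page, the single cached page identifies the byte even though the register contents are discarded when the exception is finally raised.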

Meltdown
The location accessed by the code is cached.
The attacker uses Flush+Reload to measure the cache access timings.
Next, the attacker probes all the pages in the array.
Only the cached line is retrieved faster.
The page number of this cache line equals the value of the kernel byte.
This completes the attack for one byte.
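The probing step amounts to finding the one fast page among 256. The sketch below works on simulated latencies; the hit and miss timings and the threshold are assumed values, and `probe_latencies` stands in for the real timed reloads of each probe-array page.

```python
HIT_CYCLES = 40    # assumed latency of a cached page
MISS_CYCLES = 300  # assumed latency of an uncached page
THRESHOLD = 150    # assumed cut-off between the two
PROBE_PAGES = 256


def probe_latencies(cached_pages):
    """Simulated reload time for every page of the probe array."""
    return [HIT_CYCLES if page in cached_pages else MISS_CYCLES
            for page in range(PROBE_PAGES)]


def recover_byte(latencies):
    """Return the index of the single fast page, or None if the result is ambiguous."""
    fast = [page for page, cycles in enumerate(latencies) if cycles < THRESHOLD]
    return fast[0] if len(fast) == 1 else None


leaked = recover_byte(probe_latencies({ord("K")}))
print("recovered byte:", leaked, repr(chr(leaked)))
```

Returning `None` when zero or several pages are fast is a design choice for this sketch: a real attack would typically retry the measurement in that case.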

Meltdown : Access Time

Exceptions
What happens to the exceptions thrown due to the invalid kernel access?
They can be handled by forking (the access runs in a child process, so only the child is terminated) or by catching the exception.

Mitigation
Disable out-of-order execution.
Use a page access check like AMD.
Do not keep the full kernel mapping, which includes all user pages, in each process's address space.
Disable clflush? Other side-channel attacks might still be possible.

Summary
Meltdown is a cache side-channel attack affecting nearly every computer.
Hardware design choices make this attack possible.
Open challenges:
How to keep performance high while not allowing such side channels?
How to discover such side channels?