Microarchitectural Side-Channel Attacks Yuval Yarom The University of Adelaide and Data61 1.

Slides:



Advertisements
Similar presentations
Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
Advertisements

CML Efficient & Effective Code Management for Software Managed Multicores CODES+ISSS 2013, Montreal, Canada Ke Bai, Jing Lu, Aviral Shrivastava, and Bryce.
CMPE 421 Parallel Computer Architecture MEMORY SYSTEM.
© Karen Miller, What do we want from our computers?  correct results we assume this feature, but consider... who defines what is correct?  fast.
Side-Channel Attack on OpenSSL ECDSA Naomi BengerJoop van de Pol Nigel Smart Yuval Yarom 1.
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Memory Management 2010.
Memory Management 1 CS502 Spring 2006 Memory Management CS-502 Spring 2006.
Computer Organization and Architecture
Lecture 1: Introduction CS170 Spring 2015 Chapter 1, the text book. T. Yang.
Topic 1: Introduction to Computers and Programming
Reducing Cache Misses 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Main.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
ITEC 325 Lecture 29 Memory(6). Review P2 assigned Exam 2 next Friday Demand paging –Page faults –TLB intro.
Microprocessor-based systems Curse 7 Memory hierarchies.
Computer Architecture Memory organization. Types of Memory Cache Memory Serves as a buffer for frequently accessed data Small  High Cost RAM (Main Memory)
3-May-2006cse cache © DW Johnson and University of Washington1 Cache Memory CSE 410, Spring 2006 Computer Systems
Chapter 4 Memory Management Virtual Memory.
1 Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5)
Side-channels Tom Ristenpart CS Pick target(s) Choose launch parameters for malicious VMs Each VM checks for co-residence Frequently achieve advantageous.
Computer Architecture Lecture 27 Fasih ur Rehman.
Lecture Topics: 11/24 Sharing Pages Demand Paging (and alternative) Page Replacement –optimal algorithm –implementable algorithms.
CS 3204 Operating Systems Godmar Back Lecture 16.
Chapter 8: Main Memory. 8.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts – 7 th Edition, Feb 22, 2005 Memory and Addressing It all starts.
Lecture 4 Page 1 CS 111 Online Modularity and Memory Clearly, programs must have access to memory We need abstractions that give them the required access.
Thwarting cache-based side- channel attacks Yuval Yarom The University of Adelaide and Data61.
Welcome to Intro to Operating Systems Course Website:
Introduction to Operating Systems Concepts
Modularity Most useful abstractions an OS wants to offer can’t be directly realized by hardware Modularity is one technique the OS uses to provide better.
MODERN OPERATING SYSTEMS Third Edition ANDREW S
Computer Organization
Non Contiguous Memory Allocation
Protecting Memory What is there to protect in memory?
Lecture: Large Caches, Virtual Memory
The Goal: illusion of large, fast, cheap memory
Session 3 Memory Management
Outline Paging Swapping and demand paging Virtual memory.
Section 9: Virtual Memory (VM)
Today How was the midterm review? Lab4 due today.
New Cache Designs for Thwarting Cache-based Side Channel Attacks
Information Security – Theory vs
Overview Introduction General Register Organization Stack Organization
Mengjia Yan, Yasser Shalabi, Josep Torrellas
Architecture Background
Modularity and Memory Clearly, programs must have access to memory
Lu Peng, Jih-Kwon Peir, Konrad Lai
Bruhadeshwar Meltdown Bruhadeshwar
Chapter 10 The Stack.
Chapter 8: Main Memory.
Chapter 9 :: Subroutines and Control Abstraction
CSCI1600: Embedded and Real Time Software
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Objective of This Course
Meltdown CSE 351 Winter 2018 Instructor: Mark Wyse
Outline Module 1 and 2 dealt with processes, scheduling and synchronization Next two modules will deal with memory and storage Processes require data to.
Memory Management Overview
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE451 Virtual Memory Paging Autumn 2002
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
Virtual Memory: Working Sets
Chapter 2 Operating System Overview
Landon Cox January 17, 2018 January 22, 2018
COMP755 Advanced Operating Systems
OPERATING SYSTEMS MEMORY MANAGEMENT BY DR.V.R.ELANGOVAN.
CSCI1600: Embedded and Real Time Software
University of Illinois at Urbana-Champaign
MicroScope: Enabling Microarchitectural Replay Attacks
Meltdown & Spectre Attacks
Chapter 4 The Von Neumann Model
2019 2학기 고급운영체제론 ZebRAM: Comprehensive and Compatible Software Protection Against Rowhammer Attacks 3 # 단국대학교 컴퓨터학과 # 남혜민 # 발표자.
Presentation transcript:

Microarchitectural Side-Channel Attacks Yuval Yarom The University of Adelaide and Data61 1

Publications on microarchitectural timing attacks Data from [GYCH16] 2

Publications and Global Temperature 3

Global Average Temperatures Vs. Number of Pirates 4 Image from :

Pirates and publications on microarchitectural timing attacks 5 Pirate image By J.J. at the English language Wikipedia, CC BY-SA 3.0,

Still considered hard OpenSSL LOW Severity. This includes issues such as those that … or hard to exploit timing (side channel) attacks. At the same time – Publications are terse – technical details are often omitted – Generic tools do not exist 6

Still considered hard OpenSSL LOW Severity. This includes issues such as those that … or hard to exploit timing (side channel) attacks. Attacks are easy, but at the same time – Publications are terse – technical details are often omitted – Generic tools do not exist 7

Motivation Reduce barriers to entry Why? – Offensive research A potential leak is nice. An exploit is better – Cipher development Know your enemy How? – Education – this tutorial – Tools – Mastik 8

Mastik Extremely bad acronym for Micro-Architectural Side-channel ToolKit Aims – Collate information on SC attacks Improve our understanding of the domain Provide somewhat-robust implementations of all known SC attack techniques for every architecture Implementation of generic analysis techniques – Overcome the barrier to entry into the area – Shift focus to cryptanalysis 9

Mastik - Status Reasonably solid implementation of four attacks – Prime+Probe on L1-D, L1-I and L3, Flush+Reload Only Intel x86-64, on Linux and Mac – x86-32 and limited ARM currently working in the lab Zero documentation, little testing No user feedback 10

Outline Background on and a taxonomy of microarchitectural side-channel attacks The Flush+Reload attack and variants The Prime+Probe attack Countermeasures Slides and sources available at

The Microarchitecture An (Instruction Set) Architecture (ISA) can have multiple implementations – A microarchitecture is one such implementation – The ISA abstracts the implementation detail The microarchitecture is functionally transparent – But contains hidden state that can be observed through program execution timing 12

The Microarchitecture ISA Software Hardware Image adapted from [GYCH16] 13

The Microarchitecture An (Instruction Set) Architecture (ISA) can have multiple implementations – A microarchitecture is one such implementation – The ISA abstracts the implementation detail The microarchitecture is functionally transparent But contains hidden state that can be observed through program execution timing 14

Microarchitectural attacks Create contention on microarchitectural components Which results in timing variations That are used to expose an intarnal state Which depends on secret data Allowing the attacker to infer said data 15

Attack taxonomy - level ISA Software Hardware Core shared Package shared System shared 16 Image adapted from [GYCH16]

Attack taxonomy – type [AS07] Persistent-state attacks – Spatial contention on limited storage space – Example: most cache attacks Transient-state attacks – Temporal contention on limited processing speed – Example: CacheBleed 17

Basic taxonomy We focus on persistent-state at the package level (with some core) Persistent-stateTransient-state Core level[Ber05,Per05, OST06] Others [Lam73,Ber05, AS07,YGH16] Package[YF14,LYG+15,IES15]? System[PGM+16,IES16][WS12,WXW12] 18

Other classifications Classical taxonomy [Pag03], [NS06] – Time-driven – measure complete execution time – Trace-driven – capture sequence of cache hits/misses – Access-driven – obtain some information on accessed memory addresses Internal vs. external contention [AK09] Degree of concurrency [GYCH16] – Multicore – Hyperthreading – Time slicing 19

History Lampson [Lam73] – Defines covert channels – Suggests the first micro-architectural timing channel: 20

History Page [Pag02] – First theoretical cache-based attack Tsunoo et al. [TTMM02] – First microarchitectural side-channel attack – Cache-based timing attack on Misty Mostly forgotten Tsunoo et al. [TSS+03] – Cache-based timing attack on DES 21

The (X86) Cache Memory is slower than the processor The cache utilises locality to bridge the gap – Divides memory into lines – Stores recently used lines Shared caches improve performance for multi-core processors Processor Memory Cache 22

Cache Consistency Memory and cache can be in inconsistent states – Rare, but possible Solution: Flushing the cache contents – Ensures that the next load is served from the memory Processor Memory Cache 23

The F LUSH +R ELOAD Technique Exploits cache behaviour to leak information on victim access to shared memory. Spy monitors victim’s access to shared code – Spy can determine what victim does – Spy can infer the data the victim operates on 24

Detour - Virtual Memory Processes execute within a virtual address space – Virtual pages map to physical frames 25

Sharing Frames can be shared by multiple processes – Read only sharing maintains functional isolation – Protection using Copy-on-write 26

Causes of sharing Content-aware sharing – Pages from the same file have identical content – Shared program or library code Can also share constant data – Shared images in PaaS clouds Content-based sharing (a.k.a. page deduplication) – The system identifies and coalesces identical pages – Implemented in many hypervisors and in most modern operating systems 27

F LUSH +R ELOAD [GBK11,YF14] F LUSH memory line Wait a bit Measure time to R ELOAD line – slow-> no access – fast-> access Repeat Processor Memory Cache 28

Flush+Reload code mfence rdtscp mov %eax, %esi mov (%ebx), %eax rdtscp sub %esi, %eax clflush 0(%ebx) Also need: – Wait – Data collection – Noise handling – Initial parsing 29

Demo FR-1-file-access FR-2-file-access FR-threshold 30

Finding code Use nm, gdb, objdump, etc. – Demo scripts functiondump.sh and debuginfo.sh Remember the base address 31

Demo FR-function-call 32

A closer look at F+R Time flush wait wait Reload flush wait wait Reload flush wait wait wait 33

A closer look at F+R Time flush wait wait Reload flush wait wait flush wait wait wait Spy Victim working access 34

Timing matters Time flush wait wait Reload flush wait wait flush wait wait wait Spy Victim working access Reload Access missed due to temporal overlap 35

Demo FR-function-call-nodelay 36

Probability of a probe miss [ABF+15] Ratio of wait time to slot length 37

Handling misses Probe function calls [YB14] – A cache line containing a function call is accessed once before the call and once on return Except, maybe, when the return address is in the next cache line – The timings of the two accesses are not independent Probe loops [YF14] 38

GnuPG Modular Exponentiation x 1 for i |e|-1 downto 0 do x x 2 mod n if (e i =1) then x = xb mod n endif done return x 39 Example: 11 5 mod 100 = 161,051 mod 100 = 51 Operationresieiei Square12101 reduce12101 Multiply reduce Square reduce Square reduce Multiply reduce The secret exponent is encoded in the sequence of operations !!!

Demo FR-gnupg

F+R Spatial Resolution Cache line – Can't have two different probes on the same cache line Cache line pairs – Probes on paired lines interfere with each other –Don't Streaming – Use "random" order when probing multiple lines in the same page – Don't probe too many of those Speculative execution – Probe rear end of functions 41

Improving temporal resolution Scheduler trick [GBK11] – Monitoring each and every memory access – Requires hyperthreadings – Uses hundreds of threads Amplification [ABF+15] – Flushing commonly used cache lines slows the victim 42

Demo FR-flush 43

Flush+Reload Summary Simple attack – Even without Mastik High temporal resolution – Up to sub-micro-second High accuracy – Few false positives Increases as a function of spatial granularity – A bit more false negatives Particularly when exploiting the high temporal resolution 44

Variants Evict+Reload [GSM15] – Uses cache contention instead of clflush Flush+Flush [GMWM16] – Measures variations in clflush timing between cached and non-cached data – Improved temporal and spatial resolution – Increased error rate Invalidate+Transfer [IES16] – Use Flush+Reload on Intel or Evict+Reload on ARM to implement a cross-package attack Flush+Reload on ARM [ZXZ16] – ??? 45