Microarchitectural Side-Channel Attacks Part 2 Yuval Yarom The University of Adelaide and Data61 1.

Slides:



Advertisements
Similar presentations
Lecture 8: Memory Hierarchy Cache Performance Kai Bu
Advertisements

Lecture 12 Reduce Miss Penalty and Hit Time
16.317: Microprocessor System Design I
CMPE 421 Parallel Computer Architecture MEMORY SYSTEM.
Side-Channel Attack on OpenSSL ECDSA Naomi BengerJoop van de Pol Nigel Smart Yuval Yarom 1.
CMPT 300: Final Review Chapters 8 – Memory Management: Ch. 8, 9 Address spaces Logical (virtual): generated by the CPU Physical: seen by the memory.
Exploring timing based side channel attacks against i CCMP Suman Jana, Sneha K. Kasera University of Utah Introduction
CSCE 212 Chapter 7 Memory Hierarchy Instructor: Jason D. Bakos.
Computer ArchitectureFall 2008 © CS : Computer Architecture Lecture 22 Virtual Memory (1) November 6, 2008 Nael Abu-Ghazaleh.
Full AES key extraction in 65 milliseconds using cache attacks
Memory Management 2010.
CMPT 300: Final Review Chapters 8 – Memory Management: Ch. 8, 9 Address spaces Logical (virtual): generated by the CPU Physical: seen by the memory.
Systems I Locality and Caching
TitleEfficient Timing Channel Protection for On-Chip Networks Yao Wang and G. Edward Suh Cornell University.
Lecture 19: Virtual Memory
Hardware Assisted Control Flow Obfuscation for Embedded Processors Xiaoton Zhuang, Tao Zhang, Hsien-Hsin S. Lee, Santosh Pande HIDE: An Infrastructure.
IT253: Computer Organization
1 Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5)
Exploiting Cache-Timing in AES: Attacks and Countermeasures Ivo Pooters March 17, 2008 Seminar Information Security Technology.
Chapter 5 Memory III CSE 820. Michigan State University Computer Science and Engineering Miss Rate Reduction (cont’d)
DECStation 3100 Block Instruction Data Effective Program Size Miss Rate Miss Rate Miss Rate 1 6.1% 2.1% 5.4% 4 2.0% 1.7% 1.9% 1 1.2% 1.3% 1.2% 4 0.3%
CS203 – Advanced Computer Architecture Virtual Memory.
CDA 5155 Virtual Memory Lecture 27. Memory Hierarchy Cache (SRAM) Main Memory (DRAM) Disk Storage (Magnetic media) CostLatencyAccess.
Thwarting cache-based side- channel attacks Yuval Yarom The University of Adelaide and Data61.
Microarchitectural Side-Channel Attacks Yuval Yarom The University of Adelaide and Data61 1.
A High-Resolution Side-Channel Attack on Last-Level Cache Mehmet Kayaalp, IBM Research Nael Abu-Ghazaleh, University of California Riverside Dmitry Ponomarev,
CS161 – Design and Architecture of Computer
CMSC 611: Advanced Computer Architecture
X. Zhang, Y. Xiao, Y. Zhang Return-Oriented Flush-Reload Side Channels on ARM and Their Implications for Android Devices Xiaokuan Zhang, Yuan Xiao, Yinqian.
CMSC 611: Advanced Computer Architecture
Memory Hierarchy Ideal memory is fast, large, and inexpensive
Virtual memory.
Virtual Memory Chapter 7.4.
Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.
Memory COMPUTER ARCHITECTURE
Lecture: Large Caches, Virtual Memory
CS161 – Design and Architecture of Computer
Lecture 12 Virtual Memory.
Associativity in Caches Lecture 25
Section 9: Virtual Memory (VM)
New Cache Designs for Thwarting Cache-based Side Channel Attacks
Chapter 9 – Real Memory Organization and Management
Information Security – Theory vs
5.2 Eleven Advanced Optimizations of Cache Performance
Mengjia Yan, Yasser Shalabi, Josep Torrellas
Architecture Background
Cache Memory Presentation I
RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks
Lecture: Cache Hierarchies
Bruhadeshwar Meltdown Bruhadeshwar
Professor, No school name
Chapter 8 Digital Design and Computer Architecture: ARM® Edition
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Guoxing Chen1* & Wenhao Wang2,3*, Tianyu Chen2, Sanchuan Chen1,
Page Replacement.
Chapter 5 Memory CSE 820.
Yingmin Li Ting Yan Qi Zhao
Lecture: Cache Innovations, Virtual Memory
Summary 3 Cs: Compulsory, Capacity, Conflict Misses Reducing Miss Rate
CS 3410, Spring 2014 Computer Science Cornell University
Cross-Core Prime+Probe Attacks on Non-inclusive Caches
Lecture: Cache Hierarchies
Lecture 8: Efficient Address Translation
Virtual Memory: Working Sets
CS703 - Advanced Operating Systems
Maria Méndez Real, Vincent Migliore, Vianney Lapotre, Guy Gogniat
University of Illinois at Urbana-Champaign
MicroScope: Enabling Microarchitectural Replay Attacks
ARM920T Processor This training module provides an introduction to the ARM920T processor embedded in the AT91RM9200 microcontroller.We’ll identify the.
2019 2학기 고급운영체제론 ZebRAM: Comprehensive and Compatible Software Protection Against Rowhammer Attacks 3 # 단국대학교 컴퓨터학과 # 남혜민 # 발표자.
Presentation transcript:

Microarchitectural Side-Channel Attacks Part 2 Yuval Yarom The University of Adelaide and Data61 1

X86 L1 Cache TagSetOffset Tag, Data Sets Ways Stores fixed-size (64B) lines Arranged as multiple (64) sets, each consisting of multiple (8) ways. Each memory line maps to a single cache set –Bits 6-11 select the set –Can be cached in any of the ways in the set 2

X86 L1 Cache Better visualised when the cache is rotated Memory Cache Sets Ways 3

Prime+Probe [OST05, Per05] Attacker chooses a cache- sized memory buffer Attacker accesses all the lines in the buffer, filling the cache with its data Victim executes, evicting some of the attackers lines from the cache Attacker measures the time to access the buffer – Accesses to cached lines is faster than to evicted lines Memory Cache 4

The gritty details The observer effect – Our code uses the cache - want to minimise our footprint The optimising compiler removes what it thinks is dead code – Not optimising increases the code's footprint – Solution: Know your optimiser Use assembly language 5

Thrashing The cache uses a Pseudo-LRU replacement Our probe + victim access can cause thrashing Solution [TOS10]: Zig-zag on data 6 S0S0 S1S1 S2S2 S3S3 S4S4 S5S5 S6S6 S7S7 V0V0 S0S0 S1S1 S6S6 S7S7

Hardware prefetcher Aims to improve temporal locality Brings data to the cache before it is required – We do not want that!!! Solution [TOS10]: pointer chasing – [Per05] uses data dependency for achieving the same result 7

Data streams The cache aims to predict regular access patterns with fixed strides Linear access within a page may trigger the mechanism Solution [OST10]: Randomise order of probed sets 8

Putting it all together L1-capture With L1-rattle With rungnupg 9

Challenges to last-level cache attacks Difficulty in finding memory lines that map to a given cache set – Virtual memory – LLC slices Large cache size and longer cache access times mean LLC Prime+Probe is very slow Visibility of the victim memory access at the LLC – Intel inclusive cache takes care of the issue 10

Virtual vs. Physical addresses In L1, the cache stride is the same as the page size (4KiB) – The cache set is completely determined by the page offset Phys. Memory Cache Virt. addresses 11

Addressing uncertainty Phys. Memory Cache Virt. addresses In larger caches, the stride is bigger. Pages only contain lines of some of the cache sets. Parts of the mapping of memory to cache sets is masked by the virtual address. 12

Address fields 13 TagSetOffset L1 Cache Page numberPage offset Virtual address TagSetOffset LLC SandyBridge TagSetOffset LLC Skylake

Addressing uncertainty Phys. Memory Cache Virt. addresses The attacker cannot guarantee an even cover of the cache. Self-contention Missed victim access 14

Solution : Use large pages [LYG+15,IES15] 15 TagSetOffset L1 Cache Page numberPage offset Virtual address TagSetOffset LLC SandyBridge TagSetOffset LLC Skylake A feature of the MMU. Large pages (2MiB or bigger) reduce the overhead of address translation Large page numberLarge page offset Virtual address

Solution – large pages Phys. Memory Cache Virt. addresses 16

Intel LLC Slices The last-level cache is divided into slices One slice per core – Two in Skylake? The slice for a memory line is chosen using an undisclosed hash function 17 TagSetOffset LLC SandyBridge Hash

Reverse Engineering the Hash Function [MLN+15] Use performance counters to RE linear hash functions (number of cores is a power of two) 18

Reverse Engineering the Hash Function [YGL+15] use timing to RE the function for 6 cores 19

Reverse Engineering the Hash Function [MLN+15] Use performance counters to RE linear hash functions (number of cores is a power of two) [YGL+15] use timing to RE the function for 6 cores But – need physical addresses – Can be done on Linux < 4.0 – Unless running in a VM 20

Probing the hash function Start with a set of potentially conflicting memory lines – At least twice as many as the totalsize of the cache sets Expand a subset until it conflicts on a single set in a single slice Contract the subset until it contains only lines of the conflicted set Collect all of the lines of the conflicting set from the original set Repeat until the original set is (almost) empty 21

Expand – Start from an empty subset – Iteratively add lines to the subset as long as there is no self-eviction – Self-eviction is detected by priming a potential new member, accessing the current subset and timing another access to the potential new member 22

Contract – Iteratively remove lines from the subset checking for self- eviction – Only keep members if self-eviction disappears when removed 23

Collect – Scan original set, looking for members that conflict with the contracted subset 24

Demo L3-capture L3-capturecount 25

Slow LLC Prime+Probe times L1 (32KB) probe: – 64 sets * 8 ways *4 cycles = 2,048 cycles Small last-level cache (6MB): – 8,192 sets * 12 ways * ~30 cycles = ~3,000,000 cycles We cannot probe the entire LLC in a reasonable time, but probing one cache set is fast Our solution: – Probe one or a few cache sets at a time – Look for temporal patterns rather than spatial footprints 26

Demo L3-scan 27

Countermeasures Hardware System Software 28

Hardware countermeasure Don't share Hardware cache partitioning [DJL+12] – Intel Cache Allocation Technology Cache Randomisation [WL06] 29

System Countermeasures Limited sharing – Don't share memory – Don’t share cores Cache Colouring [BLRB94,SSCZ11] – STEALTHMEM [KPM12] CATalyst [LGY+16] CacheBar [ZRZ16] 30

Software Countermeasures Constant-time programming – No variable-time instructions – No secret-dependent flow control – No secret-dependent memory access 31

Non constant-time [YB14] 32

Constant-time 33

Constant-time 34

Bibliography [ABF+15]T. Allan, B. B. Brumley, K. Falkner, J. van de Pol and Y. Yarom, "Amplifying Side Channels Through Performance Degradation", ePrint 2015/1141 [AK09]O. Acıiçmez, Ç. K. Koç, "Microarchitectural Attacks and Countermeasures," in Ç. K. Koç, ed. Cryptographic Engineering, Springer US, 2009 [AS07]O. Acıiçmez and J-P Seifert, "Cheap Hardware Parallelism Implies Cheap Security", FDTC 2007 [Ber05]D. J. Bernstein, "Cache-timing attacks on AES", http://cr.yp.to/antiforgery/cachetiming pdf [DJL+12]L. Domnister, A. Jaleel, J, Loew, N. Abu-Ghazaleh and D. Ponomarev, "Non-Monopolizable Caches: Low- Complexity Mitigation of Cache Side Channel Attacks", TACO 8(4), 2012 [GBK11]D. Gullasch, E. Bangerter and S. Krenn, "Cache Games -- Bringing Access-Based Cache Attacks on {AES} to Practice", IEEE S&P 2011 [GYCH16]Q. Ge, Y. Yarom, D. Cock and G. Heiser, "A Survey of Microarchitectural Timing Attacks and Countermeasures on Contemporary Hardware", ePrint 2016/613 [GMWM16]D. Gruss, C. Maurice, K. Wagner and S. Mangard, "Flush+Flush: A Fast and Stealthy Cache Attack", DIMVA 2016 [GSM15]D. Gruss, R. Spreitzer and S. Mangard, "Cache Template Attacks: Automating Attacks on Inclusive Last- Level Caches", USENIX Security 2015 [IES15]G. Irazoqui, T. Eisenbarth and B. Sunar, "S$A: A Shared Cache Attack that Works Across Cores and Defies VM Sandboxing – and its Application to AES", IEEE S&P [IES16]G. Irazoqui, T. Eisenbarth and B. Sunar, "Cross processor cache attacks", AsiaCCS 2016 [IIES14]G. Irazoqui, M. S. Inci, T. Eisenberth and B. Sunar, "Wait a minute! A fast, Cross-VM Attack on AES", RAID 2014 [Lam73]B. W. Lampson, "A Note on the Confinement Problem", CACM 16(10), 1973 [LYG+15]F. Liu, Y. Yarom, Q. Ge, G. Heiser and R. B. Lee, "Last-Level Cache Side-Channel Attacks are Practical",, IEEE S&P 2015 [MLN+15]C. Maurice, N. Le Scourance, C. Neumann, O. Heen and A. Francillon, "Reverse Engineering Intel Last- Level Cache Complex Addressing Using Performance Counters", RAID 2015 [NS16]M. Neve and J-P Seifert, "Advances on access-driven cache attacks on AES," 13 th SAC,

Bibliography [OST06]D. A. Osvik, A. Shamir and E. Tromer, "Cache Attacks and Countermeasures: The Case of AES", CT-RSA 2006 [pag02]D. Page, "Theoretical Use of Cache Memory as a Cryptanalytic Side-Channel," ePrint 2002/169 [Pag03]D. Page, "Defending against cache-based side-channel attacks," Information Security Technical Report, 8(1), 2003 [Per05]C. Percival, "Cache Missing for Fun and Profit", BSDCan, 2005 [PGM+16]P. Pessl, D. Gruss, C. Maurice, M. Schwarz and S. Mangard, "DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks", USENIX Security 2016 [TOS10]E. Trome, D. A. Osvik, A. Shamir, "Efficient Cache Attacks on AES, and Countermeasures", JoC 23(1), 2010 [TTMM02] Y. Tsunoo, E. Tsujihara, K. Minematsu and H. Miyauchi, "Cryptanalysis of Block Ciphers Implemented on Computers with Cache", International Symposium on Information Theory and Its Applications, Oct [WL06]Z. Wang and R. B. Lee, "Covert and Side Channels Due to Processor Architecture", ACSAC 2006 [WS12]Y. Wang and G. E. Suh, "Efficient timing channel protection for on-chip networks", NoCS [WXW12]Z. Wu, Z. Xu and H. Wang, "Whispers in the Hyper-space: High-speed Covert Channel Attacks in the Cloud", USENIX Security 2012 [YB14]Y. Yarom and N. Benger, "Recovering OpenSSL ECDSA Nonces Using the FLUSH+RELOAD Cache Side- channel Attack", ePrint 2014/140 [YF14]Y. Yarom and K. Falkner, "FLUSH+RELOAD: a High Resolution, Low Noise, L3 Cache Side-Channel Attack", USENIX Security 2014 [YGH16]Y. Yarom, D. Genkin and N. Heninger, "CacheBleed: A Timing Attack on OpenSSL Constant Time RSA", CHES 2016 [YGL+15]Y. Yarom, Q. Ge, F. Liu, R.B. Lee and G. Heiser, "Mapping the Intel Last-Level Cache", ePrint 2015/905 [ZXZ16]X. Zhang, Y. Xiao and Y. Zhang "Return-Oriented Flush-Reload Side Channels on ARM and Their Implications for Android Devices", CCS