A High-Resolution Side-Channel Attack on Last-Level Cache Mehmet Kayaalp, IBM Research Nael Abu-Ghazaleh, University of California Riverside Dmitry Ponomarev,

Slides:



Advertisements
Similar presentations
1 Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers By Sreemukha Kandlakunta Phani Shashank.
Advertisements

High Performing Cache Hierarchies for Server Workloads
Achieving Non-Inclusive Cache Performance with Inclusive Caches
Prefetch-Aware Cache Management for High Performance Caching
Spring 2003CSE P5481 Introduction Why memory subsystem design is important CPU speeds increase 55% per year DRAM speeds increase 3% per year rate of increase.
S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.
AES clear a replacement for DES was needed
Full AES key extraction in 65 milliseconds using cache attacks
Cryptography and Network Security (AES) Dr. Monther Aldwairi New York Institute of Technology- Amman Campus 10/18/2009 INCS 741: Cryptography 10/18/20091Dr.
Caches – basic idea Small, fast memory Stores frequently-accessed blocks of memory. When it fills up, discard some blocks and replace them with others.
Memory Management Ch.8.
Chapter 5 –Advanced Encryption Standard "It seems very simple." "It is very simple. But if you don't know what the key is it's virtually indecipherable."
Achieving Non-Inclusive Cache Performance with Inclusive Caches Temporal Locality Aware (TLA) Cache Management Policies Aamer Jaleel,
Embedded System Lab. 김해천 Linearly Compressed Pages: A Low- Complexity, Low-Latency Main Memory Compression Framework Gennady Pekhimenko†
Branch Regulation: Low-Overhead Protection from Code Reuse Attacks.
Sampling Dead Block Prediction for Last-Level Caches
BEAR: Mitigating Bandwidth Bloat in Gigascale DRAM caches
Ensemble Learning for Low-level Hardware-supported Malware Detection
The Evicted-Address Filter
Scavenger: A New Last Level Cache Architecture with Global Block Priority Arkaprava Basu, IIT Kanpur Nevin Kirman, Cornell Mainak Chaudhuri, IIT Kanpur.
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
University of Toronto Department of Electrical and Computer Engineering Jason Zebchuk and Andreas Moshovos June 2006.
1 Lecture: Virtual Memory Topics: virtual memory, TLB/cache access (Sections 2.2)
Cache Perf. CSE 471 Autumn 021 Cache Performance CPI contributed by cache = CPI c = miss rate * number of cycles to handle the miss Another important metric.
1 Adapted from UC Berkeley CS252 S01 Lecture 17: Reducing Cache Miss Penalty and Reducing Cache Hit Time Hardware prefetching and stream buffer, software.
Thwarting cache-based side- channel attacks Yuval Yarom The University of Adelaide and Data61.
Covert Channels Through Branch Predictors: a Feasibility Study
X. Zhang, Y. Xiao, Y. Zhang Return-Oriented Flush-Reload Side Channels on ARM and Their Implications for Android Devices Xiaokuan Zhang, Yuan Xiao, Yinqian.
CMSC 611: Advanced Computer Architecture
Chang Hyun Park, Taekyung Heo, and Jaehyuk Huh
Algorithmic Improvements for Fast Concurrent Cuckoo Hashing
ECE232: Hardware Organization and Design
Memory Caches & TLB Virtual Memory
Lecture: Large Caches, Virtual Memory
Section 9: Virtual Memory (VM)
Today How was the midterm review? Lab4 due today.
Module: Virtual Memory
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Mengjia Yan, Yasser Shalabi, Josep Torrellas
Consider a Direct Mapped Cache with 4 word blocks
Lecture: Large Caches, Virtual Memory
RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks
Bruhadeshwar Meltdown Bruhadeshwar
Secure In-Cache Execution
CSE 153 Design of Operating Systems Winter 2018
Prefetch-Aware Cache Management for High Performance Caching
CSCI206 - Computer Organization & Programming
Professor, No school name
Using Dead Blocks as a Virtual Victim Cache
Introduction to CUDA Programming
FIGURE 12-1 Memory Hierarchy
Chapter 5 Memory CSE 820.
ConfMVM: A Hardware-Assisted Model to Confine Malicious VMs
Lecture 22: Cache Hierarchies, Memory
Side channels and covert channels Part I – Architecture side channels
(A Research Proposal for Optimizing DBMS on CMP)
Virtual Memory Overcoming main memory size limitation
Virtual Memory Prof. Eric Rotenberg
Cross-Core Prime+Probe Attacks on Non-inclusive Caches
Page Table Implementations
CSE 451: Operating Systems Winter 2005 Page Tables, TLBs, and Other Pragmatics Steve Gribble 1.
CSE 153 Design of Operating Systems Winter 2019
Lecture 9: Caching and Demand-Paged Virtual Memory
Sarah Diesburg Operating Systems CS 3430
University of Illinois at Urbana-Champaign
MicroScope: Enabling Microarchitectural Replay Attacks
Sarah Diesburg Operating Systems COP 4610
Meltdown & Spectre Attacks
Introduction to CUDA Programming
Presentation transcript:

A High-Resolution Side-Channel Attack on Last-Level Cache Mehmet Kayaalp, IBM Research Nael Abu-Ghazaleh, University of California Riverside Dmitry Ponomarev, State University of New York at Binghamton Aamer Jaleel, Nvidia Research The 53 rd Design Automation Conference (DAC), Austin, TX, June 8, 2016

Cache Side-Channel 2 f2 85 5c 06 6a 91 4e 0c c4 fc da a8 d5 37 e9 9c 28 1e 4c bf f 53 d9 a4 49 2d 0e S-Box SubBytes Set-associative cache ways sets

Flush+Reload Attack 3 Cache L2 L1-I Victim 1- Flush each line in the critical data 2- Victim accesses critical data 3- Reload critical data (measure time) EvictedTime CPU1 ways sets L1-D L2 L1-I Attacker L1-D Shared L3 CPU2

Prime+Probe: L1 Attack 4 L1 Cache ways sets L2 L1-IL1-D AttackerVictim 2-way SMT core 1- Prime each cache set 2- Victim accesses critical data 3- Probe each cache set (measure time) EvictedTime

Prime+Probe: LLC Attack 5 L2 L1-I Victim 2- Victim accesses critical data CPU1 L1-D L2 L1-I Attacker L1-D Shared L3 CPU2 1- Prime each cache set 3- Probe each cache set (measure time) Inclusive Evict critical data Back-invalidations Challenges: Find collision groups for each cache set Discover hardware details Identify a minimal set of addresses per cache set Find which are the critical cache sets Find which cache sets incur the most slowdown for the victim Among those, look for the expected access pattern

Discovering LLC Details 6 Intel Sandy Bridge die 4x Cores 4x 2MB LLC Banks virtual page numberpage offset Virtual Address tag set index L1 Access line offset physical page numberpage offset Physical Address tag LLC Access set index 0 line offset 6 Hash bank select

Bank Selection and Cavity Sets 7 tag set index 0 line offset 6 ●● ● ● ●● ●● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ●● ●●●● XOR H0H0 H1H1 : Number of ways: 1516 cavity sets

… number of ways 8 Finding Collision Groups Memory Page same set index x N = Cache Size N = (8 MB / 4 KB / 16) = 128 ɸ = { } Add page ρ to ɸ Measure ∆t = t( ɸ ) - t( ɸ - ρ ) If ∆t is high For each ρ i ∈ ɸ Measure ∆t i = t( ɸ ) - t( ɸ - ρ i ) Add ρ i to the new group if ∆t i is high Remove the group from ɸ Repeat until N groups are found ways N

9 Finding Critical Sets

10 Attack on Instructions for round = 1:9 if round is even /* even rounds */ else /* odd rounds */ /* last round */ sets time

11 Attack on Critical Table

12 Attack Analysis True Positive Rate # true critical accesses observed # all critical accesses of the victim False Discovery Rate (FDR) # false critical accesses observed # all measurements of the attacker Cache Side-Channel Vulnerability (CSV) CSV = Pearson-correlation (Attacker trace, Victim trace) TPR = FDR =

13 Comparison to Flush+Reload

14 Summary A new high-resolution Prime+Probe LLC attack is proposed It does not rely on large pages or the sharing of cryptographic data between the victim and the attacker Mechanisms to discover precise groups of addresses that map into the same LLC set in the presence of: Physical indexing Index hashing Varying cache associativity across the LLC sets Concurrent attack on the instruction and data tp improve the signal and reduce the noise Not limited to AES and can be applied to attacking any ciphers that rely on pre-computed cryptographic tables (e.g. Blowfish, Twofish)