A High-Resolution Side-Channel Attack on Last-Level Cache Mehmet Kayaalp, IBM Research Nael Abu-Ghazaleh, University of California Riverside Dmitry Ponomarev,

Slides:

Advertisements

Similar presentations

1 Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers By Sreemukha Kandlakunta Phani Shashank.

Advertisements

High Performing Cache Hierarchies for Server Workloads

Achieving Non-Inclusive Cache Performance with Inclusive Caches

Prefetch-Aware Cache Management for High Performance Caching

Spring 2003CSE P5481 Introduction Why memory subsystem design is important CPU speeds increase 55% per year DRAM speeds increase 3% per year rate of increase.

S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.

AES clear a replacement for DES was needed

Full AES key extraction in 65 milliseconds using cache attacks

Cryptography and Network Security (AES) Dr. Monther Aldwairi New York Institute of Technology- Amman Campus 10/18/2009 INCS 741: Cryptography 10/18/20091Dr.

Caches – basic idea Small, fast memory Stores frequently-accessed blocks of memory. When it fills up, discard some blocks and replace them with others.

Memory Management Ch.8.

Chapter 5 –Advanced Encryption Standard "It seems very simple." "It is very simple. But if you don't know what the key is it's virtually indecipherable."

Achieving Non-Inclusive Cache Performance with Inclusive Caches Temporal Locality Aware (TLA) Cache Management Policies Aamer Jaleel,

Embedded System Lab. 김해천 Linearly Compressed Pages: A Low- Complexity, Low-Latency Main Memory Compression Framework Gennady Pekhimenko†

Branch Regulation: Low-Overhead Protection from Code Reuse Attacks.

Sampling Dead Block Prediction for Last-Level Caches

BEAR: Mitigating Bandwidth Bloat in Gigascale DRAM caches

Ensemble Learning for Low-level Hardware-supported Malware Detection

The Evicted-Address Filter

Scavenger: A New Last Level Cache Architecture with Global Block Priority Arkaprava Basu, IIT Kanpur Nevin Kirman, Cornell Mainak Chaudhuri, IIT Kanpur.

1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.

University of Toronto Department of Electrical and Computer Engineering Jason Zebchuk and Andreas Moshovos June 2006.

1 Lecture: Virtual Memory Topics: virtual memory, TLB/cache access (Sections 2.2)

Cache Perf. CSE 471 Autumn 021 Cache Performance CPI contributed by cache = CPI c = miss rate * number of cycles to handle the miss Another important metric.

1 Adapted from UC Berkeley CS252 S01 Lecture 17: Reducing Cache Miss Penalty and Reducing Cache Hit Time Hardware prefetching and stream buffer, software.

Thwarting cache-based side- channel attacks Yuval Yarom The University of Adelaide and Data61.

Covert Channels Through Branch Predictors: a Feasibility Study

X. Zhang, Y. Xiao, Y. Zhang Return-Oriented Flush-Reload Side Channels on ARM and Their Implications for Android Devices Xiaokuan Zhang, Yuan Xiao, Yinqian.

CMSC 611: Advanced Computer Architecture

Chang Hyun Park, Taekyung Heo, and Jaehyuk Huh

Algorithmic Improvements for Fast Concurrent Cuckoo Hashing

ECE232: Hardware Organization and Design

Memory Caches & TLB Virtual Memory

Lecture: Large Caches, Virtual Memory

Section 9: Virtual Memory (VM)

Today How was the midterm review? Lab4 due today.

Module: Virtual Memory

Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.

Mengjia Yan, Yasser Shalabi, Josep Torrellas

Consider a Direct Mapped Cache with 4 word blocks

Lecture: Large Caches, Virtual Memory

RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks

Bruhadeshwar Meltdown Bruhadeshwar

Secure In-Cache Execution

CSE 153 Design of Operating Systems Winter 2018

Prefetch-Aware Cache Management for High Performance Caching

CSCI206 - Computer Organization & Programming

Professor, No school name

Using Dead Blocks as a Virtual Victim Cache

Introduction to CUDA Programming

FIGURE 12-1 Memory Hierarchy

Chapter 5 Memory CSE 820.

ConfMVM: A Hardware-Assisted Model to Confine Malicious VMs

Lecture 22: Cache Hierarchies, Memory

Side channels and covert channels Part I – Architecture side channels

(A Research Proposal for Optimizing DBMS on CMP)

Virtual Memory Overcoming main memory size limitation

Virtual Memory Prof. Eric Rotenberg

Cross-Core Prime+Probe Attacks on Non-inclusive Caches

Page Table Implementations

CSE 451: Operating Systems Winter 2005 Page Tables, TLBs, and Other Pragmatics Steve Gribble 1.

CSE 153 Design of Operating Systems Winter 2019

Lecture 9: Caching and Demand-Paged Virtual Memory

Sarah Diesburg Operating Systems CS 3430

University of Illinois at Urbana-Champaign

MicroScope: Enabling Microarchitectural Replay Attacks

Sarah Diesburg Operating Systems COP 4610

Meltdown & Spectre Attacks

Introduction to CUDA Programming

Presentation transcript:

A High-Resolution Side-Channel Attack on Last-Level Cache Mehmet Kayaalp, IBM Research Nael Abu-Ghazaleh, University of California Riverside Dmitry Ponomarev, State University of New York at Binghamton Aamer Jaleel, Nvidia Research The 53 rd Design Automation Conference (DAC), Austin, TX, June 8, 2016

Cache Side-Channel 2 f2 85 5c 06 6a 91 4e 0c c4 fc da a8 d5 37 e9 9c 28 1e 4c bf f 53 d9 a4 49 2d 0e S-Box SubBytes Set-associative cache ways sets

Flush+Reload Attack 3 Cache L2 L1-I Victim 1- Flush each line in the critical data 2- Victim accesses critical data 3- Reload critical data (measure time) EvictedTime CPU1 ways sets L1-D L2 L1-I Attacker L1-D Shared L3 CPU2

Prime+Probe: L1 Attack 4 L1 Cache ways sets L2 L1-IL1-D AttackerVictim 2-way SMT core 1- Prime each cache set 2- Victim accesses critical data 3- Probe each cache set (measure time) EvictedTime

Prime+Probe: LLC Attack 5 L2 L1-I Victim 2- Victim accesses critical data CPU1 L1-D L2 L1-I Attacker L1-D Shared L3 CPU2 1- Prime each cache set 3- Probe each cache set (measure time) Inclusive Evict critical data Back-invalidations Challenges: Find collision groups for each cache set Discover hardware details Identify a minimal set of addresses per cache set Find which are the critical cache sets Find which cache sets incur the most slowdown for the victim Among those, look for the expected access pattern

Discovering LLC Details 6 Intel Sandy Bridge die 4x Cores 4x 2MB LLC Banks virtual page numberpage offset Virtual Address tag set index L1 Access line offset physical page numberpage offset Physical Address tag LLC Access set index 0 line offset 6 Hash bank select

Bank Selection and Cavity Sets 7 tag set index 0 line offset 6 ●● ● ● ●● ●● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ●● ●●●● XOR H0H0 H1H1 : Number of ways: 1516 cavity sets

… number of ways 8 Finding Collision Groups Memory Page same set index x N = Cache Size N = (8 MB / 4 KB / 16) = 128 ɸ = { } Add page ρ to ɸ Measure ∆t = t( ɸ ) - t( ɸ - ρ ) If ∆t is high For each ρ i ∈ ɸ Measure ∆t i = t( ɸ ) - t( ɸ - ρ i ) Add ρ i to the new group if ∆t i is high Remove the group from ɸ Repeat until N groups are found ways N

9 Finding Critical Sets

10 Attack on Instructions for round = 1:9 if round is even /* even rounds */ else /* odd rounds */ /* last round */ sets time

11 Attack on Critical Table

12 Attack Analysis True Positive Rate # true critical accesses observed # all critical accesses of the victim False Discovery Rate (FDR) # false critical accesses observed # all measurements of the attacker Cache Side-Channel Vulnerability (CSV) CSV = Pearson-correlation (Attacker trace, Victim trace) TPR = FDR =

13 Comparison to Flush+Reload

14 Summary A new high-resolution Prime+Probe LLC attack is proposed It does not rely on large pages or the sharing of cryptographic data between the victim and the attacker Mechanisms to discover precise groups of addresses that map into the same LLC set in the presence of: Physical indexing Index hashing Varying cache associativity across the LLC sets Concurrent attack on the instruction and data tp improve the signal and reduce the noise Not limited to AES and can be applied to attacking any ciphers that rely on pre-computed cryptographic tables (e.g. Blowfish, Twofish)