Cross-Core Prime+Probe Attacks on Non-inclusive Caches

Slides:



Advertisements
Similar presentations
L.N. Bhuyan Adapted from Patterson’s slides
Advertisements

Virtual Hierarchies to Support Server Consolidation Michael Marty and Mark Hill University of Wisconsin - Madison.
Directory-Based Cache Coherence Marc De Melo. Outline Non-Uniform Cache Architecture (NUCA) Cache Coherence Implementation of directories in multicore.
Bypass and Insertion Algorithms for Exclusive Last-level Caches
Lucía G. Menezo Valentín Puente José Ángel Gregorio University of Cantabria (Spain) MOSAIC :
Ragib Hasan Johns Hopkins University en Spring 2010 Lecture 3 02/15/2010 Security and Privacy in Cloud Computing.
Concurrent programming: From theory to practice Concurrent Algorithms 2014 Vasileios Trigonakis Georgios Chatzopoulos.
To Include or Not to Include? Natalie Enright Dana Vantrease.
1 Lecture 17: Large Cache Design Papers: Managing Distributed, Shared L2 Caches through OS-Level Page Allocation, Cho and Jin, MICRO’06 Co-Operative Caching.
A KTEC Center of Excellence 1 Cooperative Caching for Chip Multiprocessors Jichuan Chang and Gurindar S. Sohi University of Wisconsin-Madison.
1 Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers By Sreemukha Kandlakunta Phani Shashank.
August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache.
High Performing Cache Hierarchies for Server Workloads
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Cache III Steve Ko Computer Sciences and Engineering University at Buffalo.
FLEXclusion: Balancing Cache Capacity and On-chip Bandwidth via Flexible Exclusion Jaewoong Sim Jaekyu Lee Moinuddin K. Qureshi Hyesoon Kim.
Achieving Non-Inclusive Cache Performance with Inclusive Caches
PIR-Tor: Scalable Anonymous Communication Using Private Information Retrieval Prateek Mittal University of Illinois Urbana-Champaign Joint work with: Femi.
Cache Optimization Summary
Thoughts on Shared Caches Jeff Odom University of Maryland.
Cache Definition Cache is pronounced cash. It is a temporary memory to store duplicate data that is originally stored elsewhere. Cache is used when the.
Nov COMP60621 Concurrent Programming for Numerical Applications Lecture 6 Chronos – a Dell Multicore Computer Len Freeman, Graham Riley Centre for.
1 Multi-Core Systems CORE 0CORE 1CORE 2CORE 3 L2 CACHE L2 CACHE L2 CACHE L2 CACHE DRAM MEMORY CONTROLLER DRAM Bank 0 DRAM Bank 1 DRAM Bank 2 DRAM Bank.
CS252/Patterson Lec /23/01 CS213 Parallel Processing Architecture Lecture 7: Multiprocessor Cache Coherency Problem.
1 Lecture: Cache Hierarchies Topics: cache innovations (Sections B.1-B.3, 2.1)
Defining Anomalous Behavior for Phase Change Memory
Architecture for Protecting Critical Secrets in Microprocessors Ruby Lee Peter Kwan Patrick McGregor Jeffrey Dwoskin Zhenghong Wang Princeton Architecture.
1 Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5)
1 Virtual Memory Main memory can act as a cache for the secondary storage (disk) Advantages: –illusion of having more physical memory –program relocation.
1 Lecture: Cache Hierarchies Topics: cache innovations (Sections B.1-B.3, 2.1)
By Islam Atta Supervised by Dr. Ihab Talkhan
Improving Cache Performance using Victim Tag Stores
Virtual Memory Chapter 7.4.
Lecture: Large Caches, Virtual Memory
ASR: Adaptive Selective Replication for CMP Caches
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Lecture: Cache Hierarchies
SUBMITTED BY: NAIMISHYA ATRI(7TH SEM) IT BRANCH
Mengjia Yan, Yasser Shalabi, Josep Torrellas
Jason F. Cantin, Mikko H. Lipasti, and James E. Smith
(Find all PTEs that map to a given PPN)
Intel’s Core i7 Processor
Dissemination-based Data Delivery Using Broadcast Disks
Lecture: Large Caches, Virtual Memory
RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks
Lecture: Cache Hierarchies
Mengjia Yan, Bhargava Gopireddy, Thomas Shull, Josep Torrellas
Bruhadeshwar Meltdown Bruhadeshwar
Prefetch-Aware Cache Management for High Performance Caching
Lecture 13: Large Cache Design I
Spare Register Aware Prefetching for Graph Algorithms on GPUs
Basic Performance Parameters in Computer Architecture:
Professor, No school name
Example Cache Coherence Problem
Lecture 12: Cache Innovations
TLC: A Tag-less Cache for reducing dynamic first level Cache Energy
Lecture 2: Snooping-Based Coherence
Lecture 1: Parallel Architecture Intro
ECE Dept., University of Toronto
Lecture: Cache Innovations, Virtual Memory
User-mode Secret Protection (SP) architecture
Lecture 15: Large Cache Design III
Chapter 5 Exploiting Memory Hierarchy : Cache Memory in CMP
Mengjia Yan† , Jiho Choi† , Dimitrios Skarlatos,
CANDY: Enabling Coherent DRAM Caches for Multi-node Systems
Lecture: Cache Hierarchies
CSE 486/586 Distributed Systems Cache Coherence
Lei Zhao, Youtao Zhang, Jun Yang
University of Illinois at Urbana-Champaign
MicroScope: Enabling Microarchitectural Replay Attacks
SecDir: A Secure Directory to Defeat Directory Side-Channel Attacks
Presentation transcript:

Cross-Core Prime+Probe Attacks on Non-inclusive Caches Mengjia Yan, Read Sprabery, Bhargava Gopireddy, Christopher Fletcher, Roy Campbell, Josep Torrellas University of Illinois at Urbana-Champaign

Modern Cache Hierarchies Modern systems are moving to non-inclusive cache hierarchies Latest Intel server processor uses non-inclusive caches Existing conflict-based attacks do not work on sliced non-inclusive caches Skylake-S (Sep 2015) Skylake-X/Skylake-SP (Jun 2017) L2 256KB/core 16-way, inclusive 1MB/core LLC 2MB/core 1.375MB/core 11-way, non-inclusive Core 0 LLC Slice 0 Slice 4 Core 4 Core 1 Slice 1 Slice 5 Core 5 Core 2 Slice 2 Slice 6 Core 6 Core 3 Slice 3 Slice 7 Core 7

Challenges of Prime+Probe Attacks Lack of Visibility into the Victim’s Private Cache target address eviction addresses victim cache 0 evict an inclusion victim attacker cache 1 victim cache 0 attacker cache 1 private caches Victim’s line duplicates in L1 and L2 Victim’s line does not exist in L2 eviction set (EV) shared cache insert to LLC. cache conflict. insert to LLC. No conflict No inclusion victim (a) inclusive cache (b) non-inclusive cache

Challenges of Prime+Probe Attacks Eviction Set Construction is Hard target address eviction address private caches also insert to private cache evict to shared cache insert to private cache shared cache Need LLC conflicts on the target slice to further evict the target line. both evict to DRAM slice 0 slice 1 slice 0 slice 1 insert to shared cache (b) non-inclusive cache (a) inclusive cache Eviction is affected by the replacement policies in multiple caches, and address slice distributions. Eviction is only determined by the LLC replacement policy.

Contributions We develop an algorithm to create Eviction Set on sliced non-inclusive caches. We reverse engineer the directory structure in Intel Skylake-X processors. We identify that directory as a unified structure to bootstrap conflict-based cache attacks for different cache hierarchies. Based on our insights into the directory, we design the first Prime+Probe attack on sliced non-inclusive LLCs. Previous attacks on inclusive caches are an example of directory attack.

The Inclusive Directory Structure Insight: Directory must be inclusive to maintain tracking information for all the cache lines resident in the cache hierarchies. directory entry for lines in LLC directory entry for lines in L2 but not LLC Attack opportunity analysis: 𝑊 𝐸𝐷 =12 < 𝑊 𝐿2 =16 LLC Slice Directory and Tags Cache Lines Due to the associativity difference, we can create ED conflicts. Can ED conflicts lead to inclusion victims? …… …… Traditional Directory …… Extended Directory (ED) The new attack surface!

Creating Inclusion Victims via ED Conflicts core 0 attacker core 1 cache line directory entry L2 …… …… target address evict the target line from remote L2 to LLC probe addresses directories cache lines …… …… traditional directory inclusion victim LLC slice extended Directory (ED) …… Prime: access 𝑊 𝐸𝐷 probe lines to occupy the target set in a ED slice insert into L2 and ED. ED conflict. …… ……

Prime+Probe Attacks Targeting the Directory victim core 0 attacker core 1 cache line directory entry L2 …… …… target address probe addresses directories cache lines Prime: access 𝑊 𝐸𝐷 probe lines to occupy the target set in a ED slice Wait: wait for the victim to perform an access Probe: re-access the 𝑊 𝐸𝐷 probe lines and measure access latency …… …… traditional directory LLC slice extended Directory (ED) …… victim does not perform access  Probe latency is short

Prime+Probe Attacks Targeting the Directory victim core 0 attacker core 1 cache line directory entry L2 …… …… target address ① victim accesses target line probe addresses directories cache lines Prime: access 𝑊 𝐸𝐷 probe lines to occupy the target set in a ED slice Wait: wait for the victim to perform an access Probe: re-access the 𝑊 𝐸𝐷 probe lines and measure access latency …… …… traditional directory LLC slice extended Directory (ED) …… ②directory entry migration. ED conflict. Victim performs access  Probe latency is higher

Conclusion More in the Paper Directory = The unified structure for conflict-based cache attacks “Attack Directories, Not Caches: Side-Channel Attacks in a Non-Inclusive World” recently accepted in IEEE Symposium on Security and Privacy (SP’19). More in the Paper Eviction set construction algorithm. Steps of reverse engineering the directory structure. Root cause analysis of the the vulnerability A multi-threaded high-bandwidth Evict+Reload attack. Attack results on AMD machines.