RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks

Presentation transcript:

RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks
Nael Abu-Ghazaleh, University of California, Riverside
Mehmet Kayaalp, IBM Research
Khaled N. Khasawneh, University of California, Riverside
Hodjat Asghari Esfeden, University of California, Riverside
Jesse Elwell, Vencore Labs
Dmitry Ponomarev, Binghamton University
Aamer Jaleel, NVIDIA

Cache Side Channel
[Figure: AES SubBytes S-Box lookups mapping onto a set-associative cache, organized as sets x ways]
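To make the set-associative mapping concrete, here is a minimal sketch of how an address selects a cache set, assuming an illustrative geometry of 64-byte lines and 1024 sets (these constants are assumptions, not the evaluated configuration). Because SubBytes indexes the S-Box table with secret-derived values, the set a lookup lands in leaks information about those values.

```c
#include <stdint.h>

/* Illustrative geometry (not the evaluated configuration): 64 B lines,
 * 1024 sets. */
#define LINE_SIZE 64
#define NUM_SETS  1024

/* A set-associative cache selects the set from the address bits just
 * above the line offset. For a table lookup such as SubBytes, the set
 * that is touched therefore depends on the secret-derived table index. */
static inline uint64_t cache_set(uint64_t addr) {
    return (addr / LINE_SIZE) % NUM_SETS;
}
```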

Flush+Reload Attack
1- Flush each line of the critical data
2- Victim accesses critical data
3- Reload the critical data and measure the access time
[Figure: two cores with private L1-I/L1-D and L2 caches above a shared L3; a fast reload reveals which lines the victim accessed]
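A minimal user-space sketch of the measure-and-re-arm step of Flush+Reload, assuming an x86 machine (clflush and rdtscp available), a shared read-only mapping of the victim's critical data at `shared_line`, and a hypothetical, machine-calibrated threshold `HIT_THRESHOLD`; none of these names come from the slides.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, __rdtscp */

#define HIT_THRESHOLD 80  /* cycles; must be calibrated per machine */

/* Time one reload of a line that was flushed earlier, then flush it again
 * to re-arm for the next round. A fast reload means the victim touched
 * the line between the flush and the reload. */
static int reload_and_flush(volatile uint8_t *shared_line) {
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*shared_line;                       /* 3- reload the critical line   */
    uint64_t elapsed = __rdtscp(&aux) - start;
    _mm_clflush((const void *)shared_line);   /* 1- flush for the next round   */
    return elapsed < HIT_THRESHOLD;           /* hit => the victim accessed it */
}
```

The attacker flushes every line of the critical data, waits for the victim, and then calls this on each line; the fast reloads mark the lines the victim touched.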

Prime+Probe: L1 Attack
The attacker shares a 2-way SMT core, and therefore the L1 cache, with the victim.
1- Prime each cache set
2- Victim accesses critical data
3- Probe each cache set and measure the access time
[Figure: victim and attacker threads on one SMT core sharing the L1-I/L1-D and L2; slow probes mark the sets the victim evicted]
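A minimal sketch of the prime step for one L1-D set, instantiated with the 32 KB, 4-way, 64-byte-line L1 listed later under the evaluation parameters (way size 8 KB, 128 sets); `buf` is a hypothetical attacker-owned buffer of at least 32 KB.

```c
#include <stdint.h>

/* L1-D geometry from the evaluation parameters: 32 KB, 4 ways, 64 B lines
 * -> 128 sets, way size 8 KB. Addresses 8 KB apart map to the same set. */
#define L1_WAYS     4
#define L1_WAY_SIZE (32 * 1024 / L1_WAYS)
#define LINE_SIZE   64

/* 1- Prime: touch one line in every way of the chosen set so the set is
 * filled with the attacker's data. The probe step (3) walks the same
 * addresses again and times them; a slow access means the victim evicted
 * one of them in step 2. `buf` must be at least 32 KB. */
static void prime_l1_set(volatile uint8_t *buf, int set) {
    for (int w = 0; w < L1_WAYS; w++)
        (void)buf[w * L1_WAY_SIZE + set * LINE_SIZE];
}
```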

Prime+Probe: LLC Attack
The attacker runs on a different core and shares only the inclusive L3 with the victim.
1- Prime each cache set
2- Victim accesses critical data
3- Probe each cache set and measure the access time
Priming the L3 evicts the victim's critical data, and inclusiveness back-invalidates it from the victim's L1-D and L2. The victim's next access therefore brings the critical data in from memory, filling L1-D, L2, and L3 and evicting the attacker's data from L3. The attacker detects the access by looking at the L3 state: the back-invalidation forced by inclusiveness is what makes the victim's critical accesses visible.
[Figure: two CPUs with private L1/L2 caches above a shared inclusive L3; back-invalidation arrows from L3 into the victim's private caches]
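A minimal sketch of the probe step for one LLC set, assuming the attacker has already built an eviction set `evset` of addresses mapping to the monitored set (itself non-trivial on sliced, physically indexed LLCs) and calibrated a hypothetical `MISS_THRESHOLD`.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp */

#define MISS_THRESHOLD 200  /* cycles; must be calibrated per machine */

/* 3- Probe: walk the eviction set that primed the monitored LLC set and
 * time the walk. If the victim's access in step 2 brought its critical
 * line into the inclusive L3, one of the attacker's lines was evicted,
 * so the walk incurs a miss and runs slow. */
static int probe_llc_set(volatile uint8_t *const *evset, int llc_ways) {
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    for (int i = 0; i < llc_ways; i++)
        (void)*evset[i];
    uint64_t elapsed = __rdtscp(&aux) - start;
    return elapsed > MISS_THRESHOLD;  /* slow probe => victim access detected */
}
```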

Operation of Inclusive Caches
When a line is evicted from the LLC, it is back-invalidated in the victim's L1. The victim's next access to it misses in L1 and must go to the LLC, so the access is visible to the attacker.
[Figure: victim and attacker L1 caches above the shared LLC; a back-invalidation removes the victim's L1 copy, making the subsequent access visible at the LLC]

Relaxed Inclusion Caches
For read-only data, the line stays in the victim's L1 even after it is evicted from the LLC. The victim's next access hits in L1, so there is no visible access to the LLC.
[Figure: same setup, but with relaxed inclusion the victim's L1 copy survives the LLC eviction and the access never reaches the LLC]

Cache Inclusiveness
Inclusive: every cache line in a local (private) cache also exists in the shared cache; if a line is not in the shared cache, it cannot be in ANY local cache. This provides snoop filtering: no unnecessary cache traffic.
Non-inclusive: saves cache space by not duplicating data, but on a shared-cache miss all other local caches must be snooped; extra snoop-filtering hardware is required to eliminate unnecessary cache traffic.
Comparison:
- Shared cache hit: copy the line into the local cache
- Shared cache evict: inclusive evicts the line from all local caches; non-inclusive does nothing
- Shared cache miss: inclusive goes to memory; non-inclusive must snoop the local caches
- Data duplication: inclusive duplicates all local data; non-inclusive duplicates some local data

Relaxed Inclusion Caches
The snoop-filtering benefit of inclusion is not relevant in some cases:
- if the data cannot be in any other local cache (thread-private)
- if the data cannot be in a modified state in any other local cache (read-only)
If the data is read-only, there is no problem: even if another cache has a copy, we can ignore it.
If the data is thread-private and the thread is pinned to a core, inclusion can also be relaxed; if the thread is later scheduled on another core, the modified data must first be written back from the local cache.
Comparison:
- Shared cache hit: copy the line into the local cache
- Shared cache evict: inclusive evicts from all local caches; non-inclusive does nothing; RIC does nothing if the line is read-only or thread-private
- Shared cache miss: inclusive goes to memory; non-inclusive snoops the local caches; RIC can go to memory, since local copies of relaxed (read-only) lines can be ignored
- Data duplication: inclusive duplicates all local data; non-inclusive some local data; RIC only shared writable local data

RIC Implementation
- System software manages the relaxed-inclusion bit on a per-page basis: the existing page table entry permissions are extended to mark RIC data (read-only or thread-private).
- A single relaxed-inclusion bit is added per cache line; it is copied from the TLB entry on a cache fill.
- Minimal hardware overhead.
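A behavioral sketch (not RTL) of the two hardware touch points RIC adds, using hypothetical structure and field names that are not from the slides: on a fill, the relaxed-inclusion bit is copied from the TLB entry into the cache line; on an LLC eviction, back-invalidation of the private caches is skipped when that bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical structures and field names, for illustration only. */
struct tlb_entry {
    bool relaxed_inclusion;       /* set by the OS in the PTE for read-only
                                     or thread-private pages */
};
struct cache_line {
    uint64_t tag;
    bool     valid;
    bool     relaxed_inclusion;   /* the single extra bit per line */
};

/* Stub standing in for the coherence machinery that invalidates the line
 * in every core's private caches. */
static void back_invalidate_private_caches(uint64_t tag) { (void)tag; }

/* On a cache fill, the relaxed-inclusion bit is copied from the TLB entry
 * that translated the access into the newly allocated line. */
static void fill_line(struct cache_line *line, uint64_t tag,
                      const struct tlb_entry *tlbe) {
    line->tag               = tag;
    line->valid             = true;
    line->relaxed_inclusion = tlbe->relaxed_inclusion;
}

/* On an LLC eviction, a conventional inclusive cache always back-invalidates
 * the private caches; RIC skips the back-invalidation for relaxed lines, so
 * the victim's L1 copy survives and its reuse never shows up at the LLC. */
static void evict_llc_line(struct cache_line *victim_line) {
    if (!victim_line->relaxed_inclusion)
        back_invalidate_private_caches(victim_line->tag);
    victim_line->valid = false;
}
```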

Security Analysis
- Under RIC, the attacker cannot evict the victim's data, but the victim can still evict its own data.
- If the critical data fits in the local cache, the side channel is eliminated.
[Figure: critical accesses for AES with different local cache sizes]

Performance Analysis
RIC eliminates LLC data duplication for all read-only and thread-private data, increasing the effective cache size; for example, all instruction lines (which are read-only) can be evicted from the LLC without invalidating the L1-I copies.
Simulation parameters: 4 cores; 32 KB 4-way L1-D and L1-I; 256 KB 8-way L2; 4 MB 16-way shared L3.

Reduction in Back-invalidations
The percentage of back-invalidations eliminated by RIC is fairly constant across the benchmarks. The elimination is larger with a 2 MB LLC: the smaller LLC experiences more replacements, so RIC has more back-invalidations to eliminate.

RIC Results Summary

Conclusion
- Inclusive LLCs allow attackers to monitor the victim's critical accesses, but they are efficient because they enable snoop filtering.
- RIC relaxes the inclusion property to eliminate the side channel while retaining snoop filtering.
- RIC is a simple mechanism that also improves performance compared to inclusive caches.