
Leveraging Hardware Transactional Memory for Cache Side-Channel Defenses
Sanchuan Chen¹, Fangfei Liu², Zeyu Mi³, Yinqian Zhang¹, Ruby B. Lee⁴, Haibo Chen³, XiaoFeng Wang⁵
¹ The Ohio State University, ² Intel Corporation, ³ Shanghai Jiao Tong University, ⁴ Princeton University, ⁵ Indiana University at Bloomington

Overview of Cache Side Channels
The victim makes secret-dependent memory accesses. The attacker observes which cache lines those accesses touch and infers the secret from them.
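Here is a minimal sketch of a secret-dependent access. It is not taken from the slides. It assumes a lookup table of 4-byte entries that starts on a cache-line boundary, and 64-byte cache lines, so 16 consecutive entries share one line. An attacker who learns only which line was touched can narrow an 8-bit secret index down to 16 candidates, which means it learns the index's upper 4 bits.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
ENTRY_SIZE = 4   # bytes per table entry (assumed); 16 entries per line

def touched_line(secret_index: int) -> int:
    """Cache line, relative to the (line-aligned) table base, touched by table[secret_index]."""
    return (secret_index * ENTRY_SIZE) // LINE_SIZE

def candidates(observed_line: int) -> list[int]:
    """All 8-bit secret indices consistent with the observed cache line."""
    return [s for s in range(256) if touched_line(s) == observed_line]

line = touched_line(0xA7)   # the victim looks up table[0xA7]
cands = candidates(line)    # what the attacker can narrow the secret to
print(line, len(cands), hex(cands[0]), hex(cands[-1]))  # 10 16 0xa0 0xaf
```

The low 4 bits stay hidden because they select an entry within a line, and cache-level observation cannot see that.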

Cache Side Channel Attacks
Representative attacks: Prime-Probe, Flush-Reload, Evict-Time, and Cache-Collision.

Cache Side Channel Attacks: Prime-Probe
Prime: The attacker fills the cache with its own data.
Idle: The attacker waits for a time interval while the victim executes.
Probe: The attacker measures the time to access the same cache sets. Slow accesses show which of its lines the victim evicted.
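The Prime-Probe steps can be illustrated with a toy simulation. Everything in it is a hypothetical model: a 16-set, 4-way cache with LRU replacement, where a cache miss stands in for a slow probe access. Real attacks measure cycle counts and must cope with noise.

```python
from collections import OrderedDict

LINE = 64  # bytes per cache line (assumed)

class ToyCache:
    """Set-associative cache with LRU replacement (toy model, not real hardware)."""

    def __init__(self, num_sets: int = 16, ways: int = 4):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line (evicting the LRU one) and return False."""
        line = addr // LINE
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)      # now most recently used
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)    # evict the least recently used line
        s[line] = True
        return False

def prime_probe(cache: ToyCache, victim, attacker_base: int = 1 << 20) -> list[int]:
    """Return the cache sets in which the attacker saw misses after the victim ran."""
    stride = cache.num_sets * LINE   # addresses this far apart map to the same set

    def lines_for(set_idx: int) -> list[int]:
        return [attacker_base + set_idx * LINE + w * stride for w in range(cache.ways)]

    for s in range(cache.num_sets):          # Prime: fill every set with attacker data
        for addr in lines_for(s):
            cache.access(addr)
    victim(cache)                            # Idle: the victim executes
    observed = []
    for s in range(cache.num_sets):          # Probe: a miss models a slow access
        misses = [not cache.access(addr) for addr in lines_for(s)]
        if any(misses):
            observed.append(s)
    return observed

secret_set = 5
print(prime_probe(ToyCache(), lambda c: c.access(secret_set * LINE)))  # [5]
```

The attacker never reads the victim's data. It only sees that one of its own lines in set 5 was evicted.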

Cache Side Channel Attacks: Flush-Reload
Flush: The attacker flushes the cache lines containing a shared address addr.
Idle: The attacker waits for a time interval while the victim executes.
Reload: The attacker measures the time to reload addr. A fast reload means the victim accessed addr in the meantime.
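The following is a toy Flush-Reload round under stated assumptions. The attacker and victim share memory (for example, a library's code pages), `flush` stands in for an instruction such as `clflush`, and the latencies and threshold are made-up numbers. The victim is a hypothetical square-and-multiply loop that runs the multiply routine only for key bits equal to 1.

```python
LINE = 64
HIT_CYCLES, MISS_CYCLES = 40, 200   # assumed latencies for the toy model
THRESHOLD = 100                     # reloads faster than this count as hits

class SharedCache:
    """Toy cache state for memory shared by attacker and victim."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr: int) -> None:
        """Stands in for clflush: remove the line holding addr."""
        self.lines.discard(addr // LINE)

    def load(self, addr: int) -> int:
        """Load addr and return its simulated latency in cycles."""
        line = addr // LINE
        hit = line in self.lines
        self.lines.add(line)
        return HIT_CYCLES if hit else MISS_CYCLES

def flush_reload(cache: SharedCache, victim, addr: int) -> bool:
    """One round: True if the victim touched addr while the attacker waited."""
    cache.flush(addr)                     # Flush
    victim(cache)                         # Idle: the victim executes
    return cache.load(addr) < THRESHOLD   # Reload: fast means the victim accessed it

SQUARE, MULTIPLY = 0x4000, 0x4100  # hypothetical addresses of shared code

def victim_step(bit: int):
    """One square-and-multiply iteration: multiply only when the key bit is 1."""
    def run(cache: SharedCache) -> None:
        cache.load(SQUARE)
        if bit:
            cache.load(MULTIPLY)
    return run

key_bits = [1, 0, 1, 1, 0]
cache = SharedCache()
recovered = [int(flush_reload(cache, victim_step(b), MULTIPLY)) for b in key_bits]
print(recovered)  # [1, 0, 1, 1, 0]
```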

Cache Side Channel Attacks: Evict-Time
Evict: The attacker fills specific cache sets, evicting all other cache lines from them.
Time: The attacker triggers the victim to perform the security-critical operation and measures the victim's total execution time. A longer time indicates that the victim used the evicted sets.
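Below is a toy Evict-Time model. It assumes one secret-dependent table access per run, a 16-set cache, a hypothetical table base, and made-up latencies. The attacker evicts one set at a time and times the victim. Only evicting the set that holds the secret-dependent line makes the victim slower.

```python
NUM_SETS = 16            # toy cache geometry (assumed)
HIT, MISS = 1, 10        # assumed relative latencies
TABLE_BASE_LINE = 32     # hypothetical first cache line of the victim's table

def run_victim(cached: set, secret: int) -> int:
    """Security-critical operation: one secret-dependent table access.
    Returns its execution time and leaves the line cached."""
    line = TABLE_BASE_LINE + secret
    t = HIT if line in cached else MISS
    cached.add(line)
    return t

def evict_time(secret: int) -> int:
    """Evict each set in turn; the set that slows the victim reveals the secret."""
    timings = []
    for s in range(NUM_SETS):
        cached = set()
        run_victim(cached, secret)                            # victim's data is warm
        cached = {ln for ln in cached if ln % NUM_SETS != s}  # Evict: clear set s
        timings.append(run_victim(cached, secret))            # Time: trigger and measure
    slowest = max(range(NUM_SETS), key=lambda s: timings[s])
    return (slowest - TABLE_BASE_LINE) % NUM_SETS             # map the set back to a secret

print([evict_time(x) for x in (0, 7, 15)])  # [0, 7, 15]
```

Unlike Prime-Probe, the attacker here observes only the victim's total time, not per-set access times.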

Cache Side Channel Attacks: Cache-Collision
Evict: The attacker cleans the whole cache.
Time: The attacker measures the victim's execution time to identify collisions. When two of the victim's accesses fall in the same cache line, the second one hits and execution is faster.
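Here is a toy Cache-Collision model in the spirit of attacks on table-based AES. The slides do not name a cipher; the two lookups `T[p0 ^ k0]` and `T[p1 ^ k1]`, the 16-entries-per-line layout, and the latencies are all assumptions. Starting from a cleaned cache, the victim runs faster when both lookups fall in the same cache line. That timing difference leaks the upper bits of `k0 ^ k1`.

```python
HIT, MISS = 1, 10        # assumed relative latencies
ENTRIES_PER_LINE = 16    # e.g. 4-byte entries in 64-byte lines (assumed)

def victim_time(p0: int, p1: int, k0: int, k1: int) -> int:
    """Two key-dependent lookups T[p0 ^ k0], T[p1 ^ k1] on a cleaned cache."""
    cached = set()                                # Evict: whole cache cleaned
    t = 0
    for index in (p0 ^ k0, p1 ^ k1):
        line = index // ENTRIES_PER_LINE
        t += HIT if line in cached else MISS      # a collision makes the 2nd lookup hit
        cached.add(line)
    return t

def collision_attack(k0: int, k1: int) -> int:
    """Fix p0 = 0 and sweep the high nibble of p1. The fastest run marks a
    collision, which happens exactly when (p0 ^ p1) >> 4 == (k0 ^ k1) >> 4."""
    times = {h: victim_time(0, h << 4, k0, k1) for h in range(16)}
    return min(times, key=times.get)

print(collision_attack(0x3C, 0xA5))  # 9, i.e. (0x3C ^ 0xA5) >> 4
```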

Method Overview
Insight: Many cache side-channel attacks involve the adversary evicting the victim's cache lines during the execution of sensitive operations.
Method: Hardware transactional memory lets the victim application detect when its data is evicted from the cache and get control to protect itself proactively.

Security Goals
S1: Cache lines loaded in a security-critical region cannot be evicted or invalidated during the execution of that region. If this does happen, the code must be able to detect it.
S2: The execution time of the security-critical region is independent of cache hits and misses.
S3: The cache footprint left after the execution of the security-critical region is independent of its sensitive code or data.

Security Goals
We prove that satisfying security goals S1-S3 prevents all types of cache side channels we consider:
1. Asynchronous attacks (Prime-Probe / Flush-Reload) must evict the victim's cache lines while it runs, and S1 lets us detect such evictions.
2. Synchronous attacks (Evict-Time / Cache-Collision): S2 guarantees that the attacker cannot extract any information from the execution time.
3. Synchronous attacks (Prime-Probe / Flush-Reload): S3 guarantees that the attacker cannot obtain information by observing the cache footprint.

Performance Goals
P1: When no attack is under way, the performance overhead of the protected program is low.

Intel TSX
The most widely deployed HTM.
Implemented by retrofitting cache coherence protocols.
Cache interference terminates (aborts) hardware transactions, and a defense can leverage this feature.


Detecting Cache Collision via Intel TSX
How are the read set and the write set tracked?
[Figure: abort rate vs. buffer size (KB). (a) Tracking the read set, with 3 MB = LLC size marked. (b) Tracking the write set, with 32 KB = L1 cache size marked.]
We find that the maximum size of the write set is slightly less than the size of the L1 data cache (32 KB), while the maximum size of the read set is similar to the size of the LLC (3 MB). This means that Intel TSX tracks the write set in the L1 data cache and the read set in the LLC.

Detecting Cache Collision via Intel TSX
Can TSX detect L1 data cache evictions?
[Figure (c): Detecting L1 data cache eviction. Abort rate vs. number of cache lines, for read conflicts and write conflicts in L1.]
We find that a transaction does not abort if data it has read is evicted from the L1 data cache. However, it aborts with very high probability if data it has written is evicted from the L1 data cache.

Detecting Cache Collision via Intel TSX
Can TSX detect data eviction from the LLC?
[Figure (d): Detecting data cache eviction in the LLC. Abort rate vs. number of cache lines, for read conflicts and write conflicts.]
We find that evicting data accessed by a transaction from the LLC, whether read or written, always aborts the transaction.

Detecting Cache Collision via Intel TSX
Can TSX detect eviction of instructions?
[Figure (e): Detecting eviction of instructions in the LLC. Abort rate vs. number of cache lines.]
We find that when instructions are evicted from the LLC, the transaction also aborts.

Detecting Cache Collision via Intel TSX
Will transactions abort upon context switches?
[Figure (f): Detecting context switch. Abort rate vs. preemption period (cycles).]
The previous experiments showed that evicting data read by a transaction from the L1 data cache does not abort the transaction. Even so, we find that high-frequency preemption yields a high abort rate when the preempting process P2 runs on the same core as the transactional process P1. This suggests that a transaction aborts upon a context switch.

Observations
O1: Evicting cache lines written in a transaction from the L1 data cache aborts the transaction, while evicting cache lines read in the transaction does not.
O2: Evicting data read or written, or instructions executed, in a transaction from the LLC aborts the transaction.
O3: Transactions abort upon context switches.
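Observations O1-O3 can be summarized as a small decision model, together with the capacity finding (write set tracked in the 32 KB L1 data cache, read set in the 3 MB LLC). This is a hypothetical encoding of the behavior measured on the tested machine, not Intel's specification. `Event`, `aborts`, and `fits` are illustrative names, and 64-byte lines are assumed. Cases the observations do not cover raise an error rather than guess.

```python
from dataclasses import dataclass

LINE = 64                             # bytes per cache line (assumed)
L1D_LINES = 32 * 1024 // LINE         # 512 lines in the 32 KB L1 data cache
LLC_LINES = 3 * 1024 * 1024 // LINE   # 49152 lines in the 3 MB LLC

@dataclass(frozen=True)
class Event:
    kind: str        # "evict" or "context_switch"
    level: str = ""  # "L1" or "LLC" for evictions
    role: str = ""   # "read", "write" or "instruction"

def aborts(event: Event) -> bool:
    """Does this event abort a running transaction, per O1-O3?"""
    if event.kind == "context_switch":
        return True                      # O3
    if event.kind == "evict" and event.level == "LLC":
        return True                      # O2: read, written or executed lines
    if event.kind == "evict" and event.level == "L1":
        if event.role == "write":
            return True                  # O1: aborts (with very high probability)
        if event.role == "read":
            return False                 # O1: read lines may leave L1 silently
    raise ValueError(f"case not covered by the observations: {event}")

def fits(read_lines: int, write_lines: int) -> bool:
    """Rough capacity check. The measured write-set limit was slightly below
    the L1D size, so treat these bounds as approximate."""
    return write_lines <= L1D_LINES and read_lines <= LLC_LINES
```

The asymmetry in O1 matters for the design. Data that is only read can leave L1 without an abort, so TSX detects its eviction only at the LLC level.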

A Straw Man Design
Each security-critical region is enclosed in its entirety in one transaction by inserting the _xbegin() and _xend() compiler intrinsics before and after the code region. This design falls short of all three goals:
S1: The region may evict its own cache lines (self-eviction), and interrupts may abort the transaction.
S2: The execution time of the security-critical region may depend on cache hits and misses.
S3: The cache footprint may depend on sensitive code or data.
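In C, the straw-man structure wraps the region as `if (_xbegin() == _XBEGIN_STARTED) { ...; _xend(); }` using the RTM intrinsics from `<immintrin.h>`. To keep the example runnable without TSX hardware, the sketch below models only what happens inside the transaction. The region performs one hypothetical secret-dependent table access, with made-up hit and miss latencies. The model shows why S2 and S3 fail.

```python
HIT, MISS = 1, 10      # assumed relative latencies
TABLE_LINES = 16       # hypothetical lookup table spanning 16 cache lines

def straw_man(secret: int, cached: set) -> int:
    """Body that would sit between _xbegin() and _xend(): one secret-dependent
    table access. Returns its time and updates the cache state in place."""
    line = secret % TABLE_LINES
    t = HIT if line in cached else MISS
    cached.add(line)
    return t

# S2 violated: with line 3 already cached, secret 3 runs faster than secret 9.
print(straw_man(3, {3}), straw_man(9, {3}))   # 1 10

# S3 violated: the footprint left behind reveals the secret.
fp3, fp9 = set(), set()
straw_man(3, fp3)
straw_man(9, fp9)
print(fp3, fp9)                               # {3} {9}
```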

A Prudent Design
S1: Cache lines loaded in the security-critical region cannot be evicted or invalidated.
S2: The execution time of the security-critical region is independent of cache hits and misses.
S3: The cache footprint is independent of sensitive code or data.
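The slides do not spell out how the prudent design meets these goals. One approach, assumed here, is to preload every cache line the region may touch at the start of the transaction. The secret-dependent accesses that follow then always hit, and the footprint no longer depends on the secret. Under TSX, evicting a preloaded line would abort the transaction (S1) instead of turning into a silent miss. The model below is a self-contained sketch using a hypothetical table and made-up latencies.

```python
HIT, MISS = 1, 10      # assumed relative latencies
TABLE_LINES = 16       # hypothetical lookup table spanning 16 cache lines

def prudent(secret: int, cached: set) -> int:
    """Transaction body: preload every table line, then do the secret-dependent
    access. Returns the total time and updates `cached` in place."""
    t = 0
    for line in range(TABLE_LINES):         # preload touches all lines, whatever the secret
        t += HIT if line in cached else MISS
        cached.add(line)
    line = secret % TABLE_LINES
    t += HIT if line in cached else MISS    # always a hit after the preload
    return t

initial = {2, 5}                            # arbitrary prior cache state
times = {prudent(s, set(initial)) for s in range(TABLE_LINES)}
footprints = set()
for s in range(TABLE_LINES):
    c = set(initial)
    prudent(s, c)
    footprints.add(frozenset(c))
print(len(times), len(footprints))          # 1 1
```

In this toy model, the preload's own cost still varies with the prior cache state, but that variation does not depend on the secret.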

Evaluation: Micro Benchmarks
[Figure (a): Per-byte processing time (s, scale 1e-8) of OpenSSL AES decryption for aes-128-cbc, aes-192-cbc, aes-256-cbc, aes-128-ige, aes-192-ige, and aes-256-ige, without and with protection.]
Overhead with protection: 34.1%-42.7%.

Evaluation: Micro Benchmarks
[Figure (b): Processing time (s) of OpenSSL ECDSA signing for ecdsak163, ecdsak283, ecdsak571, ecdsab163, ecdsab283, and ecdsab571, without and with protection.]
Overhead with protection: < 0.883%.

Evaluation: Macro Benchmarks
[Figure (c): HTTPS latency (ms, log scale) for AES requests with requested file sizes of 64 B, 1 KB, 64 KB, 1 MB, and 64 MB, without and with protection.]
Overhead with protection: < 7.1%.

Evaluation: Macro Benchmarks
[Figure (d): HTTPS latency (ms) with 2, 4, 8, 16, 32, and 64 concurrent connections, without and with protection.]
Overhead with protection: < 1.85%.

Evaluation: Macro Benchmarks
[Figure (e): HTTPS latency (ms) for AES requests with cipher suites 1-6, without and with protection.]
Overhead with protection: < 3.6%.

Evaluation: Macro Benchmarks
[Figure (f): SSL handshake latency (ms) for ECDSA requests with cipher suites 1-6, without and with protection.]
Overhead with protection: < 5.1%.

Evaluation: Macro Benchmarks
[Figure: Throughput of the Apache HTTPS server, request rate (REQ/s), without and with protection.]

Discussion
Hyperthreading: The earlier experiments show that our solution cannot defeat side-channel attacks initiated from another thread sharing the same core. Such a thread can evict lines the transaction has only read from the shared L1 without aborting it (O1), and no context switch occurs (O3).
Security policy upon attack detection: An appropriate threshold must be selected. The program may reenter the transaction as long as it has aborted fewer times than the threshold; reaching the threshold is treated as an attack.
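Here is a minimal sketch of the abort-threshold policy, assuming the transactional region reports whether it committed. `run_protected`, `AttackSuspected`, `flaky`, and the default threshold of 5 are hypothetical; the slides say only that an appropriate threshold must be chosen. Aborts from benign causes such as interrupts are retried, while reaching the threshold is treated as a suspected attack.

```python
class AttackSuspected(Exception):
    """Raised when the abort count reaches the threshold (hypothetical policy)."""

def run_protected(region, threshold: int = 5):
    """Re-enter the transactional region until it commits.

    `region()` returns (committed, result). Each abort is counted; once the
    count reaches `threshold`, the program stops retrying and reports an attack.
    """
    aborts = 0
    while aborts < threshold:
        committed, result = region()
        if committed:
            return result
        aborts += 1
    raise AttackSuspected(f"transaction aborted {aborts} times")

def flaky(fail_times: int):
    """Simulated region that aborts `fail_times` times (e.g. interrupts), then commits."""
    state = {"left": fail_times}
    def region():
        if state["left"] > 0:
            state["left"] -= 1
            return False, None
        return True, "done"
    return region

print(run_protected(flaky(2)))  # done
```

A low threshold reacts quickly to an attack but may misreport bursts of benign aborts. A high threshold tolerates noise but gives an attacker more retries.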

Conclusion
We design a defense against cache side-channel attacks using hardware transactional memory.
We provide a systematic analysis of the security requirements that a software-only solution must meet to defeat cache attacks.
We propose a software design that leverages Intel TSX to satisfy these requirements.

Thanks! chen.4825@osu.edu