Leveraging Hardware Transactional Memory for Cache Side-Channel Defenses
Sanchuan Chen 1, Fangfei Liu 2, Zeyu Mi 3, Yinqian Zhang 1, Ruby B. Lee 4, Haibo Chen 3, XiaoFeng Wang 5
1 The Ohio State University   2 Intel Corporation   3 Shanghai Jiao Tong University   4 Princeton University   5 Indiana University Bloomington
Overview of Cache Side Channels
(Diagram: the victim performs secret-dependent memory accesses; by observing which cache lines those accesses touch, the attacker infers the secret.)
Cache Side Channel Attacks
Representative attack techniques:
- Prime-Probe attack
- Flush-Reload attack
- Evict-Time attack
- Cache-Collision attack
Cache Side Channel Attacks: Prime-Probe attack
- Prime: Fills the cache with its own data.
- Idle: Waits for a time interval while the victim is executing.
- Probe: Measures the time to access the same cache sets.
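A minimal sketch of the Prime step (illustrative, not the authors' code): `eviction_set` is a hypothetical array of addresses that all map to the monitored cache set, and `WAYS` stands for the assumed associativity. The Probe step re-reads the same addresses and times each access, using the same timing idiom as the Flush-Reload sketch below.

```c
#include <stdint.h>
#include <stddef.h>

#define WAYS 8  /* assumed associativity of the targeted cache level */

/* Fill one cache set with the attacker's own lines by touching every way. */
static void prime(volatile uint8_t *eviction_set[WAYS])
{
    for (size_t i = 0; i < WAYS; i++)
        (void)*eviction_set[i];
}
```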
Cache Side Channel Attacks: Flush-Reload attack
- Flush: Flushes the cache lines containing addr.
- Idle: Waits for a time interval.
- Reload: Measures the time to reload addr.
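A minimal sketch of one Flush-Reload round (illustrative, not the authors' code): `addr` is assumed to point into memory shared with the victim, e.g., a page of a shared library.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, __rdtscp */

/* Time one reload of addr, then flush it again for the next round.
 * A reload time below a calibrated threshold indicates the victim
 * accessed the line during the idle interval. */
static inline uint64_t flush_reload(volatile const uint8_t *addr)
{
    unsigned int aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*addr;                          /* reload */
    uint64_t t1 = __rdtscp(&aux);
    _mm_clflush((const void *)addr);      /* flush for the next round */
    return t1 - t0;
}
```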
Cache Side Channel Attacks: Evict-Time attack
- Evict: Fills specific cache sets to evict all other cache lines.
- Time: Triggers the victim to perform the security-critical operation and measures the victim's total execution time.
Cache Side Channel Attacks: Cache-Collision attack
- Evict: Cleans the whole cache.
- Time: Measures the victim's execution time to identify cache collisions.
Method Overview
Insight: Many cache side channel attacks involve the adversary evicting the victim's cache lines during the execution of sensitive operations.
Method: Hardware Transactional Memory lets the victim application detect such cache line manipulation and regain control to protect itself proactively when its data is evicted from the cache.
Security Goals
S1: Cache lines loaded in the security-critical regions cannot be evicted or invalidated during the execution of the security-critical regions. If such eviction happens, the code must be able to detect it.
S2: The execution time of the security-critical region is independent of cache hits and misses.
S3: The cache footprint after the execution of the security-critical region is independent of its sensitive code or data.
Security Goals
We prove that by satisfying security goals S1-S3, we can prevent all types of cache side channels we consider:
1. Asynchronous attacks (Prime-Probe / Flush-Reload) need to evict the victim's cache lines, which S1 lets us detect.
2. Synchronous attacks (Evict-Time / Cache-Collision): S2 guarantees the attacker cannot extract any information from the execution time.
3. Synchronous attacks (Prime-Probe / Flush-Reload): S3 guarantees the attacker cannot obtain information by observing the cache footprint.
Performance Goals
P1: The performance overhead for the protected program is low in the absence of attacks.
Intel TSX
- The most widely deployed HTM.
- Implemented by retrofitting the cache coherence protocol.
- Cache interference terminates hardware transactions, a feature our defense leverages.
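A minimal TSX sketch of the mechanism the defense builds on: if the transaction's working set is disturbed (eviction, interrupt, or a conflicting access), control transfers to the fallback path. This assumes RTM-capable hardware and compilation with -mrtm; it is an illustration, not the paper's implementation.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    unsigned int status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Transactional region: cache interference with its read/write
         * set, or an interrupt, aborts the transaction. */
        _xend();
        puts("committed: no interference observed");
    } else {
        /* Abort path: the protected program regains control here. */
        printf("aborted: status = 0x%x\n", status);
    }
    return 0;
}
```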
Method Overview
Insight: Many cache side channel attacks involve the adversary evicting the victim's cache lines during the execution of sensitive operations.
Method: Hardware Transactional Memory lets the victim application detect such cache line manipulation and regain control to protect itself proactively when its data is evicted from the cache.
Detecting Cache Collision via Intel TSX
How are the read set and write set tracked?
(Figure: abort rate vs. buffer size; (a) tracking the read set, (b) tracking the write set.)
We find that the size of the write set is slightly less than the size of the L1 data cache (32 KB), while the size of the read set is similar to the size of the LLC (3 MB). This means that the write set of Intel TSX is tracked in the L1 data cache and the read set is tracked in the LLC.
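A hedged sketch of how such a measurement could be set up (the slides do not show the benchmark code): write one byte per cache line of a buffer of the given size inside a transaction and record whether it aborts; repeating this many times per size yields abort-rate curves like the ones above.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if writing buf_size bytes (one byte per 64-byte line)
 * inside a transaction aborts, 0 if it commits, -1 on allocation failure. */
static int write_set_aborts(size_t buf_size)
{
    uint8_t *buf = malloc(buf_size);
    if (!buf)
        return -1;

    int aborted = 1;
    if (_xbegin() == _XBEGIN_STARTED) {
        for (size_t i = 0; i < buf_size; i += 64)
            buf[i] = (uint8_t)i;          /* grow the write set line by line */
        _xend();
        aborted = 0;
    }
    free(buf);
    return aborted;
}
```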
Detecting Cache Collision via Intel TSX
Can TSX detect L1 data cache evictions?
(Figure (c): abort rate vs. number of conflicting cache lines, for data read vs. written in the transaction.)
We find that the transaction will not abort if data read by the transaction is evicted out of the L1 data cache, but eviction of data written by the transaction out of the L1 data cache aborts the transaction with very high probability.
Detecting Cache Collision via Intel TSX
Can TSX detect data eviction in the LLC?
(Figure (d): abort rate vs. number of conflicting cache lines, for data read vs. written in the transaction.)
We find that evicting data accessed by a transaction out of the LLC, whether the data was read or written, always results in a transaction abort.
Detecting Cache Collision via Intel TSX
Can TSX detect eviction of instructions?
(Figure (e): abort rate vs. number of conflicting cache lines when evicting the transaction's instructions from the LLC.)
We find that when instructions are evicted out of the LLC, the transaction will also abort.
Detecting Cache Collision via Intel TSX
Will transactions abort upon context switches?
(Figure (f): abort rate vs. preemption period in cycles.)
Although, as shown in the previous experiments, evicting data read by a transaction out of the L1 data cache does not abort the transaction, high-frequency preemption yields a high abort rate when the preempting process runs on the same core as the transactional process. This suggests that a transaction aborts upon a context switch.
Observations
O1: Eviction of cache lines written in a transaction out of the L1 data cache terminates the transaction, while eviction of cache lines read in the transaction does not.
O2: Eviction of data read or written, and of instructions executed, in a transaction out of the LLC aborts the transaction.
O3: Transactions abort upon context switches.
A Straw Man Design
Each security-critical region in its entirety is enclosed in one transaction by inserting the _xbegin() and _xend() compiler intrinsics before and after the code region.
S1: May suffer self-eviction, and the transaction may abort due to interrupts.
S2: The security-critical region's execution time may depend on cache hits and misses.
S3: The cache footprint may depend on sensitive code or data.
A Prudent Design
S1: Cache lines loaded in the security-critical region cannot be evicted or invalidated.
S2: The security-critical region's execution time is independent of cache hits and misses.
S3: The cache footprint is independent of sensitive code or data.
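A minimal sketch of one way a transaction-wrapped region can work toward S1-S3 (a simplification for illustration, not necessarily the paper's exact design): preload the sensitive data at the start of the transaction so later secret-dependent accesses hit in cache and the footprint does not depend on the input. `sbox` and the region body are hypothetical placeholders.

```c
#include <immintrin.h>
#include <stdint.h>

extern uint8_t sbox[256];                 /* hypothetical sensitive table */

static void protected_region(void)
{
    /* Naive retry; a bounded retry policy is discussed later. */
    while (_xbegin() != _XBEGIN_STARTED)
        ;

    /* Preload: touch every cache line of the sensitive data up front, so
     * subsequent secret-dependent accesses are cache hits (S2) and the
     * footprint left behind is input-independent (S3). */
    volatile uint8_t *p = sbox;
    for (int i = 0; i < 256; i += 64)
        (void)p[i];

    /* ... security-critical computation goes here ... */

    _xend();   /* committing implies no eviction was observed (S1) */
}
```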
Evaluation: Micro Benchmarks
(Figure (a): per-byte processing time of OpenSSL AES decryption, with and without protection, for aes-128/192/256 in CBC and IGE modes.)
Overhead: 34.1%-42.7%.
Evaluation: Micro Benchmarks
(Figure (b): processing time of OpenSSL ECDSA signing, with and without protection, across six curves.)
Overhead: < 0.883%.
Evaluation: Macro Benchmarks
(Figure (c): HTTPS latency (AES requests) with varying requested file sizes, 64 B to 64 MB, with and without protection.)
Overhead: < 7.1%.
Evaluation: Macro Benchmarks
(Figure (d): HTTPS latency with varying numbers of concurrent connections, 2 to 64, with and without protection.)
Overhead: < 1.85%.
Evaluation: Macro Benchmarks
(Figure (e): HTTPS latency (AES requests) with six different cipher suites, with and without protection.)
Overhead: < 3.6%.
Evaluation: Macro Benchmarks
(Figure (f): SSL handshake latency (ECDSA requests) with six different cipher suites, with and without protection.)
Overhead: < 5.1%.
Evaluation: Macro Benchmarks
(Figure: throughput of the Apache HTTPS server in requests per second, with and without protection.)
Discussion
Hyperthreading: From earlier experiments, our solution cannot be used to defeat side-channel attacks initiated from another thread sharing the same core.
Security policy upon attack detection: An appropriate threshold must be selected so that the program re-enters the transaction as long as the number of aborts stays below the threshold.
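A sketch of one possible abort-handling policy along these lines: retry the transaction up to a threshold, then treat persistent aborts as a suspected attack. `MAX_RETRIES`, `security_critical_region()`, and `handle_suspected_attack()` are hypothetical names.

```c
#include <immintrin.h>

#define MAX_RETRIES 8                     /* threshold; workload-dependent */

extern void security_critical_region(void);
extern void handle_suspected_attack(void);

static void run_protected(void)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        if (_xbegin() == _XBEGIN_STARTED) {
            security_critical_region();
            _xend();
            return;                       /* committed without interference */
        }
        /* Aborted: fall through and retry. */
    }
    handle_suspected_attack();            /* aborts exceeded the threshold */
}
```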
Conclusion
- We design a defense against cache side channel attacks using hardware transactional memory.
- We provide a systematic analysis of the security requirements that a software-only solution must meet to defeat cache attacks.
- We propose a software design that leverages Intel TSX to satisfy these requirements.
Thanks! chen.4825@osu.edu