Secure Dynamic Memory Scheduling against Timing Channel Attacks

Secure Dynamic Memory Scheduling against Timing Channel Attacks
Yao Wang, Benjamin Wu, G. Edward Suh
Cornell University

Timing Channel Problem
- What is a timing channel? Why do we care?
- Attacks have been demonstrated in real-world environments
  - Example: cache timing channel attacks in Amazon EC2 [1,2]
- Capabilities: steal cryptographic keys, predict users' passwords, track users' browser visit history, etc.
- Example code: if (secret) sleep(10s); else sleep(5s)
  - The secret affects the program's timing; by observing timing, the attacker infers the secret.
[1] Thomas Ristenpart, Eran Tromer, Hovav Shacham and Stefan Savage, "Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds", CCS 2009
[2] Yinqian Zhang, Ari Juels, Michael Reiter and Thomas Ristenpart, "Cross-VM Side Channels and Their Use to Extract Private Keys", CCS 2012
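The secret-dependent delay on this slide can be sketched as a toy model. This is illustrative code, not from the paper; the threshold and sleep durations are made up:

```python
import time

def victim(secret_bit):
    # Execution time depends on the secret: the classic timing channel.
    time.sleep(0.10 if secret_bit else 0.05)

def attacker():
    # Measure the victim's running time and infer the secret.
    start = time.monotonic()
    victim(secret_bit=1)   # the attacker never sees this argument
    elapsed = time.monotonic() - start
    return 1 if elapsed > 0.075 else 0

print(attacker())  # recovers the secret bit from timing alone
```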

Timing Channels in Main Memory
- Requests from different security domains (SDs) share the memory controller and DRAM, so one domain's requests affect another domain's timing.
- DRAM timing constraints between consecutive requests:
  - Different ranks: Trank (6 cycles)
  - Different banks in the same rank: Tbank (18 cycles)
  - The same bank in the same rank: Tworst (43 cycles)
- Trank < Tbank < Tworst

A Covert Channel Attack [1]
- The sender transmits a sequence of bits by dynamically changing its memory demand:
  - To send a '0', the sender issues no memory requests.
  - To send a '1', the sender issues many memory requests.
- The receiver keeps sending requests and measures its own throughput; throughput variations reveal the transmitted bits (e.g., 1 0 0 1 0 1 1 0).
[1] Yao Wang, Andrew Ferraiuolo, and Edward Suh, "Timing Channel Protection for a Shared Memory Controller", HPCA 2014.
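The attack can be sketched as a toy simulation. The throughput numbers and threshold below are invented for illustration; in a real attack the receiver would measure actual memory throughput:

```python
def receiver_throughput(sender_active, base=100, contention_drop=60):
    # Toy model: the sender's memory demand cuts the receiver's throughput.
    return base - (contention_drop if sender_active else 0)

def transmit(bits):
    # Sender encodes each bit as high/no memory demand; the receiver
    # samples its own throughput and decodes by thresholding.
    samples = [receiver_throughput(sender_active=(b == 1)) for b in bits]
    return [1 if s < 70 else 0 for s in samples]

assert transmit([1, 0, 0, 1, 0, 1, 1, 0]) == [1, 0, 0, 1, 0, 1, 1, 0]
```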

Naïve Protection: Temporal Partitioning (TP) [1]
- Static turn scheduling: each security domain is served in a fixed turn.
- Dead time is added at the end of each turn; no new memory transactions can be issued during dead time.
- Tdead >= Tworst (43 cycles)
- Dead time introduces significant performance overhead!
[1] Yao Wang, Andrew Ferraiuolo, and Edward Suh, "Timing Channel Protection for a Shared Memory Controller", HPCA 2014.
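TP's turn structure can be sketched as follows. The function and its output format are hypothetical; only Tdead = 43 comes from the slide:

```python
def tp_schedule(num_domains, turn_len, t_dead=43, num_rounds=2):
    # Static round-robin turns; each turn ends with dead time >= Tworst,
    # so one domain's requests can never delay the next domain's.
    slots = []
    t = 0
    for _ in range(num_rounds):
        for sd in range(num_domains):
            # (domain, turn start, last cycle a request may be issued)
            slots.append((sd, t, t + turn_len - t_dead))
            t += turn_len
    return slots

# With 2 domains and 86-cycle turns, half of every turn is dead time.
print(tp_schedule(2, 86))
```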

Reducing Dead Time Overhead
- Spatial partitioning [1,2]: map security domains to different banks or ranks.
  - Pro: significantly reduces dead time (to Tbank or Trank).
  - Cons: scalability issues, memory fragmentation, requires OS support.
- Bank Triple Alternation (BTA) [2]: consecutive turns are forced to access different bank groups (e.g., banks {0,3,6}, {1,4,7}, {2,5}).
  - Pro: does not require spatial partitioning.
  - Con: inefficient scheduling.
[1] Yao Wang, Andrew Ferraiuolo, and Edward Suh, "Timing Channel Protection for a Shared Memory Controller", HPCA 2014.
[2] Ali Shafiee, Akhila Gundu, Manjunath Shevgoor, Rajeev Balasubramonian and Mohit Tiwari, "Avoiding Information Leakage in the Memory Controller with Fixed Service Policies", MICRO 2015.

Secure Memory Scheduling: SecMC-NI
- Key idea: interleave requests that access different ranks and banks to construct an efficient schedule.
- Parameters: Trank = 6 cycles, Tbank = 18 cycles, Tworst = 43 cycles, Tturn = turn length.

Interleaving Requests that Access Different Banks
- Request selection: requests scheduled in a turn must access different banks; at most Tturn/Tbank requests are scheduled per turn (example: Tturn = 54 cycles, so 3 requests).
- Request reordering: reorder the requests so that requests accessing the same bank are separated by at least Tturn cycles; this preserves security.
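The per-turn selection rule can be sketched like this. The function and request representation are hypothetical; the Tturn = 54 and Tbank = 18 values are from the slides:

```python
def select_for_turn(queue, t_turn=54, t_bank=18):
    # Pick requests that all access different banks,
    # at most Tturn // Tbank of them per turn.
    limit = t_turn // t_bank          # 3 requests per turn in the example
    chosen, banks = [], set()
    for req in queue:                 # req = (security_domain, bank)
        if req[1] not in banks and len(chosen) < limit:
            chosen.append(req)
            banks.add(req[1])
    return chosen

picked = select_for_turn([(0, 2), (1, 2), (0, 5), (1, 7), (0, 1)])
assert picked == [(0, 2), (0, 5), (1, 7)]  # bank 2 not repeated, 3 max
```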

Can We Do the Same for Ranks?
- Construct the previous per-bank schedule separately for each rank.
- At most Tbank/Trank ranks are selected in each turn.
- Combine the rank schedules by shifting each one by Trank cycles.
- Example: Tturn = 54 cycles; at most 9 requests can be issued in each turn!
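The 9-requests-per-turn figure follows directly from the slide's parameters; a quick check (variable names are mine):

```python
t_rank, t_bank, t_turn = 6, 18, 54

reqs_per_rank = t_turn // t_bank     # per-rank schedule: 3 requests, different banks
ranks_per_turn = t_bank // t_rank    # 3 rank schedules fit between bank slots
offsets = [r * t_rank for r in range(ranks_per_turn)]  # shift each rank by Trank

assert offsets == [0, 6, 12]
assert reqs_per_rank * ranks_per_turn == 9  # matches the slide
```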

Performance Evaluation
- Simulator: ZSim + DRAMSim2
- System: 8 cores, 32 kB L1 per core, 8 MB shared L2, 1 memory channel, 8 ranks, 8 banks in each rank
- Workloads: multi-program workloads using SPEC CPU2006 benchmarks
- Performance metric: weighted speedup

Comparison with BTA
- We run eight copies of one program to study the impact of memory intensity; mixed workloads are also studied.
- SecMC-NI outperforms BTA by 45% on average (weighted speedup, normalized to FR-FCFS).
- SecMC-NI cuts the average queuing delay of BTA by half.

Comparison with Spatial Partitioning
- SecMC-NI achieves performance similar to BP (Bank Partitioning) on average.
- A significant performance gap (35%) remains between SecMC-NI and RP (Rank Partitioning).
- Can we do better?

Trade Security for Performance: SecMC-Bound
- Key idea:
  - For performance: allow dynamic memory scheduling (like FR-FCFS).
  - For security: force each request to return at a pre-determined time.
- Can we find the worst-case finish time for a request? Under TP with turn length 43 cycles, the i-th request from security domain s can finish by s*43 + i*43*num_domains + 43.
- Of course, we don't want to return as late as TP's worst-case time.
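The slide's worst-case bound can be written out directly (the function name is mine; the formula is the slide's):

```python
def tp_worst_case_finish(s, i, num_domains, t_turn=43):
    # Under TP, the i-th request from security domain s can finish by:
    #   s*43 + i*43*num_domains + 43
    return s * t_turn + i * t_turn * num_domains + t_turn

# In a 2-domain system: SD0's first request by cycle 43, SD1's by 86,
# SD0's second request by cycle 129.
assert tp_worst_case_finish(0, 0, 2) == 43
assert tp_worst_case_finish(1, 0, 2) == 86
assert tp_worst_case_finish(0, 1, 2) == 129
```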

Return Earlier than the Worst-Case Time
- Each request is assigned:
  - An Expected Issue time (EI), used for arbitration. Parameter b is the interval between EIs; the request with the smallest EI wins arbitration.
  - An Expected Response time (ER), used for security. Parameter d is the delay between EI and ER.
- As long as every request returns at its ER, no information leaks: d hides the interference between different security domains.
- What if a request finishes after its ER?
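A minimal sketch of EI/ER assignment and arbitration, assuming the b = 6 and d = 160 values used later in the talk (the data structures are mine):

```python
def assign_times(num_reqs, b=6, d=160, start=0):
    # Each request gets an Expected Issue time (EI) spaced b cycles apart,
    # and an Expected Response time ER = EI + d; data is returned at ER.
    return [(start + k * b, start + k * b + d) for k in range(num_reqs)]

def arbitrate(pending):
    # The request with the smallest EI wins arbitration.
    return min(pending, key=lambda req: req[0])

times = assign_times(3)
assert times == [(0, 160), (6, 166), (12, 172)]
assert arbitrate(times[::-1]) == (0, 160)  # smallest EI wins regardless of order
```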

Limit the Information Leakage of Violations (1)
- ER violation: a request finishes after its ER. ER violations represent information leakage.
- Limit the information that one violation conveys: a violation can take only one of W possible delays (ER + d1, ..., ER + d_{W-1}, up to the worst-case time derived from TP).

Limit the Information Leakage of Violations (2)
- Set a limit on the number of violations per security domain: at most M violations in every N requests (a period). E.g., M = 4, N = 1,000: at most 4 violations can happen in 1,000 requests.
- Once a security domain reaches the limit, its requests always return at the worst-case time derived from TP, so the attacker cannot get further information. The violation counter is reset after each period.
- How many bits can an attacker derive in a period? This is a conservative bound!
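The M-violations-per-N-requests policy can be sketched as a counter (class and method names are mine; only the M/N semantics come from the slide):

```python
class ViolationLimiter:
    """At most M ER violations per period of N requests; once the limit
    is reached, fall back to TP's worst-case return time."""

    def __init__(self, m=4, n=1000):
        self.m, self.n = m, n
        self.violations = 0
        self.requests = 0

    def on_request(self, violated):
        # Returns True when the domain must use the TP worst-case time.
        self.requests += 1
        if violated:
            self.violations += 1
        limited = self.violations >= self.m
        if self.requests == self.n:            # period ends: reset counter
            self.requests = self.violations = 0
        return limited

lim = ViolationLimiter(m=2, n=10)
flags = [lim.on_request(v) for v in [True, True, False, True]]
assert flags == [False, True, True, True]  # worst-case mode after 2nd violation
```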

Performance under Different Limits
- b = 6, d = 160
- As the limit gets lower, performance starts to decrease due to entering worst-case mode, while leakage decreases exponentially.
- Tradeoff between performance and security.

Comparison with Previous Schemes
- SecMC-Bound outperforms SecMC-NI and BP by trading security for performance; the leakage is bounded.

Summary
- Shared memory controllers are prone to timing channel attacks.
- Previous protections suffer from either inflexibility or high performance overhead.
- SecMC-NI completely removes timing channels while achieving a 45% performance improvement over the state of the art (BTA).
- SecMC-Bound further improves performance by enabling a trade-off between security and performance with a quantitative security guarantee.

Backup Slides

Square-and-Multiply Algorithm for RSA

x = C
for j = 1 to n
    x = mod(x^2, N)
    if d_j == 1 then
        x = mod(x*C, N)
    end if
next j
return x

- C: encrypted message; x: decrypted message; N: product of two large prime numbers; d: RSA private key; n: the number of bits in the key.
- Timing channel: the extra multiply executes only when the key bit d_j is 1.
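A runnable version of the slide's pseudocode, written as the standard left-to-right square-and-multiply (initialized with x = 1 rather than x = C, so it handles all key bits uniformly); the toy numbers are illustrative, not real RSA parameters:

```python
def square_and_multiply(C, d_bits, N):
    # Computes C^d mod N, scanning key bits most-significant first.
    # The extra multiply runs only when a key bit is 1 -- that
    # data-dependent work is the timing channel the slide refers to.
    x = 1
    for bit in d_bits:
        x = (x * x) % N       # square on every iteration
        if bit == 1:
            x = (x * C) % N   # multiply only on 1-bits: leaks the key bit
    return x

# Toy check: 5^11 mod 14, with 11 = 0b1011; Python's pow() agrees.
assert square_and_multiply(5, [1, 0, 1, 1], 14) == pow(5, 11, 14)
```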

Optimization: Dynamic Tuning of b and d Values
- Intuition: with fixed b and d values, less memory-intensive programs incur fewer ER violations.
- Use smaller b and d values for less memory-intensive programs; smaller values reduce memory latency.
- Tune dynamically based on the observed number of violations. Let m be the number of violations in the previous period (N requests):
  - if m <= th_dec: d = d - constant
  - if m >= th_inc: d = d + constant
  - otherwise: d is unchanged
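The update rule above can be sketched as follows. The threshold names mirror the slides; the step size and the d bounds are my assumptions (the slides only show d shrinking from 160 toward 20):

```python
def tune_d(d, m, th_dec=0, th_inc=3, step=10, d_min=20, d_max=160):
    # m = number of ER violations observed in the previous period.
    if m <= th_dec:
        d -= step        # few violations: tighten d to cut latency
    elif m >= th_inc:
        d += step        # too many violations: relax d
    return max(d_min, min(d, d_max))  # bounds are assumed, not from the slides

assert tune_d(160, 0) == 150   # no violations -> shrink d
assert tune_d(150, 5) == 160   # many violations -> grow d
assert tune_d(150, 1) == 150   # in between -> unchanged
```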

Dynamic Tuning of d Value
- Initial d value = 160; th_inc = 3, th_dec = 0.
- Dynamic tuning improves performance by reducing d to a value that just meets the security requirements (e.g., 160 -> 20).
- Designers do not need to hand-pick good d values.

Performance Evaluation
- ZSim + DRAMSim2: ZSim simulates the cores and caches; DRAMSim2 simulates the DRAM.
- 8-program workloads drawn from 24 SPEC CPU2006 benchmarks.
- Each program fast-forwards for 1 billion instructions and then simulates for 100 million instructions.

SecMC-NI: Effect of Address Randomization
- Address randomization benefits hmmer significantly: requests are distributed more evenly across banks and ranks.

Parameter Sweep for b and d (No Limit)
- Smaller b and d values result in better performance, but also in more violations (shown for b = 3 and b = 6).

Optimization: Avoiding Worst-Case Times
- Gradually increase the value of d with the number of violations (comparison before and after the optimization).

Dynamic Tuning of d Value
- Static vs. dynamic d; initial d value = 160.

Comparison with Previous Schemes
- SecMC-Bound parameters: b = 6, d = 160.

Combining SecMC-Bound with Partitioning
- Combining SecMC-Bound with spatial partitioning outperforms applying spatial partitioning alone.