Lei Zhao, Youtao Zhang, Jun Yang


Mitigating Shift-Based Covert-Channel Attacks in Racetrack Last Level Caches
Lei Zhao, Youtao Zhang, Jun Yang
Department of Computer Science, University of Pittsburgh

Outline
- Racetrack Memory
- Timing Attacks
- Mitigations
- Experiment Setup
- Evaluation

Racetrack Memory
- Multiple bits are stored on each track
- Adjacent bits share the same read/write port; a bit must be shifted under the port before it can be accessed
[Figure: racetrack cell with a read/write port and a shift port; signal lines BL, RL, WL, SL, SWL]
8/28/2019
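Because a bit must first be shifted under the shared port, access latency depends on where the head currently sits. A minimal latency model makes this concrete; the per-shift and read costs below echo the R/W/S cycle counts from the simulator configuration later in the deck, but the track length is an assumption for illustration.

```python
# Minimal racetrack access-latency model (illustrative; BITS_PER_TRACK is
# an assumption, SHIFT/READ cycles echo the deck's R/W/S: 24/24/4 config).
BITS_PER_TRACK = 32
SHIFT_CYCLES = 4    # cost of shifting the track by one bit position
READ_CYCLES = 24    # cost of the read itself

def access_latency(head_pos: int, target_bit: int) -> int:
    """Latency to read target_bit when the port head sits at head_pos."""
    shifts = abs(target_bit - head_pos)
    return shifts * SHIFT_CYCLES + READ_CYCLES

print(access_latency(0, 0))   # head already on the bit -> 24
print(access_latency(0, 8))   # eight shifts away -> 56
```

This data-dependent latency is the physical effect the shift-based covert channel exploits.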

Head Management Policy
- Lazy policy: leave the head where it is after each access
  - Better performance
  - Vulnerable to shift covert channels
- Eager policy: move the head back to a fixed position after each access
  - Poor performance
  - No shift covert channels
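The two policies can be sketched side by side. This is a simplified model, not the paper's implementation: the constants are illustrative, and the eager variant charges the return shifts to the same access for simplicity.

```python
# Sketch of lazy vs. eager head management (illustrative constants).
SHIFT_CYCLES, READ_CYCLES = 4, 24

def access(track, target, policy="lazy", home=0):
    """Return the latency of reading `target`; `track` holds the head position."""
    latency = abs(target - track["head"]) * SHIFT_CYCLES + READ_CYCLES
    if policy == "eager":
        # Eager: after the read, shift the head back to a fixed home position,
        # so the next access's latency reveals nothing about this one.
        latency += abs(target - home) * SHIFT_CYCLES
        track["head"] = home
    else:
        # Lazy: leave the head on the bit just read (fast, but the head
        # position -- and thus the next access's latency -- leaks history).
        track["head"] = target
    return latency

t = {"head": 0}
print(access(t, 8))                  # lazy: 8 shifts + read -> 56
t = {"head": 0}
print(access(t, 8, policy="eager"))  # eager: also pays the return shifts -> 88
```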

Timing Attacks
- Side-channel attack: the victim unintentionally leaks information to the attacker through timing channels
- Covert-channel attack: malicious threads transfer information they are not allowed to share through timing channels

Miss-Based Attack
1. The cache is filled with the receiver's data
2. The sender flushes selected sets by loading its own data
3. The receiver probes the cache to see whether its data is still there; a miss (memory latency Tmem) reveals the sender's bit
[Figure: n-set, m-way cache shared by sender and receiver, backed by main memory]
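The prime/evict/probe steps above can be sketched as a toy channel over one cache set. Everything here is hypothetical (direct-mapped toy cache, made-up hit/miss latencies); it only illustrates the protocol, not the paper's setup.

```python
# Toy miss-based covert channel over a shared cache (hypothetical parameters).
N_SETS = 8
HIT, MISS = 2, 100   # latencies in cycles (illustrative)

class Cache:
    def __init__(self):
        self.owner = [None] * N_SETS   # which thread's data occupies each set

    def access(self, tid, s):
        lat = HIT if self.owner[s] == tid else MISS
        self.owner[s] = tid            # the access installs tid's data
        return lat

def send(cache, bit):
    # Sender evicts the receiver from set 0 to transmit a 1; stays idle for 0.
    if bit:
        cache.access("sender", 0)

def receive(cache):
    # Receiver primed set 0 earlier; a miss now means the sender sent a 1.
    return 1 if cache.access("receiver", 0) == MISS else 0

cache = Cache()
decoded = []
for bit in [1, 0, 1, 1, 0]:
    cache.access("receiver", 0)      # prime
    send(cache, bit)                 # sender's turn
    decoded.append(receive(cache))   # probe
print(decoded)  # [1, 0, 1, 1, 0] -- the receiver recovers every bit
```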

Shift-Based Attack
1. Initially, the heads are at random positions
2. The sender accesses its data, moving the heads to positions that encode its bits
3. The receiver probes its own data and measures the shift latency, which reveals where the sender left the heads
[Figure: n-set, m-way racetrack cache shared by sender and receiver]

Shift-Based Attack
[Figure: sender and receiver alternate accesses; the modulated shift latencies transmit the bit string 100011011010010101]
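Under the lazy policy, this channel can be sketched on a single shared track: the sender parks the head near or far from the receiver's data, and the receiver decodes the bit from its own access latency. The positions and cycle counts are illustrative assumptions.

```python
# Toy shift-based covert channel under the lazy head policy
# (illustrative constants; not the paper's parameters).
SHIFT_CYCLES, READ_CYCLES = 4, 24
NEAR, FAR = 1, 9   # head positions the sender uses to encode 0 / 1

class Track:
    def __init__(self):
        self.head = 0
    def read(self, pos):
        lat = abs(pos - self.head) * SHIFT_CYCLES + READ_CYCLES
        self.head = pos   # lazy policy: the head stays where it lands
        return lat

def send(track, bit):
    # Sender parks the head near (for 0) or far from (for 1) the receiver's data.
    track.read(NEAR if bit == 0 else FAR)

def receive(track):
    # Receiver's data sits at position 0; a long shift latency decodes as 1.
    lat = track.read(0)
    return 1 if lat > NEAR * SHIFT_CYCLES + READ_CYCLES else 0

track = Track()
decoded = [0] * 4
for i, bit in enumerate([1, 0, 0, 1]):
    send(track, bit)
    decoded[i] = receive(track)
print(decoded)  # [1, 0, 0, 1] -- no shared data needed, only shared heads
```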

Naïve Method: Eager Head Management Policy
- Move the head back to a fixed position after each access
- Pros: eliminates the shift covert channel; simple implementation
- Cons: cannot exploit data locality, so performance is poor

Security Level-Aware Approach
- Each cache line is extended with two fields: L, the security level of its owner thread (00: lowest, 11: highest), and R, recency information (000: least recently used)
- After each access, the head is reset to the most recently used cache line of the lowest-security thread
[Figure: lines tagged with L and R fields; lines owned by the sender, receiver, and other threads are compared by security level]
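The reset rule above can be sketched directly: pick the lowest security level present, then the most recently used line among those. The field widths follow the slide (2-bit L, 3-bit R); the example line contents are made up.

```python
# Sketch of the security level-aware reset rule (example lines are made up).
lines = [
    # (position, L = security level, R = recency; larger R = more recent)
    (0, 0b00, 0b001),
    (1, 0b10, 0b110),
    (2, 0b01, 0b000),
    (3, 0b00, 0b011),
]

def reset_position(lines):
    """Position of the MRU line owned by the lowest-security thread."""
    lowest = min(level for _, level, _ in lines)
    candidates = [x for x in lines if x[1] == lowest]
    return max(candidates, key=lambda x: x[2])[0]

print(reset_position(lines))  # -> 3: level 00, most recent among the 00 lines
```

Because the reset target depends only on the lowest-security thread's own accesses, a higher-security sender cannot steer the head to encode bits.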

Epoch-Based Approach
- Within each epoch, reset the head to the position that was hottest (most frequently accessed) in the previous epoch
- The default position changes only at the beginning of an epoch, which limits how fast a sender can modulate it
[Figure: default head positions for Epoch 1 and Epoch 2]
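A sketch of the bookkeeping: count accesses per head position during the current epoch, and at the epoch boundary promote the hottest position to be the next epoch's fixed reset target. The epoch is measured in accesses here for brevity; the evaluation's intervals are in the tens of millions.

```python
# Sketch of the epoch-based default-position policy (EPOCH_LEN is illustrative).
from collections import Counter

EPOCH_LEN = 8   # accesses per epoch in this toy model

class EpochPolicy:
    def __init__(self):
        self.default = 0
        self.counts = Counter()
        self.accesses = 0

    def on_access(self, pos):
        """Record an access; return the position the head is reset to."""
        self.counts[pos] += 1
        self.accesses += 1
        if self.accesses == EPOCH_LEN:   # epoch boundary: adopt hottest position
            self.default = self.counts.most_common(1)[0][0]
            self.counts.clear()
            self.accesses = 0
        return self.default              # head parks here after every access

p = EpochPolicy()
for pos in [5, 5, 5, 2, 5, 1, 5, 5]:     # epoch 1: position 5 is hottest
    d = p.on_access(pos)
print(d)  # -> 5: the default for epoch 2
```

Unlike the eager policy's fixed position, the default tracks locality, which is why Epoch can even outperform Baseline on average.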

Epoch-Based Approach

Interval   Bit Rate
50M        39.3 bps
100M       19.9 bps
200M       9.9 bps

At a 200M interval, the shift covert channel is throttled to the same bit rate as the miss-based covert channel (9.9 bps)
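The table is consistent with a simple back-of-the-envelope model in which at most one bit leaks per epoch interval. The interval unit and clock frequency are our assumptions (cycles, 2 GHz); they are not stated on the slide.

```python
# Back-of-the-envelope check of the bit-rate table, assuming the interval
# is in cycles, one bit leaks per interval, and a 2 GHz clock (assumed).
CLOCK_HZ = 2e9

def max_bit_rate(interval_cycles):
    """Upper bound on channel bit rate: one bit per epoch interval."""
    return CLOCK_HZ / interval_cycles   # bits per second

for interval in (50e6, 100e6, 200e6):
    print(f"{interval/1e6:.0f}M -> {max_bit_rate(interval):.1f} bps")
# Yields 40 / 20 / 10 bps, close to the measured 39.3 / 19.9 / 9.9 bps.
```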

Experiment Setup
- We model a four-core CMP with gem5
- Benchmarks: both memory-intensive and non-intensive workloads from SPEC CPU2006
- We evaluate four schemes:
  - Baseline: leave the head where it is; no covert-channel protection
  - Eager: always reset the head to a fixed position
  - SL: security level-aware protection
  - Epoch: change the default head position only at the beginning of each epoch

Experiment Setup: Simulator Configuration

Parameter   Value
Processor   Alpha ISA, 4 cores, 8-way OoO core
L1 Cache    4-way, 32 KB, 2 cycles
L2 Cache    16-way, 32 MB, R/W/S: 24/24/4 cycles
Memory      DDR3 800 MHz, tRAS=35ns, tRCD=13ns, tRP=13ns, tCL=13ns, tWR=15ns

Performance
- Both SL and Epoch outperform Eager
- On average, Epoch even outperforms Baseline

Individual Thread IPC for Epoch
- Threads at lower security levels see better speedups

Conclusion
- We are the first to demonstrate a new shift-based covert channel in racetrack-memory LLCs
- Our security level-aware scheme eliminates this covert channel with better performance than the naïve eager approach
- Our epoch scheme reduces the newly discovered covert channel's leakage rate by up to 260 times with modest performance overhead