Lei Zhao, Youtao Zhang, Jun Yang


Mitigating Shift-Based Covert-Channel Attacks in Racetrack Last Level Caches
Lei Zhao, Youtao Zhang, Jun Yang
Department of Computer Science, University of Pittsburgh

Outline
- Racetrack Memory
- Timing Attacks
- Mitigations
- Experiment Setup
- Evaluation

Racetrack Memory
- Multiple bits are stored on each track
- Adjacent bits share the same read/write port; a bit must be shifted under the port before it can be accessed
[Figure: racetrack cell with a read/write port and a shift port; signal lines BL, RL, WL, SL, SWL]
8/28/2019
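Because a bit must first be shifted under the shared port, access latency depends on where the head currently sits. A minimal latency model makes this concrete; the per-shift and read costs below echo the R/W/S cycle counts from the simulator configuration later in the deck, but the track length is an assumption for illustration.

```python
# Minimal racetrack access-latency model (illustrative; BITS_PER_TRACK is
# an assumption, SHIFT/READ cycles echo the deck's R/W/S: 24/24/4 config).
BITS_PER_TRACK = 32
SHIFT_CYCLES = 4    # cost of shifting the track by one bit position
READ_CYCLES = 24    # cost of the read itself

def access_latency(head_pos: int, target_bit: int) -> int:
    """Latency to read target_bit when the port head sits at head_pos."""
    shifts = abs(target_bit - head_pos)
    return shifts * SHIFT_CYCLES + READ_CYCLES

print(access_latency(0, 0))   # head already on the bit -> 24
print(access_latency(0, 8))   # eight shifts away -> 56
```

This data-dependent latency is the physical effect the shift-based covert channel exploits.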

Head Management Policy
- Lazy policy: leave the head where it is after each access
  - Better performance
  - Vulnerable to shift covert channels
- Eager policy: move the head back to a fixed position after each access
  - Poor performance
  - No shift covert channels
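The two policies can be sketched side by side. This is a simplified model, not the paper's implementation: the constants are illustrative, and the eager variant charges the return shifts to the same access for simplicity.

```python
# Sketch of lazy vs. eager head management (illustrative constants).
SHIFT_CYCLES, READ_CYCLES = 4, 24

def access(track, target, policy="lazy", home=0):
    """Return the latency of reading `target`; `track` holds the head position."""
    latency = abs(target - track["head"]) * SHIFT_CYCLES + READ_CYCLES
    if policy == "eager":
        # Eager: after the read, shift the head back to a fixed home position,
        # so the next access's latency reveals nothing about this one.
        latency += abs(target - home) * SHIFT_CYCLES
        track["head"] = home
    else:
        # Lazy: leave the head on the bit just read (fast, but the head
        # position -- and thus the next access's latency -- leaks history).
        track["head"] = target
    return latency

t = {"head": 0}
print(access(t, 8))                  # lazy: 8 shifts + read -> 56
t = {"head": 0}
print(access(t, 8, policy="eager"))  # eager: also pays the return shifts -> 88
```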

Timing Attacks
- Side-channel attack: the victim unintentionally leaks information to the attacker through timing channels
- Covert-channel attack: malicious threads transfer information they are not allowed to share through timing channels

Miss-Based Attack
1. The cache is filled with the receiver's data
2. The sender flushes selected sets by loading its own data
3. The receiver probes the cache to see whether its data is still there; a miss (memory latency Tmem) reveals the sender's bit
[Figure: n-set, m-way cache shared by sender and receiver, backed by main memory]
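The prime/evict/probe steps above can be sketched as a toy channel over one cache set. Everything here is hypothetical (direct-mapped toy cache, made-up hit/miss latencies); it only illustrates the protocol, not the paper's setup.

```python
# Toy miss-based covert channel over a shared cache (hypothetical parameters).
N_SETS = 8
HIT, MISS = 2, 100   # latencies in cycles (illustrative)

class Cache:
    def __init__(self):
        self.owner = [None] * N_SETS   # which thread's data occupies each set

    def access(self, tid, s):
        lat = HIT if self.owner[s] == tid else MISS
        self.owner[s] = tid            # the access installs tid's data
        return lat

def send(cache, bit):
    # Sender evicts the receiver from set 0 to transmit a 1; stays idle for 0.
    if bit:
        cache.access("sender", 0)

def receive(cache):
    # Receiver primed set 0 earlier; a miss now means the sender sent a 1.
    return 1 if cache.access("receiver", 0) == MISS else 0

cache = Cache()
decoded = []
for bit in [1, 0, 1, 1, 0]:
    cache.access("receiver", 0)      # prime
    send(cache, bit)                 # sender's turn
    decoded.append(receive(cache))   # probe
print(decoded)  # [1, 0, 1, 1, 0] -- the receiver recovers every bit
```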

Shift-Based Attack
1. Initially, the heads are at random positions
2. The sender accesses its data, moving the heads to positions that encode its bits
3. The receiver probes its own data and measures the shift latency, which reveals where the sender left the heads
[Figure: n-set, m-way racetrack cache shared by sender and receiver]

Shift-Based Attack
[Figure: sender and receiver alternate accesses; the modulated shift latencies transmit the bit string 100011011010010101]
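Under the lazy policy, this channel can be sketched on a single shared track: the sender parks the head near or far from the receiver's data, and the receiver decodes the bit from its own access latency. The positions and cycle counts are illustrative assumptions.

```python
# Toy shift-based covert channel under the lazy head policy
# (illustrative constants; not the paper's parameters).
SHIFT_CYCLES, READ_CYCLES = 4, 24
NEAR, FAR = 1, 9   # head positions the sender uses to encode 0 / 1

class Track:
    def __init__(self):
        self.head = 0
    def read(self, pos):
        lat = abs(pos - self.head) * SHIFT_CYCLES + READ_CYCLES
        self.head = pos   # lazy policy: the head stays where it lands
        return lat

def send(track, bit):
    # Sender parks the head near (for 0) or far from (for 1) the receiver's data.
    track.read(NEAR if bit == 0 else FAR)

def receive(track):
    # Receiver's data sits at position 0; a long shift latency decodes as 1.
    lat = track.read(0)
    return 1 if lat > NEAR * SHIFT_CYCLES + READ_CYCLES else 0

track = Track()
decoded = [0] * 4
for i, bit in enumerate([1, 0, 0, 1]):
    send(track, bit)
    decoded[i] = receive(track)
print(decoded)  # [1, 0, 0, 1] -- no shared data needed, only shared heads
```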

Naïve Method: Eager Head Management Policy
- Move the head back to a fixed position after each access
- Pros: eliminates the shift covert channel; simple implementation
- Cons: cannot exploit data locality, so performance is poor

Security Level-Aware Approach
- Each cache line is extended with two fields: L, the security level of its owner thread (00: lowest, 11: highest), and R, recency information (000: least recently used)
- After each access, the head is reset to the most recently used cache line of the lowest-security thread
[Figure: lines tagged with L and R fields; lines owned by the sender, receiver, and other threads are compared by security level]
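The reset rule above can be sketched directly: pick the lowest security level present, then the most recently used line among those. The field widths follow the slide (2-bit L, 3-bit R); the example line contents are made up.

```python
# Sketch of the security level-aware reset rule (example lines are made up).
lines = [
    # (position, L = security level, R = recency; larger R = more recent)
    (0, 0b00, 0b001),
    (1, 0b10, 0b110),
    (2, 0b01, 0b000),
    (3, 0b00, 0b011),
]

def reset_position(lines):
    """Position of the MRU line owned by the lowest-security thread."""
    lowest = min(level for _, level, _ in lines)
    candidates = [x for x in lines if x[1] == lowest]
    return max(candidates, key=lambda x: x[2])[0]

print(reset_position(lines))  # -> 3: level 00, most recent among the 00 lines
```

Because the reset target depends only on the lowest-security thread's own accesses, a higher-security sender cannot steer the head to encode bits.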

Epoch-Based Approach
- Within each epoch, reset the head to the position that was hottest (most frequently accessed) in the previous epoch
- The default position changes only at the beginning of an epoch, which limits how fast a sender can modulate it
[Figure: default head positions for Epoch 1 and Epoch 2]
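A sketch of the bookkeeping: count accesses per head position during the current epoch, and at the epoch boundary promote the hottest position to be the next epoch's fixed reset target. The epoch is measured in accesses here for brevity; the evaluation's intervals are in the tens of millions.

```python
# Sketch of the epoch-based default-position policy (EPOCH_LEN is illustrative).
from collections import Counter

EPOCH_LEN = 8   # accesses per epoch in this toy model

class EpochPolicy:
    def __init__(self):
        self.default = 0
        self.counts = Counter()
        self.accesses = 0

    def on_access(self, pos):
        """Record an access; return the position the head is reset to."""
        self.counts[pos] += 1
        self.accesses += 1
        if self.accesses == EPOCH_LEN:   # epoch boundary: adopt hottest position
            self.default = self.counts.most_common(1)[0][0]
            self.counts.clear()
            self.accesses = 0
        return self.default              # head parks here after every access

p = EpochPolicy()
for pos in [5, 5, 5, 2, 5, 1, 5, 5]:     # epoch 1: position 5 is hottest
    d = p.on_access(pos)
print(d)  # -> 5: the default for epoch 2
```

Unlike the eager policy's fixed position, the default tracks locality, which is why Epoch can even outperform Baseline on average.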

Epoch-Based Approach

Interval   Bit Rate
50M        39.3 bps
100M       19.9 bps
200M       9.9 bps

At a 200M interval, the shift covert channel is throttled to the same bit rate as the miss-based covert channel (9.9 bps)
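The table is consistent with a simple back-of-the-envelope model in which at most one bit leaks per epoch interval. The interval unit and clock frequency are our assumptions (cycles, 2 GHz); they are not stated on the slide.

```python
# Back-of-the-envelope check of the bit-rate table, assuming the interval
# is in cycles, one bit leaks per interval, and a 2 GHz clock (assumed).
CLOCK_HZ = 2e9

def max_bit_rate(interval_cycles):
    """Upper bound on channel bit rate: one bit per epoch interval."""
    return CLOCK_HZ / interval_cycles   # bits per second

for interval in (50e6, 100e6, 200e6):
    print(f"{interval/1e6:.0f}M -> {max_bit_rate(interval):.1f} bps")
# Yields 40 / 20 / 10 bps, close to the measured 39.3 / 19.9 / 9.9 bps.
```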

Experiment Setup
- We model a four-core CMP with gem5
- Benchmarks: both memory-intensive and non-intensive workloads from SPEC CPU2006
- We evaluate four schemes:
  - Baseline: leave the head where it is; no covert-channel protection
  - Eager: always reset the head to a fixed position
  - SL: security level-aware protection
  - Epoch: change the default head position only at the beginning of each epoch

Experiment Setup: Simulator Configuration

Parameter   Value
Processor   Alpha ISA, 4 cores, 8-way OoO core
L1 Cache    4-way, 32 KB, 2 cycles
L2 Cache    16-way, 32 MB, R/W/S: 24/24/4 cycles
Memory      DDR3 800 MHz, tRAS=35ns, tRCD=13ns, tRP=13ns, tCL=13ns, tWR=15ns

Performance
- Both SL and Epoch outperform Eager
- On average, Epoch even outperforms Baseline

Individual Thread IPC for Epoch
- Threads at lower security levels see better speedups

Conclusion
- We are the first to demonstrate a new shift-based covert channel in racetrack-memory LLCs
- Our security level-aware scheme eliminates this covert channel with better performance than the naïve eager approach
- Our epoch scheme reduces the newly discovered covert channel's leakage rate by up to 260 times with modest performance overhead