MICRO-2018 Gururaj Saileshwar1 Prashant Nair1 Prakash Ramrakhyani2

Slides:

Advertisements

Similar presentations

A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy J. Zebchuk, E. Safi, and A. Moshovos.

Advertisements

A Case for Refresh Pausing in DRAM Memory Systems

FLEXclusion: Balancing Cache Capacity and On-chip Bandwidth via Flexible Exclusion Jaewoong Sim Jaekyu Lee Moinuddin K. Qureshi Hyesoon Kim.

1 Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines Moinuddin K. Qureshi M. Aater Suleman Yale N. Patt HPCA 2007.

University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science University of Michigan.

A Case for Tamper-Resistant and Tamper-Evident Computer Systems Yan Solihin Center for Efficient, Secure and Reliable Computing (CESR)

Accountability in Hosted Virtual Networks Eric Keller, Ruby B. Lee, Jennifer Rexford Princeton University VISA 2009.

1 The 3P and 4P cache replacement policies Pierre Michaud INRIA Cache Replacement Championship June 20, 2010.

Citadel: Efficiently Protecting Stacked Memory From Large Granularity Failures Dec 15 th 2014 MICRO-47 Cambridge UK Prashant Nair - Georgia Tech David.

Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read Feb 9 th 2015 HPCA-21 San Francisco, USA Prashant Nair - Georgia Tech Chiachen.

A Cache-Like Memory Organization for 3D memory systems CAMEO 12/15/2014 MICRO Cambridge, UK Chiachen Chou, Georgia Tech Aamer Jaleel, Intel Moinuddin K.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 17, 2003 Topic: Virtual Memory.

Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scalability 36th International Symposium on Computer Architecture Brian Rogers †‡, Anil Krishna.

DEUCE: WRITE-EFFICIENT ENCRYPTION FOR PCM March 16 th 2015 ASPLOS-XX Istanbul, Turkey Vinson Young Prashant Nair Moinuddin Qureshi.

MemTracker Efficient and Programmable Support for Memory Access Monitoring and Debugging Guru Venkataramani, Brandyn Roemer, Yan Solihin, Milos Prvulovic.

Dyer Rolan, Basilio B. Fraguela, and Ramon Doallo Proceedings of the International Symposium on Microarchitecture (MICRO’09) Dec /7/14.

Adaptive Cache Compression for High-Performance Processors Alaa Alameldeen and David Wood University of Wisconsin-Madison Wisconsin Multifacet Project.

Interactions Between Compression and Prefetching in Chip Multiprocessors Alaa R. Alameldeen* David A. Wood Intel CorporationUniversity of Wisconsin-Madison.

Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches Gennady Pekhimenko Vivek Seshadri Onur Mutlu, Todd C. Mowry Phillip B.

Secure Embedded Processing through Hardware-assisted Run-time Monitoring Zubin Kumar.

1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.

Defining Anomalous Behavior for Phase Change Memory

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor Mark Gebhart 1,2 Stephen W. Keckler 1,2 Brucek Khailany 2 Ronny Krashinsky.

© 2007 IBM Corporation HPCA – 2010 Improving Read Performance of PCM via Write Cancellation and Write Pausing Moinuddin Qureshi Michele Franceschini and.

Reducing Refresh Power in Mobile Devices with Morphable ECC

IVEC: Off-Chip Memory Integrity Protection for Both Security and Reliability Ruirui Huang, G. Edward Suh Cornell University.

1 Towards Phase Change Memory as a Secure Main Memory André Seznec IRISA/INRIA.

Ioana Burcea * Stephen Somogyi §, Andreas Moshovos*, Babak Falsafi § # Predictor Virtualization *University of Toronto Canada § Carnegie Mellon University.

P AY -A S -Y OU -G O S TORAGE -E FFICIENT H ARD E RROR C ORRECTION Moinuddin K. Qureshi ECE, Georgia Tech Research done while at: IBM T. J. Watson Research.

Sampling Dead Block Prediction for Last-Level Caches

BEAR: Mitigating Bandwidth Bloat in Gigascale DRAM caches

© 2007 IBM Corporation MICRO-2009 Start-Gap: Low-Overhead Near-Perfect Wear Leveling for Main Memories Moinuddin Qureshi John Karidis, Michele Franceschini.

The Evicted-Address Filter

Exploiting Value Locality in Physical Register Files Saisanthosh Balakrishnan Guri Sohi University of Wisconsin-Madison 36 th Annual International Symposium.

Database Laboratory Regular Seminar TaeHoon Kim Article.

Improving Multi-Core Performance Using Mixed-Cell Cache Architecture

Hardware-rooted Trust for Secure Key Management & Transient Trust

Improving Cache Performance using Victim Tag Stores

CRC-2, ISCA 2017 Toronto, Canada June 25, 2017

Dynamic Hashing (Chapter 12)

HOP: Hardware makes Obfuscation Practical Kartik Nayak

Rachata Ausavarungnirun, Kevin Chang

Sindhusha Doddapaneni

Mengjia Yan, Yasser Shalabi, Josep Torrellas

Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas.

Moinuddin K. Qureshi ECE, Georgia Tech Gabriel H. Loh, AMD

Scalable High Performance Main Memory System Using PCM Technology

Speedup over Ji et al.'s work

Energy-Efficient Address Translation

Accelerating Dependent Cache Misses with an Enhanced Memory Controller

Milad Hashemi, Onur Mutlu, Yale N. Patt

Using Dead Blocks as a Virtual Victim Cache

Introduction to Database Systems

KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures

Lecture 6: Reliability, PCM

Jianbo Dong, Lei Zhang, Yinhe Han, Ying Wang, and Xiaowei Li

Lecture 14: Large Cache Design II

CACHE-CONSCIOUS INDEXES

MICRO-50 Swamit Tannu Zachary Myers Douglas Carmean Prashant Nair

CANDY: Enabling Coherent DRAM Caches for Multi-node Systems

SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories

Enabling Transparent Memory-Compression for Commodity Memory Systems

Dynamic Verification of Sequential Consistency

Eshan Bhatia1, Gino Chacon1, Elvira Teran2, Paul V. Gratz1, Daniel A

Janus Optimizing Memory and Storage Support for Non-Volatile Memory Systems Sihang Liu Korakit Seemakhupt, Gennady Pekhimenko, Aasheesh Kolli, and Samira.

Restrictive Compression Techniques to Increase Level 1 Cache Capacity

2019 2학기 고급운영체제론 ZebRAM: Comprehensive and Compatible Software Protection Against Rowhammer Attacks 3 # 단국대학교 컴퓨터학과 # 남혜민 # 발표자.

Presentation transcript:

Morphable Counters: Enabling Compact Integrity Trees for Low-Overhead Secure Memories MICRO-2018 Gururaj Saileshwar1 Prashant Nair1 Prakash Ramrakhyani2 Wendy Elsasser2 Jose Joao2 Moinuddin Qureshi1 1 2

Securing Main-Memory against Physical Attacks Unauthorized Reads Unauthorized Writes Replay Attack Data Verification Processor Memory ac348bf0 deadbeef 4af28c09 Hash Data0, Hash0 Data1, Hash1 Tamper Can force re-use of keys, repeat sensitive transactions etc. Data Encryption Cryptographic Signatures Replay attack protection - focus of this work

Replay Attack Protection with Integrity-Trees Secure Root Stored On-Chip 4 Verifying Every Level On Each Data Access Prevents Replay Integrity Tree Ctr Hash 3 Ctr Ctr Hash 2 Ctr Ctr Hash 1 Hashes Ctr Ctr Ctr Hash Causes Extra Memory Accesses & Slowdown Encrypted Data Counters Hash

This Talk: Designing Compact Integrity Trees Integrity-Tree Counters C1 C2 C3 … Integrity Tree C1 C2 C3 C4 C5 C6 … Shorter Height Encryption Counters C1 C2 C3 … C1 C2 C3 C4 C5 C6 … Base of the Integrity Tree Smaller Base Goal: Pack more Counters per Cacheline for Low-Overhead Integrity Trees

This Talk: Designing Compact Integrity Trees Benefits of Our Design 8.5x smaller vs VAULT1 4x smaller vs Baseline 13.5% vs VAULT1 6.3% vs Baseline Shorter Height Tree-Size Speedup Base of the Integrity Tree Smaller Base 1. VAULT - Taassori et al., ASPLOS, 2018. Goal: Pack more Counters per Cacheline for Low-Overhead Integrity Trees

Agenda Introduction Background and Motivation Design Results

Split-Counters - More Counters/Cacheline Naïve Counters 512 bit cache line C1 C2 C3 C4 C5 C7 C8 C6 8 counters x 64b

Split-Counters - More Counters/Cacheline Split Counters1 (Share Significant Bits) 512 bit cache line 64 x 7-bit minor counters 7-bit Minor Counters Can Overflow 128 34 23 . 17 Major Counter . Major Counter +1 C1 C2 C3 . C64 Major Counter Increment shared Major counter Changes values of ALL counters! Major | Minor = Counter 1. Re-encrypt 64 Data Lines (128 reads/writes) 2. Update 64 Hashes (128 reads/writes) 64-ary Design Trade-off: Packing more Counters vs Overflow Updates ! 1. Yan et al., ISCA, 2006

Impact of Packing More Counters Overflow 90% 10% 1 Performance Increases, then Decreases! State of the art Baseline Benefits (counter accesses), outweighed by costs (overflows) Goal: Pack more counters/cacheline, but fewer overflows ! 1. VAULT - Taassori et al., ASPLOS, 2018.

Agenda Introduction Background and Motivation Design Results

Analysis of Counter Overflows Counter Usage at the Time of Overflow. Few counters used in cacheline (tree-counters) All counters used in cacheline (encryption counters) Fraction of Overflows ~75% of overflows ~25% of overflows Fraction of Counter Cacheline Used Insight: Bimodal pattern in overflows  Morphable Counters with customized formats to reduce overflows

Few Counters Used: Compress Zero Counters 512-bit Counter Cacheline Major Counter Minor Counters (384-bit) Hash ZCC (<= 64 non-zero ctrs) Is-NZ? Bit Vector Non-Zero Counters Non-Zero Counters Counter Size <=16 ctrs 16-bits/ctr <=32 ctrs 8-bits/ctr Insight: When few counters non-zero, allocate bits only to them 128-bit 256-bit Uniform (>64 non-zero ctrs) 128 x 3-bit Counters ZCC provides large overflow-tolerant counters, when less than 25% counters are used out of 128

Avoiding Overflows When All Counters Used Instead of conventional (Major II Minor) Overflowing Minor Counter Major Counter Minor Counter (3-bit) Effective Value = (Major + Minor) +5 -5 Rebase Counters (Avoid Overflow) Reset Counters (Existing Design) Add smallest minor counter Subtract that value from all Counters changed – re-encryption needed No change; No re-encryption; Rebasing avoids counter overflow and overheads, when all counters used

Agenda Introduction Background and Motivation Design Results

Reduction in Overflows SC-64 (Baseline) SC-128 - 7.4x MorphCtr-128 (ZCC-only) - 1.4x MorphCtr-128 (ZCC+Rebasing) - 1.6x MorphCtr-128 packs 2x Counters / Cacheline, Still, 1.6x Fewer Overflows vs SC-64

Performance Benefits Counters Overflow 1% 13.5% 6.4% 12% MorphCtr-128 reduces counter accesses, without a bloat in overflow updates 6.4% speedup vs Baseline, 13.5% vs state-of-the-art Note: 4 Cores, 8MB LLC, 16GB Secure Memory, 128KB Dedicated Counter Cache

Storage Benefits 1.6% 0.050% 4 0.025% 3 0.8% 0.006% 2 1.6% 0.8% Configuration Encryption Counter Storage Integrity-Tree Levels Accessed (From Memory) VAULT 1.6% 0.050% 4 SC-64 0.025% 3 MorphCtr-128 0.8% 0.006% 2 Configuration Encryption Counter Storage VAULT 1.6% SC-64 MorphCtr-128 0.8% Encryption counter storage reduced by 2X, Integrity-tree size reduced by 4x vs Baseline, 8.5X vs VAULT

Conclusion Thank You! Questions? Morphable Counters  reducing overheads of secure memory 2x Compact & 1.6x Less Overflow-Rate vs Split Counters For Practically Free  Only Encoding Change, No Loss in Security Closed 1/3rd gap between State-of-the-Art & Non-Secure 13.5% Speedup with 8.5x Smaller Integrity-Tree than VAULT Thank You! Questions?