Introduction to CUDA Programming


Introduction to CUDA Programming
Advanced Computer Architecture
Andreas Moshovos, Winter 2019

Goals for Today
Should you take the course?
What should you know?
What is expected of you
What you will get out of it

Should you take the course? What should you know?
I'll let you decide on your own. To help you, we will overview modern processor design through reviewing modern security attacks.
And, to make it worthwhile for everyone, we will review a recent, excellent work on mitigating them.

Material for today
Intel Sandy Bridge overview: an older design, but as far as our discussion is concerned Intel still uses the same microarchitecture with "minor" tweaks
Meltdown and Spectre attacks: you have to really understand the micro-architecture to "get" them
Moinuddin Qureshi's CEASER: Mitigating Conflict-Based Cache Attacks via Encrypted-Address and Remapping. MICRO 2018: 775-787 (best paper award)

Intel’s Sandy Bridge Architecture

Chip Architecture (32nm)

Overall Core Architecture

Core Architecture – Instruction Fetch

Instruction Decode

Out-of-Order Execution

Execution Units – Scalar, SIMD, FP and AVX (vector)

Memory Access Units
L1 load-to-use latency is 4 cycles
Banked to support multiple accesses: a poor man's multiporting?
L2 load-to-use latency is 12 cycles
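To make those latency numbers concrete, here is a minimal load-to-use probe in C: a dependent pointer chase keeps exactly one load in flight, so cycles per iteration approximate the latency of whichever cache level holds the chain. The working-set size and iteration count are illustrative assumptions, and rdtsc counts reference cycles, so interpret the result accordingly.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <x86intrin.h>  /* __rdtsc */

    int main(void) {
        size_t n = 2048;                        /* 2048 * 8B = 16KB: fits a 32KB L1 */
        uintptr_t *a = malloc(n * sizeof *a);
        for (size_t i = 0; i < n; i++)
            a[i] = (uintptr_t)&a[(i + 1) % n];  /* sequential cycle; a real probe
                                                   shuffles this to defeat the prefetcher */
        uintptr_t *p = (uintptr_t *)a[0];
        long iters = 100000000;
        uint64_t t0 = __rdtsc();
        for (long i = 0; i < iters; i++)
            p = (uintptr_t *)*p;                /* each load depends on the previous one */
        uint64_t t1 = __rdtsc();
        printf("%.1f cycles/load (%p)\n", (double)(t1 - t0) / iters, (void *)p);
        free(a);
        return 0;
    }

Grow the array well past the L1 size and the same loop reports the L2 latency instead.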

L3 and ring interconnect
The CPUs, GPU, and system agent communicate via a ring
Each CPU has a 2MB, 8-way set-associative L3 slice
A static hash maps addresses to slices
Latency varies with which core accesses which slice: 26-31 cycles
Max bandwidth: 435.2 GB/s at 3.4 GHz

Overall Core Architecture

Meltdown

User vs. kernel space isolation
Kernel space is mapped into every user address space
A privilege bit marks kernel pages as inaccessible to user-mode code

Overview of the attack
Exploit speculative execution to temporarily load protected data into a register
Use the value to cause a micro-architectural state change which persists

Meltdown Attack

Timing Scenarios

How to check whether an address is cached?

Flush+Reload
The clflush instruction flushes any cache line that contains a specific address
Works for shared addresses: think of code that is shared between two processes
The Flush+Reload paper shows how to read a secret key this way, by detecting the order in which code functions are called
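A sketch of the Flush+Reload probe in C, using the clflush and rdtscp instructions via compiler intrinsics; the threshold is a machine-dependent assumption that real attacks calibrate first:

    #include <stdint.h>
    #include <x86intrin.h>  /* _mm_clflush, _mm_mfence, __rdtscp */

    #define THRESHOLD 100   /* cycles: assumed hit/miss boundary, calibrate per machine */

    /* Returns nonzero if *addr was in the cache, i.e., if someone touched it
       since the last flush, then flushes it again for the next round. */
    static int probe(const void *addr) {
        unsigned aux;
        _mm_mfence();                            /* order against earlier accesses */
        uint64_t t0 = __rdtscp(&aux);            /* timestamp, partially serializing */
        (void)*(volatile const char *)addr;      /* the timed reload */
        uint64_t t1 = __rdtscp(&aux);
        _mm_clflush(addr);                       /* evict for the next observation */
        return (t1 - t0) < THRESHOLD;            /* fast reload => line was cached */
    }

The spy flushes a shared line, waits, and probes: a fast reload means the victim touched that line in the interval.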

Meltdown skeleton

Exceptions
E.g., divide by zero or access to an illegal address
Suppression vs. handling
Handling: fork prior to the exception
Suppression: use transactional memory, or branch-prediction exploitation (Spectre)

Example
Line 4: attempt to read the secret byte at address [rcx] into register al. This raises a protection exception, but the CPU still performs the read as part of speculative execution; the exception is only handled when the mov tries to retire (commit).
The SHL by 12 multiplies the AL value by 4K, the page size.
The mov through RBX then tries to read a page at a distance determined by the AL value. This is a race: it may or may not happen before the transient instructions are squashed.
Step 2: the attacker times accesses to all 256 pages; the one that hits reveals the secret byte.
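In C-level pseudocode the transient sequence looks roughly as follows; this is a sketch of the mechanism, not the paper's exact assembly, and kernel_addr and probe_array are illustrative names:

    #include <stddef.h>

    /* Transient Meltdown sketch: the dereference of kernel_addr faults at
       retirement, but the dependent access below may execute speculatively
       first and leave a cache footprint that survives the squash. */
    extern volatile unsigned char *kernel_addr;    /* protected secret byte */
    extern unsigned char probe_array[256 * 4096];  /* one page per byte value */

    void transient(void) {
        unsigned char secret = *kernel_addr;       /* raises the exception */
        (void)*(volatile unsigned char *)
            &probe_array[(size_t)secret * 4096];   /* footprint at secret * 4K */
    }

    /* Step 2: after suppressing or handling the fault, Flush+Reload each of
       the 256 pages (e.g., with probe() above); the page that reloads fast
       encodes the secret byte. */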

Spectre

CEASER

Goal
The LLC is shared and contains micro-architectural state
Using side-channel attacks, a process can read values from another process through this shared state
To do so, the attacker needs to know how addresses are mapped onto the LLC
CEASER: randomize that mapping and change it frequently

CEASER: Mitigating Conflict-Based Attacks via Encrypted-Address and Remapping. MICRO 2018. Moinuddin Qureshi.

Background: Resource Sharing
Modern systems share the LLC to improve resource utilization
[Figure: two cores sharing a single LLC]
Sharing the LLC allows the system to dynamically allocate LLC capacity

Conflict-Based Cache Attacks
A co-running spy can infer the access pattern of the victim by causing cache conflicts
[Figure: the spy (one core) fills a set in the shared LLC; an access by the victim (another core) to that set evicts one of the spy's lines, so the spy later sees a miss for it and learns which set the victim accessed]
Conflicts leak the access pattern, which can be used to infer secrets [AES – Bernstein'05]

Prior Solutions
Way partitioning: NoMo [TACO'12], CATalyst [HPCA'16]
  Inefficient use of cache space
  Not scalable to many cores
Table-based randomization, via a Mapping Table (MT): RPCache [ISCA'07], NewCache [MICRO'08]
  Mapping Table becomes large for an LLC (MBs)
  OS support needed to protect the table

Our Goal
Protect the LLC from conflict-based attacks, while incurring:
Negligible storage overhead
Negligible performance overhead
No OS support
No restriction on capacity sharing
A localized implementation

Outline Why? CEASE CEASER Effective?

CEASE: Cache using Encrypted Address Space
Insight: don't memorize the random mapping, compute it
[Figure: the Physical Line Address (PLA, e.g., 0xCAFE0000) is encrypted with a key into an Encrypted Line Address (ELA, e.g., 0xa17b20cf) that indexes the LLC; on a dirty eviction the ELA is decrypted back to the PLA for the writeback]
The LLC tags also hold ELAs
Localized change: the ELA is visible only within the cache
Cache operations (access, coherence, prefetch) all remain unchanged
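A minimal sketch of the indexing idea in C; the cache geometry, key layout, and helper names are assumptions for illustration:

    #include <stdint.h>

    #define NUM_SETS (8u * 1024 * 1024 / 64 / 16)  /* 8MB LLC, 64B lines, 16 ways */

    /* The 40-bit cipher; a sketch of it appears with the LLBC slide below. */
    uint64_t llbc_encrypt(const uint32_t rk[4], uint64_t pla);
    uint64_t llbc_decrypt(const uint32_t rk[4], uint64_t ela);

    /* CEASE: index (and tag) the cache with the Encrypted Line Address. */
    uint64_t cease_set(const uint32_t rk[4], uint64_t pla) {
        uint64_t ela = llbc_encrypt(rk, pla);   /* randomized address (ELA) */
        return ela % NUM_SETS;                  /* set index comes from the ELA */
    }

    /* On a dirty eviction, llbc_decrypt(rk, ela) recovers the PLA so the
       line can be written back to the right memory location. */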

Randomization via Encryption
Lines that mapped to the same set get scattered to different sets
[Figure: addresses A and B conflict in a conventional LLC; under CEASE with one key they map to different sets as A' and B', and under a different key to yet other sets as A'' and B'']
The mapping depends on the key, and different machines have different keys

Encryption: Need a Fast, Small-Width Cipher
A block cipher maps a B-bit plaintext block to a B-bit ciphertext block
The PLA is ~40 bits (up to 64TB of memory)
Small-width ciphers are deemed insecure: brute-force attack on the key, or memorize all input-output pairs
A conventional wide cipher would mean larger tags (80+ bits) and a latency of 10+ cycles
Insight: the ELA is not visible to the attacker, so it is okay to use a 40-bit block cipher

Low-Latency Block Cipher (LLBC)
A four-stage Feistel network (with a substitution-permutation network), inspired by DES and Blowfish
Encryption with the LLBC incurs a delay of 24 XOR gates (approximately 2-cycle latency)
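A generic four-round Feistel sketch over a 40-bit block in C; the round function F here is a placeholder for the paper's substitution-permutation network, so this shows the structure rather than the actual cipher:

    #include <stdint.h>

    #define MASK20 ((1u << 20) - 1)   /* 40-bit block = two 20-bit halves */

    /* Placeholder round function; CEASER's real F is an S/P network. */
    static uint32_t F(uint32_t half, uint32_t rk) {
        uint32_t x = (half ^ rk) * 0x9E3779B9u;
        return (x >> 12) & MASK20;
    }

    uint64_t llbc_encrypt(const uint32_t rk[4], uint64_t pla) {
        uint32_t L = (pla >> 20) & MASK20, R = pla & MASK20;
        for (int i = 0; i < 4; i++) {   /* four Feistel rounds */
            uint32_t t = R;
            R = L ^ F(R, rk[i]);
            L = t;
        }
        return ((uint64_t)L << 20) | R;
    }

    uint64_t llbc_decrypt(const uint32_t rk[4], uint64_t ela) {
        uint32_t L = (ela >> 20) & MASK20, R = ela & MASK20;
        for (int i = 3; i >= 0; i--) {  /* same rounds, keys in reverse */
            uint32_t t = L;
            L = R ^ F(L, rk[i]);
            R = t;
        }
        return ((uint64_t)L << 20) | R;
    }

A Feistel network is invertible for any round function F, which keeps the decryption path (needed on dirty evictions) as cheap as encryption.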

Outline Why? CEASE CEASER Effective?

Let's Break CEASE … [Liu et al. S&P, 2015]
Form a pattern of lines such that the cache has a conflict miss
Remove one line from the pattern and check for a conflict:
Still a conflict miss? The removed line is NOT in the conflicting set
No conflict miss? The removed line MAPS to the conflicting set
Repeating this yields an eviction set: an attacker can break CEASE within 22 seconds (8MB LLC)
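The reduction step looks roughly like the following C sketch; causes_conflict_miss stands in for the timing test, and this conveys the idea rather than Liu et al.'s exact algorithm:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical timing test: access all n lines, report a conflict miss. */
    bool causes_conflict_miss(void *const *lines, size_t n);

    /* Shrink a conflicting candidate set down to a minimal eviction set. */
    size_t reduce(void **lines, size_t n, size_t ways) {
        for (size_t i = 0; i < n && n > ways; ) {
            void *removed = lines[i];
            lines[i] = lines[n - 1];             /* tentatively drop line i */
            n--;
            if (causes_conflict_miss(lines, n)) {
                /* still conflicts: line i was not needed, keep it removed
                   (do not advance i; slot i now holds an untested line) */
            } else {
                lines[i] = removed;              /* needed: put it back */
                n++;                             /* old last slot is still intact */
                i++;
            }
        }
        return n;                                /* roughly `ways` lines remain */
    }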

CEASER: CEASE with Remapping
Split time into epochs of N accesses and change the key every epoch
[Figure: a bulk remap would rewrite the entire cache at each epoch boundary; CEASER instead holds a CurrKey and a NextKey and remaps gradually across the epoch]
CEASER uses gradual remapping to periodically change the keys

CEASER: CEASE with Remapping
A remap-rate of 1% means one set of a W-way cache is remapped after every 100*W accesses
[Figure: a SetPtr sweeps through the sets over the epoch; sets below the pointer have already been rehashed under NextKey, while sets at or above it still use CurrKey]
Cache access: if Set[CurrKey] < SetPtr, use NextKey
CEASER with gradual remap needs negligible hardware (one bit per line)
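The key-selection rule on each access is small enough to sketch directly, building on the cease_set helper above; curr_key, next_key, and set_ptr are illustrative state:

    #include <stdint.h>

    extern const uint32_t curr_key[4], next_key[4];
    extern uint64_t set_ptr;   /* sweeps 0 .. NUM_SETS-1 once per epoch */

    uint64_t ceaser_set(uint64_t pla) {
        uint64_t s = cease_set(curr_key, pla);  /* index under the current key */
        if (s < set_ptr)                        /* set already remapped this epoch? */
            s = cease_set(next_key, pla);       /* then the next key applies */
        return s;
    }

The one bit per line records which key a resident line was installed under, which is the "negligible hardware" the slide refers to.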

Outline Why? CEASE CEASER Effective?

Security Analysis
Time to learn an eviction set for one set (the vulnerability is removed after a remap, <1ms):

Remap-Rate        | 8MB LLC    | 1MB LLC-Bank
1% (default)      | 100+ years | 37 years
0.5%              | 21 years   |
0.1%              | 5 hours    |
0.05%             |            | 5 minutes
No-Remap (CEASE)  | 22 seconds | 0.4 seconds

The 1% remap rate also limits the impact on miss rate, energy, and accesses to ~1%
CEASER can tolerate years of attack, even with a remap rate of 1%

Performance and Storage Overheads
8 cores with an 8MB, 16-way LLC (34 workloads, SPEC + Graph)
[Chart: normalized performance (%) for the Rate-34, Mix-100, and ALL-134 workload groups]

Structure              | Cost
80-bit keys (2 LLBCs)  | 20 bytes
SetPtr                 | 2 bytes
Access counter         | 2 bytes
Total                  | 24 bytes

CEASER incurs negligible slowdown (~1%) and storage overhead (24 bytes)

Summary
We need a practical solution to protect the LLC from conflict-based attacks. CEASER offers:
Robustness to attacks (years)
Negligible slowdown (~1%)
No OS support needed
Negligible storage (24 bytes)
A localized change (within the cache)
[Figure: line addresses are encrypted before indexing the cache, and the key is changed periodically]
Appealing for industrial adoption

Course Details

On average we will meet once per week. I will be traveling at times and will make up for the "lost" time by holding two lectures in some weeks.
What I expect you to do:
Reading assignments each week
Questionnaires, to be handed in at the beginning of class
Homeworks
Programming assignments
A project: validate some prior work, or do something new
Maybe present papers (we will see)