Architecture Support for Secure Computing. Mikel Bezdek, Chun Yee Yu. CprE 585 Survey Project, 12/10/04.



Presentation Outline: Motivation; Assumptions; Attacks; Proposed Solutions; Pending questions and future research.

Motivation: Piracy of software and digital media is currently a huge problem. Attempts to solve it purely in software have proven easy to foil. Adding support at the hardware level is a promising solution.

Assumptions: All solutions assume that the processor and on-chip storage are secure. The operating system and all peripherals, including off-chip memory, are untrusted. [Figure: processor, OS, memory, and I/O devices]

Points of Attack: Because memory is untrusted, attacks can occur on any transfer to or from external memory. Because the OS is untrusted, attacks can occur at context switches, when the OS takes control of execution.

Memory Attacks: Adversaries may try to gain information from unprotected off-chip memory by (1) modifying data (spoofing, splicing, and replay attacks) and (2) monitoring the data access pattern on the address bus.

Solutions: Basic XOM architecture; XOM using One-Time Pad Encryption; Hash Trees; Aegis Processor; HIDE Architecture.

XOM (Execute-Only Memory). Tamper-resistant software: software is encrypted using symmetric encryption, and its key is encrypted using asymmetric encryption. Asymmetric encryption: the public key is used by the vendor, the private key by the XOM chip. Symmetric encryption: the symmetric key is unique to each program and is also called the XOM ID. Secured computing: access restrictions are enforced using tagged and encrypted storage, and encrypted code is executed using on-chip decryption.

XOM Internal Security: L2 cache lines are tagged with a XOM ID, with valid bits for each word in the cache line. L1 cache lines are tagged with a XOM ID. Registers are tagged with a XOM ID. The XOM ID is kept in a table in the XOM chip.

XOM Context Switches: Involves 4 special registers. Data register: data is packaged into movable (by the interrupting application), read-write protected data; a mutating key and the XOM ID are used for packaging. Hash registers (2): a 128-bit hash is made from the package and stored in two 64-bit registers. XOM ID register: stores the XOM tag.

XOM and External Memory: Data is encrypted with the XOM ID and a hash (MAC) is created. A Message Authentication Code (MAC) is a keyed one-way hash; it protects against spoofing and splicing attacks.
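The keyed-hash idea on this slide can be illustrated with a short sketch. It is not the XOM hardware design: it assumes HMAC-SHA256 as the MAC, assumes the MAC covers the address as well as the data (which is what lets a check catch splicing), and uses hypothetical names (`seal`, `verify`, the key value).

```python
import hashlib
import hmac

def seal(key: bytes, address: int, data: bytes) -> bytes:
    """Return a MAC over (address, data) under a per-program key (assumed layout)."""
    msg = address.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify(key: bytes, address: int, data: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(seal(key, address, data), tag)

key = b"hypothetical-program-key"
tag = seal(key, 0x1000, b"secret line")

ok_same = verify(key, 0x1000, b"secret line", tag)    # untouched data
ok_spoof = verify(key, 0x1000, b"forged line", tag)   # spoofing: data replaced
ok_splice = verify(key, 0x2000, b"secret line", tag)  # splicing: valid data moved
```

A MAC of this form accepts any value that was once valid at the same address, so it does not by itself stop replay.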

XOM Performance Issues. Optimizations: use a reversible CRC instead of a MAC; use dedicated, pipelined DES encryption/decryption hardware. Maximum slowdown is 50%, assuming a 48-cycle Triple DES implementation and a 100-cycle memory access latency.

XOM with One-Time Pad: Average XOM slowdown is 16.7% on SPEC 2000 benchmarks, and around 30% on memory-intensive programs. One-Time Pad (OTP) encryption can be used to remove encryption/decryption from the critical path.

XOM with OTP. Proposed OTP solution: $\text{cipher} = \text{plain} \oplus E_{\text{key}}(\text{address} + \text{seq})$ and $\text{plain} = \text{cipher} \oplus E_{\text{key}}(\text{address} + \text{seq})$, where key = XOM ID, address = virtual address of the data/instruction, and seq = mutating sequence number. Computing $E_{\text{key}}(\text{address} + \text{seq})$ proceeds concurrently with the memory access, so encryption/decryption itself requires only a one-cycle XOR operation.
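A minimal sketch of the one-time-pad relation on this slide (cipher = plain XOR the encryption of address + seq under the key). Several things here are assumptions for illustration only: HMAC-SHA256 stands in for the block cipher, address and seq are concatenated rather than added, the line size is 32 bytes, and the key value and function names are hypothetical.

```python
import hashlib
import hmac

LINE_BYTES = 32  # assumed cache-line size

def pad(key: bytes, address: int, seq: int) -> bytes:
    """Stand-in for E_key(address + seq); a real design would use a block cipher."""
    msg = address.to_bytes(8, "big") + seq.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:LINE_BYTES]

def xor(a: bytes, b: bytes) -> bytes:
    """The single XOR step that remains on the critical path."""
    return bytes(x ^ y for x, y in zip(a, b))

key = b"hypothetical XOM key"
plain = b"0123456789abcdef0123456789abcdef"  # one 32-byte line

cipher = xor(plain, pad(key, 0x4000, 7))      # write-back with seq = 7
recovered = xor(cipher, pad(key, 0x4000, 7))  # read with the matching seq
stale = xor(cipher, pad(key, 0x4000, 6))      # wrong seq yields garbage
```

Because `pad` depends only on the key, the address, and the sequence number, it can be computed while the memory access is still in flight; only `xor` has to wait for the data.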

XOM with OTP  Cipher = plain  encrypted key (address + seq)  Plain = cipher  encrypted key (address + seq) key = XOM ID address = virtual address of data/instruction seq = mutating sequence number

XOM with OTP: Sequence Number Cache (SNC). Stores the sequence number for each cache line. Accessed by the virtual address of the cache line. Limited size, so there are two policies. With replacement: parts of the SNC are stored in unsecured memory. No replacement: OTP is used on some data and cannot be used on the rest.

XOM with OTP: Sequence Number Cache operation. Hits: the sequence number is read and passed on to the encryption unit. Misses, no replacement: default back to original XOM, where encryption is performed after the memory access. Costs cycles. Misses, with replacement: fetch the sequence number from memory, then perform encryption.
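The hit and miss-with-replacement paths can be sketched as a small LRU structure. This is a behavioral toy, not the proposed hardware: the LRU policy, the class and method names, and the plain dictionary standing in for off-chip storage are all assumptions, and incrementing the number on each write-back is borrowed from the time stamp description on the Aegis slide. Spilled entries would need protection in a real design.

```python
from collections import OrderedDict

class SequenceNumberCache:
    """Toy SNC with replacement; names and LRU policy are assumptions."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # on-chip: line address -> sequence number
        self.spilled = {}             # stand-in for entries kept off chip
        self.hits = 0
        self.misses = 0

    def lookup(self, line_addr: int) -> int:
        if line_addr in self.entries:
            self.hits += 1
            self.entries.move_to_end(line_addr)  # mark as most recently used
            return self.entries[line_addr]
        self.misses += 1
        seq = self.spilled.pop(line_addr, 0)     # fetch from memory, default 0
        self._install(line_addr, seq)
        return seq

    def bump(self, line_addr: int) -> int:
        """Advance the sequence number, e.g. on a write-back of the line."""
        seq = self.lookup(line_addr) + 1
        self.entries[line_addr] = seq
        return seq

    def _install(self, line_addr: int, seq: int) -> None:
        if len(self.entries) >= self.capacity:
            victim, victim_seq = self.entries.popitem(last=False)  # evict LRU
            self.spilled[victim] = victim_seq
        self.entries[line_addr] = seq

snc = SequenceNumberCache(capacity=2)
for addr in (0x100, 0x200, 0x300):
    snc.bump(addr)                 # third insert evicts 0x100 to memory
refetched = snc.lookup(0x100)      # miss: number is fetched back from memory
```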

XOM with OTP: SNC and Context Switching. Dump the SNC to memory with encryption. Tag SNC entries with the XOM ID.

XOM with OTP: Performance. Average slowdown: 16.7% for XOM; 4.59% for XOM with OTP, no replacement; 1.28% for XOM with OTP, with replacement. Maximum additional memory traffic: 1.035%.

Hash Trees. Memory integrity verification: allows the secure processor to ensure that the data it reads from memory matches the data most recently written. Protects against spoofing, splicing, and replay.

Hash Tree - Details: Works by calculating a hash of the data. A hash is easy to compute given the data, but it is hard to find other data that results in the same hash. [Figure: tree of hashes (H) built over the data, with the top of the tree kept in secure storage]

Hash Tree - Details: Hashes are calculated when accessing memory; there is no need to calculate a hash for a cache hit. Data can be given speculatively to the processor while the hash is generated and checked. Speculative commits are allowed using fetched but unverified data, and an exception raised by the hash checker does not need to be recovered from. The processor stalls on the hash checker when using the processor's secret key. Simulations show that, with caching of hashes, an average overhead of less than 20% can be achieved.
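As an illustration of how a hash tree lets a trusted root vouch for untrusted memory, here is a minimal sketch. It assumes SHA-256, a binary tree, and a power-of-two number of blocks; the function names are hypothetical, and the caching and speculation described on the slides are not modeled.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return all levels, leaf hashes first and the root last."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verify_block(index, block, levels, trusted_root):
    """Recompute the path to the root using sibling hashes from untrusted memory."""
    node = h(block)
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == trusted_root

blocks = [b"block0", b"block1", b"block2", b"block3"]
levels = build_tree(blocks)
root = levels[-1][0]  # only this value needs to live on chip

fresh = verify_block(2, b"block2", levels, root)
tampered = verify_block(2, b"BLOCK2", levels, root)

# After a legitimate write the on-chip root changes, so replaying the old
# block together with the old hashes no longer verifies.
new_root = build_tree([b"block0", b"block1", b"block2-v2", b"block3"])[-1][0]
replayed = verify_block(2, b"block2", levels, new_root)
```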

Aegis Architecture: Uses concepts from XOM and hash trees to create a “private and authenticated tamper-resistant environment” for the processor to run in. This means that data is private from any observers and that any tampering will be detected.

Aegis Architecture: Allows a user to trust the results from a program through system authentication, program authentication, and message authentication. This is accomplished by the sign_msg instruction, which encrypts a message and a hash of the program with the processor’s secret key before sending it back to the user.

Aegis Architecture: To provide this environment, 3 key things must be done: memory integrity verification, encryption/decryption of off-chip memory, and secure management of context switches.

Aegis – Memory Integrity Verification: Accomplished using hash trees. Introduces a new twist on hash trees, the log hash. In a log hash, only memory accesses leading up to a sign_msg instruction are verified. This greatly reduces the cost of verification while not sacrificing much security.

Aegis – Off-chip memory: Data stored in off-chip memory is encrypted and decrypted using the one-time-pad XOM scheme to hide latency. Pads are generated using the address of the data combined with a time stamp, which is incremented at every write-back. The time stamp is needed before calculation of the pad can begin, so caching time stamps is a good idea.

Aegis – Secure context switches: Uses a Secure Context Manager (SCM), which maintains a table of all processes. Each table entry contains a secure process ID (SPID), the program hash, register values, and a hash for off-chip memory verification. The table is stored in memory but can be cached for recent processes. In addition, cache entries are tagged with the SPID to ensure a process cannot gain access to another process’s data.

Aegis - Overhead: The overhead of the SCM is negligible; the main slowdown comes from integrity verification and encryption of memory. Using log hashes (l-hashes) and OTP encryption, the authors saw an average overhead of < 25%, with a worst case of 55% among the tested benchmarks.

HIDE - Motivation: Addresses the problem of secure information leaking through monitoring of the address bus. Access patterns reveal information about branching, and can be compared with known branching patterns to identify IP reused in a secure process.

HIDE – Critical Idea: Addresses from the processor are remapped before being sent to memory. The mapping is done using a permutation function to ensure a random mapping. The current mapping (permutation vector, pv) must be stored on chip.

HIDE - Implementation: To ensure that attackers cannot see patterns in memory accesses, each address under the current pv must be accessed at most once. This is implemented by locking cache blocks.

HIDE – Hide Cache: A modified L2 cache. Cache hits (reads and writes) are unmodified. When a block is loaded on a cache miss, it is locked. A locked block cannot be replaced. When all blocks are locked, a permutation must be done, which unlocks all blocks.
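The locking rule can be sketched as a small behavioral model. This is an assumption-laden toy rather than the HIDE hardware: the cache is fully associative, write-backs are not modeled, the permutation is triggered lazily by the first miss that finds every block locked, and all names are hypothetical. Each bus entry records the permutation epoch so that repeats within one mapping are easy to spot.

```python
class HideCache:
    """Toy model of lock-on-miss with permutation when every block is locked."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.resident = {}   # address -> True if locked since the last permutation
        self.epoch = 0       # number of permutations performed so far
        self.bus_trace = []  # (epoch, address) pairs visible on the memory bus

    def access(self, addr: int) -> str:
        if addr in self.resident:
            return "hit"     # hits never reach the bus
        if len(self.resident) == self.num_blocks:
            if all(self.resident.values()):
                self.permute()
            # Replace some unlocked block; locked blocks are never victims.
            victim = next(a for a, locked in self.resident.items() if not locked)
            del self.resident[victim]
        self.bus_trace.append((self.epoch, addr))
        self.resident[addr] = True   # lock the freshly loaded block
        return "miss"

    def permute(self) -> None:
        """Start a new mapping and unlock every block."""
        self.epoch += 1
        self.resident = dict.fromkeys(self.resident, False)

cache = HideCache(num_blocks=2)
results = [cache.access(a) for a in (1, 2, 1, 3, 1)]
```

In this model a block fetched under the current mapping stays locked until the next permutation, so no (epoch, address) pair can appear on the bus twice.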

HIDE – Permutation Steps: A new pv is created, mapping the set of all current memory addresses to new addresses. Blocks are loaded sequentially from memory and stored at their new location (pv[i]) in an on-chip buffer. The buffer is written back sequentially to memory. If the on-chip buffer size S is less than the memory size M, the process must be repeated M/S times.
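The load, place, and write-back steps can be sketched as follows. The slide does not say how the M/S passes are organized, so this sketch assumes each pass independently permutes one buffer-sized region; the function names are hypothetical, memory is a plain list of blocks, and the memory size is assumed to be a multiple of the buffer size.

```python
import random

def permute_region(memory, start, size, rng):
    """Shuffle `size` blocks starting at `start` through an on-chip buffer."""
    pv = list(range(size))
    rng.shuffle(pv)                  # pv[i] = new slot of block i in the region
    buffer = [None] * size
    for i in range(size):            # sequential loads from memory
        buffer[pv[i]] = memory[start + i]
    for j in range(size):            # sequential write-back to memory
        memory[start + j] = buffer[j]
    return pv

def permute_memory(memory, buffer_size, rng):
    """Run M/S passes, one per buffer-sized region (assumed organization)."""
    return [permute_region(memory, start, buffer_size, rng)
            for start in range(0, len(memory), buffer_size)]

def lookup(memory, pvs, buffer_size, logical_addr):
    """Translate a logical block address through the stored permutation vectors."""
    region, offset = divmod(logical_addr, buffer_size)
    return memory[region * buffer_size + pvs[region][offset]]

rng = random.Random(0)
memory = [f"block{i}" for i in range(8)]   # M = 8 blocks
pvs = permute_memory(memory, 4, rng)       # S = 4, so M/S = 2 passes
found = [lookup(memory, pvs, 4, i) for i in range(8)]
```

Both the reads and the writes walk memory in order, so the bus traffic during a pass does not depend on the permutation being applied.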

HIDE - Improvements: Since permutation is a lengthy operation, it is undesirable to wait until all cache blocks are locked. Pre-permutation: start the permutation when half of the cache blocks are locked.

HIDE - Improvements: Instead of permuting the entire memory at once, permute one chunk at a time, where the chunk size is one or more pages. Memory accesses within a chunk preserve security; only accesses across chunks leak information. This leakage is reduced by using a larger chunk size and by laying out code to minimize inter-chunk accesses. Requires maintaining information about each page.

HIDE - Results: Simulated on a superscalar processor model using SPEC2K benchmarks. Average slowdown was only 1.3%. Memory bandwidth used was on average 9% of the total.

HIDE - Conclusions: Provides a high level of security without imposing much loss in performance. Requires a slight modification to the L2 cache and the addition of permutation hardware. Will not work for multiprocessor systems, since the pv and locking information must be communicated over an unsecured bus.

In Summary: Supporting software security with hardware is a developing field. The work surveyed assumes a basic model of a secure processor holding the private half of a public-private key pair. XOM with OTP keeps memory private, hashes ensure memory is tamper-free, and a permutation scheme can be used to secure the address bus. When combined, these allow users to trust results from a secure processor and allow software developers to create copy-proof software.

Pending Questions: Will users accept performance losses in order to gain security? Will vendors support secure processing? What problems arise from storing the secret (private) key on the processor?