Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas.

Slides:

Advertisements

Similar presentations

Advertisements

1 Implementing an Untrusted Operating System on Trusted Hardware David Lie Chandramohan A. Thekkath Mark Horowitz University of Toronto, Microsoft Research,

Physical Unclonable Functions and Applications

Implementing an Untrusted Operating System on Trusted Hardware.

Architectures for Secure Processing Matt DeVuyst.

CS 153 Design of Operating Systems Spring 2015

Computer Organization and Architecture

Computer System Overview Chapter 1. Basic computer structure CPU Memory memory bus I/O bus diskNet interface.

Cryptography1 CPSC 3730 Cryptography Chapter 11, 12 Message Authentication and Hash Functions.

Chapter 8.  Cryptography is the science of keeping information secure in terms of confidentiality and integrity.  Cryptography is also referred to as.

Secure Embedded Processing through Hardware-assisted Run-time Monitoring Zubin Kumar.

Architecture for Protecting Critical Secrets in Microprocessors Ruby Lee Peter Kwan Patrick McGregor Jeffrey Dwoskin Zhenghong Wang Princeton Architecture.

Three fundamental concepts in computer security: Reference Monitors: An access control concept that refers to an abstract machine that mediates all accesses.

1 Architectural Support for Copy and Tamper Resistant Software David Lie, Chandu Thekkath, Mark Mitchell, Patrick Lincoln, Dan Boneh, John Mitchell and.

IVEC: Off-Chip Memory Integrity Protection for Both Security and Reliability Ruirui Huang, G. Edward Suh Cornell University.

Cryptography, Authentication and Digital Signatures

July 30, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 8: Exploiting Memory Hierarchy: Virtual Memory * Jeremy R. Johnson Monday.

G53SEC 1 Reference Monitors Enforcement of Access Control.

Operating Systems ECE344 Ashvin Goel ECE University of Toronto Virtual Memory Hardware.

Deck 10 Accounting Information Systems Romney and Steinbart Linda Batch March 2012.

Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.

Protection of Processes Security and privacy of data is challenging currently. Protecting information – Not limited to hardware. – Depends on innovation.

IT 221: Introduction to Information Security Principles Lecture 5: Message Authentications, Hash Functions and Hash/Mac Algorithms For Educational Purposes.

Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.

1 Contents Memory types & memory hierarchy Virtual memory (VM) Page replacement algorithms in case of VM.

Architecture Support for Secure Computing Mikel Bezdek Chun Yee Yu CprE 585 Survey Project 12/10/04.

Message Authentication Code

CS161 – Design and Architecture of Computer

Hardware-rooted Trust for Secure Key Management & Transient Trust

Trusted Computing and the Trusted Platform Module

Memory Hierarchy Ideal memory is fast, large, and inexpensive

Non Contiguous Memory Allocation

Chapter 2: Computer-System Structures

ECE232: Hardware Organization and Design

Memory COMPUTER ARCHITECTURE

CS161 – Design and Architecture of Computer

Cryptographic Hash Function

CSCE 715: Network Systems Security

Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.

CMSC 611: Advanced Computer Architecture

Secure Processing On-Chip

Bastion secure processor architecture

Computer-System Architecture

Module 2: Computer-System Structures

AEGIS: Secure Processor for Certified Execution

Processor Fundamentals

CS399 New Beginnings Jonathan Walpole.

User-mode Secret Protection (SP) architecture

Virtual Memory Hardware

MICRO-2018 Gururaj Saileshwar1 Prashant Nair1 Prakash Ramrakhyani2

Module 2: Computer-System Structures

Contents Memory types & memory hierarchy Virtual memory (VM)

Cryptographic Hash Functions Part I

Sai Krishna Deepak Maram, CS 6410

CSC3050 – Computer Architecture

Physical Unclonable Functions and Applications

CSE 471 Autumn 1998 Virtual memory

Chapter 2: Computer-System Structures

Chapter 2: Computer-System Structures

Module 2: Computer-System Structures

Module 2: Computer-System Structures

Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.

COMP755 Advanced Operating Systems

Virtual Memory.

CSE 542: Operating Systems

ARM920T Processor This training module provides an introduction to the ARM920T processor embedded in the AT91RM9200 microcontroller.We’ll identify the.

Presentation transcript:

Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas Massachusetts Institute of Technology

New Security Challenges Current computer systems have a large Trusted Computing Base (TCB) Trusted hardware: processor, memory, etc. Trusted operating systems, device drivers Future computers should have a much smaller TCB Untrusted OS Physical attacks  Without additional protection, components cannot be trusted Why smaller TCB? Easier to verify and trust Enables new applications ahoyhoy G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Applications Emerging applications require TCBs that are secure even from an owner Distributed computation on Internet/Grid computing SETI@home, distributed.net, and more Interact with a random computer on the net  how can we trust the result? Software licensing The owner of a system is an attacker Mobile agents Software agents on Internet perform a task on behalf of you Perform sensitive transactions on a remote (untrusted) host G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Single-Chip AEGIS Secure Processors Only trust a single chip: tamper-resistant Off-chip memory: verify the integrity and encrypt Untrusted OS: identify a core part or protect against OS attacks Cheap, Flexible, High Performance Identify or Protect against Check Integrity, Encrypt I/O Untrusted OS Trusted Environment Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Secure Execution Environments Tamper-Evident (TE) environment Guarantees a valid execution and the identity of a program; no privacy Any software or physical tampering to alter the program behavior should be detected  Integrity verification Private Tamper-Resistant (PTR) environment TE environment + privacy Assume programs do not leak information via memory access patterns  Encryption + Integrity verification Relate to applications G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Other Trusted Computing Platforms IBM 4758 cryptographic coprocessor Entire system (processor, memory, and trusted software) in a tamper-proof package Expensive, requires continuous power XOM (eXecution Only Memory): David Lie et al Stated goal: Protect integrity and privacy of code and data Memory integrity checking does not prevent replay attacks Always encrypt off-chip memory Palladium/NGSCB: Microsoft Stated goal: Protect from software attacks Memory integrity and privacy are assumed (only software attacks) 7 mins? G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Memory Encryption

Memory Encryption Encrypt on an L2 cache block granularity write L2 Cache ENCRYPT DECRYPT read Processor Untrusted RAM Encrypt on an L2 cache block granularity Use symmetric key algorithms (AES, 16 Byte chunks) Should be randomized to prevent comparing two blocks Adds decryption latency to each memory access G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Direct Encryption (CBC mode): encrypt L2 Block B[4] AESK EB[1] EB[2] EB[3] EB[4] RV B[3] AESK B[2] AESK B[1] AESK Random # RV Processor Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Direct Encryption (CBC mode): decrypt Processor AESK-1 B[4] L2 Miss!! Memory Request Read AESK-1 B[3] AESK-1 B[2] AESK-1 B[1] EB[4] EB[3] EB[2] EB[1] RV Memory Off-chip access latency = latency for the last chunk of an L2 block + AES + XOR  Decryption directly impacts off-chip latency G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

One-Time-Pad Encryption (OTP): encrypt B[4] (Addr,TS,1) (Addr,TS,2) (Addr,TS,3) (Addr,TS,4) Time Stamp (TS) AESK-1 One-Time-Pad (OTP) B[3] B[2] EB[1] EB[2] EB[3] EB[4] TS To Memory B[1] Counter Processor Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

One-Time-Pad Encryption (OTP): decrypt Processor (Addr,TS,1) (Addr,TS,2) (Addr,TS,3) (Addr,TS,4) AESK-1 B[4] L2 Miss!! Memory Request Read B[3] B[2] B[1] TS EB[4] EB[3] EB[2] EB[1] Memory Off-chip access latency = MAX( latency for the time stamp + AES, latency for an L2 block ) + XOR  Overlap the decryption with memory accesses G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Effects of Encryption on Performance Simulations based on the SimpleScalar tool set 9 SPEC CPU2000 benchmarks 256-KB, 1-MB, 4-MB L2 caches with 64-B blocks 32-bit time stamps and random vectors  No caching! Memory latency: 80/5, decryption latency: 40 Performance degradation by encryption Direct (CBC) One-Time-Pad Worst Case 25% 18% Average 13% 8% G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Security and Optimizations The security of the OTP is at least as good as the conventional CBC scheme OTP is essentially a counter-mode (CTR) encryption Further optimizations are possible For static data such as instructions, time stamps are not required  completely overlap the AES computations with memory accesses Cache time stamps on-chip, or speculate the value Will be used for instruction encryption of Philips media processors +8 (15) G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Integrity Verification

Difficulty of Integrity Verification Untrusted RAM Processor Program write E(124), MAC(0x45, 124) Address 0x45 VERIFY ENCRYPT DECRYPT read IGNORE E(120), MAC(0x45, 120) Trusted State Cannot simply MAC on writes and check the MAC on reads  Replay attacks Hash trees for integrity verification G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Hash Trees Logarithmic overhead for every cache miss Low performance Processor h1=h(V1.V2) h2=h(V3.V4) root = h(h1.h2) VERIFY Logarithmic overhead for every cache miss Low performance ( 10x slowdown)  Cached hash trees VERIFY L2 block READ Data Values V1 V2 V3 V4 MISS Untrusted Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Cached Hash Trees (HPCA’03) Processor h1=h(V1.V2) h2=h(V3.V4) root = h(h1.h2) VERIFY Cache hashes in L2 L2 is trusted Stop checking earlier Less overhead ( 22% average, 51% worst case) Still expensive In L2 VERIFY VERIFY DONE!!! In L2 V1 V2 V3 V4 MISS MISS Untrusted Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Can we do better? Some applications only require to verify memory accesses after a long execution Distributed computation No need to check after each memory access Can we just check a sequence of accesses? enter_aegis Execute Get results Verify results - H(Prog) - signature Program, Data RESULT RESULT Processor’s Private Key Processor’s Public Key Job Dispatcher Secure Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Integrity Verification: Idea At run-time, maintain a log of reads and writes Reads: make a ‘read’ note with (address, value) in the log Writes: make a ‘write’ note with (address, value) in the log check: go thru log, check each read has the most recent value written to the address Problem!!: Log grows  use cryptographic hashes Write ( 0x40, 1) Write (0x50, 2) Read (0x50, 2) Read (0x40, 1) Write ( 0x40, 1) Write (0x50, 2) Read (0x50, 2) Read (0x40, 1) Write ( 0x40, 1) Write (0x50, 2) Read (0x50, 2) Read (0x40, 1) Write 1 at 0x40 Write 2 at 0x50 Read 2 from 0x50 Read 1 from 0x40 Checker Log Untrusted Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Algorithms: Run-Time Use set hashes as compressed logs Set hash: maps a set to a fixed length string ReadHash: a set of read entries (addr, val, time) in the log WriteHash: a set of write entries (addr, val, time) in the log Use Timer (time stamp) to keep the ordering of entries Only one additional time stamp access for each memory access WriteHash (0x40, 0, 0) (0x50, 0, 0) (0x40, 10, 1) ReadHash (0x40, 0, 0) Cache eviction Write 10 at 0x40 Initialize (all zero) Cache Miss!! Read 0 from 0x40 Read 2 from 0x50 Timer: 0 Timer: 1 Untrusted Memory Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Algorithms: Integrity Check Read all the addresses that are not in a cache Compare ReadHash and WriteHash (same set?) WriteHash (0x40, 0, 0) (0x50, 0, 0) (0x40, 10, 1) ReadHash (0x40, 0, 0) (0x40, 10, 1) (0x50, 0, 0) =? Read all Read 2 from 0x50 Timer: 1 Timer: 0 Untrusted Memory Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Checking Overhead of Log Hash Scheme Integrity check requires reading the entire memory space being used Cost depends on the size and the length of an application For long programs, the checking overhead is negligible Amortized over a long execution time Check overhead is negligible for programs w/ more than a billion accesses Better than Hash Trees for programs w/ more than 10 million accesses SWIM, 1MB L2, Uses 192MB G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Performance Comparisons Overhead for TE environments Integrity verification Overhead for PTR environments Integrity verification + encryption CHTree LHash Worst Case 52% 15% Average 22% 4% +10 (25) CHTree + CBC LHash + OTP Worst Case 59% 23% Average 31% 10% G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Summary Untrusted owners are becoming more prevalent Untrusted OS, physical attacks  requires a small TCB Single-chip secure processors require off-chip protection mechanisms: Integrity verification and Encryption OTP encryption scheme reduces the overhead of encryption in all cases Allows decryption to be overlapped with memory accesses Cache or speculate time stamps to further hide decryption latency Log Hash scheme significantly reduces the overhead of integrity verification for certified execution when programs are long enough G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

More Information at www.csg.lcs.mit.edu Questions? More Information at www.csg.lcs.mit.edu G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Conventional Tamper-Proof Packages Processing system in a tamper-proof package (IBM 4758) Expensive: many detecting sensors Needs to be continuously powered: battery-backed RAM $2,690 in 2001 Memory 99MHz 486 Source: IBM website G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

PTR Performance

Incremental Multiset Hash (Asiacrypt`03) Multiset: unordered group of elements where an element can appear more than once Intuitively, function h is an incremental multiset hash function if it satisfies: Compression: Arbitrary size multiset  fixed length string Incrementality: Easy to add/remove an element to a multiset Multiset-collision resistance: difficult to find multisets that produce the same output hash Use a multiset hash as a log Compression  can fit in a fixed on-chip storage Incrementality  easy to add a new read/write to the log Multiset-collision resistance  attackers cannot modify memory in a way that the log does not change G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Algorithms: Integrity Check Processor maintains small trusted states; ReadHash: records a multiset of read entries in the log WriteHash: records a multiset of write entries in the log Timer: used as time stamps  ordering of entries in the log At run-time, the log (ReadHash, WriteHash) is updated with minimal overhead (one additional access for time stamps) On a cache miss for addr: 1. Read (value, time stamp) from addr 2. ReadHash += hash(addr, value, time stamp) 3. Timer = max( Timer, time stamp + 1 ) On a write/eviction for (addr, value): 1. Write (value, Timer) to addr 2. WriteHash += hash(addr, value, Timer) Check: after reading all addresses, ReadHash = WriteHash? G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Cache Block (B) IV = B[1] B[2] B[3] B[4] Encryption AES AES AES AES EB (0...0,Addr,RV) B[1] B[2] B[3] B[4] Encryption Key AES AES AES AES EB RV EB[1] EB[2] EB[3] EB[4]  In Memory Key AES-1 AES-1 AES-1 AES-1 IV Decryption B B[1] B[2] B[3] B[4] G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

One-Time-Pad Encryption (OTP) ( Fixed Vector, Address, Time Stamp, 1 ) ( Fixed Vector, Address, Time Stamp, 2) ( Fixed Vector, Address, Time Stamp, 3 ) ( Fixed Vector, Address, Time Stamp, 4 ) Pad Generation Key AES-1 AES-1 AES-1 AES-1 Encryption Pad (OTP) B B[1] B[2] B[3] B[4] Encryption EB TS EB[1] EB[2] EB[3] EB[4]  In Memory B Decryption B[1] B[2] B[3] B[4] G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

AEGIS: High-Level Architecture

Secure Context Manager (SCM) A specialized module in the processor Assign a secure process ID (SPID) for each secure process Implements new instructions enter_aegis set_aegis_mode random sign_msg Maintains a secure table Even operating system cannot modify Standard Processor SCM Processor Core Regs SPID … L1 Instruction cache L1 Data cache On-Chip L2 Cache Off-Chip Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

SCM: Program Start-Up ‘enter_aegis’: TE mode ‘set_aegis_mode’ Start protecting the integrity of a program Compute and store the hash of the stub code: H(Prog)  Tampering of a program results in a different hash Stub code verifies the rest of the code and data ‘set_aegis_mode’ Start PTR mode on top of the TE mode enter_aegis code_end Program .text enter_aegis EKey1 = 0xA4523BC2E435D; EKey2 = 0xB034D2C654F32; E1Msg = … Secret=GetSecret(Challenge); Key1=Decrypt(EKey1, Secret); Key2=Decrypt(EKey2, Secret); CheckMAC(Key1, Key2, MAC); Msg = Decrypt(E1Msg, Key1); E2Msg = Encrypt(Msg, Key2); Output(E2Msg); H(Prog) Protected Table SHA-1 Stub Segment G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

SCM: On-Chip Protection Registers on interrupts SCM saves Regs on interrupts, and restore on resume On-chip caches Need to protect against software attacks Use SPID tags and virtual memory address Allow accesses from the cache only if both SPID and the virtual address match Standard Processor SCM Processor Core Interrupt Regs SPID … H(Prog) Regs Resume L1 Instruction cache SPID Tags L1 Data cache On-Chip L2 Cache Off-Chip Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Applications

Digital Rights Management Protects digital contents from illegal copying Trusted software (player) on untrusted host Content provider only gives contents to the trusted player Requires the PTR environment Verify - H(Player) - nonce - signature Run Player - enter_aegis - enter PTR Player Authenticated & Encrypted Channel (SSL) Content Signed nonce Random nonce Processor’s Public Key Processor’s Private Key Content Provider Secure Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Run-time Performance Log Hash scheme has very low run-time overhead Often less than 5% (15% worst case), compared to 25% on average and 50% in the worst case for cached hash trees 1MB L2 Cache with 64B blocks G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Performance Implication: TE processing Major performance degradation is from off-chip integrity checking Start-up and context switches are infrequent no performance overhead for on-chip tagging Worst case 50% degradation Most cases < 25% degradation L2 Caches with 64B blocks G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Effects of Encryption on Performance Simulations based on the SimpleScalar tool set 9 SPEC CPU2000 benchmarks 32-bit time stamps (random vectors), 64-B L2 blocks Memory latency: 80/5, decryption latency: 40 Performance degradation by encryption Direct (CBC) One-Time-Pad Worst Case 25% 18% Average BW limited 19% 15% Other 9% 4% All 13% 8% G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003