Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas.

Efficient Memory Integrity Verification and Encryption for Secure Processors
G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas Massachusetts Institute of Technology

New Security Challenges
Current computer systems have a large Trusted Computing Base (TCB) Trusted hardware: processor, memory, etc. Trusted operating systems, device drivers Future computers should have a much smaller TCB Untrusted OS Physical attacks  Without additional protection, components cannot be trusted Why smaller TCB? Easier to verify and trust Enables new applications ahoyhoy G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Applications Emerging applications require TCBs that are secure even from an owner Distributed computation on Internet/Grid computing distributed.net, and more Interact with a random computer on the net  how can we trust the result? Software licensing The owner of a system is an attacker Mobile agents Software agents on Internet perform a task on behalf of you Perform sensitive transactions on a remote (untrusted) host G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Single-Chip AEGIS Secure Processors
Only trust a single chip: tamper-resistant Off-chip memory: verify the integrity and encrypt Untrusted OS: identify a core part or protect against OS attacks Cheap, Flexible, High Performance Identify or Protect against Check Integrity, Encrypt I/O Untrusted OS Trusted Environment Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Secure Execution Environments
Tamper-Evident (TE) environment Guarantees a valid execution and the identity of a program; no privacy Any software or physical tampering to alter the program behavior should be detected  Integrity verification Private Tamper-Resistant (PTR) environment TE environment + privacy Assume programs do not leak information via memory access patterns  Encryption + Integrity verification Relate to applications G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Other Trusted Computing Platforms
IBM 4758 cryptographic coprocessor Entire system (processor, memory, and trusted software) in a tamper-proof package Expensive, requires continuous power XOM (eXecution Only Memory): David Lie et al Stated goal: Protect integrity and privacy of code and data Memory integrity checking does not prevent replay attacks Always encrypt off-chip memory Palladium/NGSCB: Microsoft Stated goal: Protect from software attacks Memory integrity and privacy are assumed (only software attacks) 7 mins? G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Memory Encryption

Memory Encryption Encrypt on an L2 cache block granularity
write L2 Cache ENCRYPT DECRYPT read Processor Untrusted RAM Encrypt on an L2 cache block granularity Use symmetric key algorithms (AES, 16 Byte chunks) Should be randomized to prevent comparing two blocks Adds decryption latency to each memory access G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Direct Encryption (CBC mode): encrypt
L2 Block B[4] AESK EB[1] EB[2] EB[3] EB[4] RV B[3] AESK B[2] AESK B[1] AESK Random # RV Processor Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Direct Encryption (CBC mode): decrypt
Processor AESK-1 B[4] L2 Miss!! Memory Request Read AESK-1 B[3] AESK-1 B[2] AESK-1 B[1] EB[4] EB[3] EB[2] EB[1] RV Memory Off-chip access latency = latency for the last chunk of an L2 block + AES + XOR  Decryption directly impacts off-chip latency G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

One-Time-Pad Encryption (OTP): encrypt
B[4] (Addr,TS,1) (Addr,TS,2) (Addr,TS,3) (Addr,TS,4) Time Stamp (TS) AESK-1 One-Time-Pad (OTP) B[3] B[2] EB[1] EB[2] EB[3] EB[4] TS To Memory B[1] Counter Processor Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

One-Time-Pad Encryption (OTP): decrypt
Processor (Addr,TS,1) (Addr,TS,2) (Addr,TS,3) (Addr,TS,4) AESK-1 B[4] L2 Miss!! Memory Request Read B[3] B[2] B[1] TS EB[4] EB[3] EB[2] EB[1] Memory Off-chip access latency = MAX( latency for the time stamp + AES, latency for an L2 block ) + XOR  Overlap the decryption with memory accesses G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Effects of Encryption on Performance
Simulations based on the SimpleScalar tool set 9 SPEC CPU2000 benchmarks 256-KB, 1-MB, 4-MB L2 caches with 64-B blocks 32-bit time stamps and random vectors  No caching! Memory latency: 80/5, decryption latency: 40 Performance degradation by encryption Direct (CBC) One-Time-Pad Worst Case 25% 18% Average 13% 8% G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Security and Optimizations
The security of the OTP is at least as good as the conventional CBC scheme OTP is essentially a counter-mode (CTR) encryption Further optimizations are possible For static data such as instructions, time stamps are not required  completely overlap the AES computations with memory accesses Cache time stamps on-chip, or speculate the value Will be used for instruction encryption of Philips media processors +8 (15) G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Integrity Verification

Difficulty of Integrity Verification
Untrusted RAM Processor Program write E(124), MAC(0x45, 124) Address 0x45 VERIFY ENCRYPT DECRYPT read IGNORE E(120), MAC(0x45, 120) Trusted State Cannot simply MAC on writes and check the MAC on reads  Replay attacks Hash trees for integrity verification G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Hash Trees Logarithmic overhead for every cache miss Low performance
Processor h1=h(V1.V2) h2=h(V3.V4) root = h(h1.h2) VERIFY Logarithmic overhead for every cache miss Low performance ( 10x slowdown)  Cached hash trees VERIFY L2 block READ Data Values V1 V2 V3 V4 MISS Untrusted Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Cached Hash Trees (HPCA’03)
Processor h1=h(V1.V2) h2=h(V3.V4) root = h(h1.h2) VERIFY Cache hashes in L2 L2 is trusted Stop checking earlier Less overhead ( 22% average, 51% worst case) Still expensive In L2 VERIFY VERIFY DONE!!! In L2 V1 V2 V3 V4 MISS MISS Untrusted Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Can we do better? Some applications only require to verify memory accesses after a long execution Distributed computation No need to check after each memory access Can we just check a sequence of accesses? enter_aegis Execute Get results Verify results - H(Prog) - signature Program, Data RESULT RESULT Processor’s Private Key Processor’s Public Key Job Dispatcher Secure Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Integrity Verification: Idea
At run-time, maintain a log of reads and writes Reads: make a ‘read’ note with (address, value) in the log Writes: make a ‘write’ note with (address, value) in the log check: go thru log, check each read has the most recent value written to the address Problem!!: Log grows  use cryptographic hashes Write ( 0x40, 1) Write (0x50, 2) Read (0x50, 2) Read (0x40, 1) Write ( 0x40, 1) Write (0x50, 2) Read (0x50, 2) Read (0x40, 1) Write ( 0x40, 1) Write (0x50, 2) Read (0x50, 2) Read (0x40, 1) Write 1 at 0x40 Write 2 at 0x50 Read 2 from 0x50 Read 1 from 0x40 Checker Log Untrusted Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Algorithms: Run-Time
Use set hashes as compressed logs Set hash: maps a set to a fixed length string ReadHash: a set of read entries (addr, val, time) in the log WriteHash: a set of write entries (addr, val, time) in the log Use Timer (time stamp) to keep the ordering of entries Only one additional time stamp access for each memory access WriteHash (0x40, 0, 0) (0x50, 0, 0) (0x40, 10, 1) ReadHash (0x40, 0, 0) Cache eviction Write 10 at 0x40 Initialize (all zero) Cache Miss!! Read 0 from 0x40 Read 2 from 0x50 Timer: 0 Timer: 1 Untrusted Memory Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Algorithms: Integrity Check
Read all the addresses that are not in a cache Compare ReadHash and WriteHash (same set?) WriteHash (0x40, 0, 0) (0x50, 0, 0) (0x40, 10, 1) ReadHash (0x40, 0, 0) (0x40, 10, 1) (0x50, 0, 0) =? Read all Read 2 from 0x50 Timer: 1 Timer: 0 Untrusted Memory Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Checking Overhead of Log Hash Scheme
Integrity check requires reading the entire memory space being used Cost depends on the size and the length of an application For long programs, the checking overhead is negligible Amortized over a long execution time Check overhead is negligible for programs w/ more than a billion accesses Better than Hash Trees for programs w/ more than 10 million accesses SWIM, 1MB L2, Uses 192MB G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Performance Comparisons
Overhead for TE environments Integrity verification Overhead for PTR environments Integrity verification + encryption CHTree LHash Worst Case 52% 15% Average 22% 4% +10 (25) CHTree + CBC LHash + OTP Worst Case 59% 23% Average 31% 10% G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Summary Untrusted owners are becoming more prevalent
Untrusted OS, physical attacks  requires a small TCB Single-chip secure processors require off-chip protection mechanisms: Integrity verification and Encryption OTP encryption scheme reduces the overhead of encryption in all cases Allows decryption to be overlapped with memory accesses Cache or speculate time stamps to further hide decryption latency Log Hash scheme significantly reduces the overhead of integrity verification for certified execution when programs are long enough G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

More Information at www.csg.lcs.mit.edu
Questions? More Information at G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Conventional Tamper-Proof Packages
Processing system in a tamper-proof package (IBM 4758) Expensive: many detecting sensors Needs to be continuously powered: battery-backed RAM $2,690 in 2001 Memory 99MHz 486 Source: IBM website G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

PTR Performance

Incremental Multiset Hash (Asiacrypt`03)
Multiset: unordered group of elements where an element can appear more than once Intuitively, function h is an incremental multiset hash function if it satisfies: Compression: Arbitrary size multiset  fixed length string Incrementality: Easy to add/remove an element to a multiset Multiset-collision resistance: difficult to find multisets that produce the same output hash Use a multiset hash as a log Compression  can fit in a fixed on-chip storage Incrementality  easy to add a new read/write to the log Multiset-collision resistance  attackers cannot modify memory in a way that the log does not change G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Log Hash Algorithms: Integrity Check
Processor maintains small trusted states; ReadHash: records a multiset of read entries in the log WriteHash: records a multiset of write entries in the log Timer: used as time stamps  ordering of entries in the log At run-time, the log (ReadHash, WriteHash) is updated with minimal overhead (one additional access for time stamps) On a cache miss for addr: 1. Read (value, time stamp) from addr 2. ReadHash += hash(addr, value, time stamp) 3. Timer = max( Timer, time stamp + 1 ) On a write/eviction for (addr, value): 1. Write (value, Timer) to addr 2. WriteHash += hash(addr, value, Timer) Check: after reading all addresses, ReadHash = WriteHash? G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Cache Block (B) IV = B[1] B[2] B[3] B[4] Encryption AES AES AES AES EB
(0...0,Addr,RV) B[1] B[2] B[3] B[4] Encryption Key AES AES AES AES EB RV EB[1] EB[2] EB[3] EB[4]  In Memory Key AES-1 AES-1 AES-1 AES-1 IV Decryption B B[1] B[2] B[3] B[4] G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

One-Time-Pad Encryption (OTP)
( Fixed Vector, Address, Time Stamp, 1 ) ( Fixed Vector, Address, Time Stamp, 2) ( Fixed Vector, Address, Time Stamp, 3 ) ( Fixed Vector, Address, Time Stamp, 4 ) Pad Generation Key AES-1 AES-1 AES-1 AES-1 Encryption Pad (OTP) B B[1] B[2] B[3] B[4] Encryption EB TS EB[1] EB[2] EB[3] EB[4]  In Memory B Decryption B[1] B[2] B[3] B[4] G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

AEGIS: High-Level Architecture

Secure Context Manager (SCM)
A specialized module in the processor Assign a secure process ID (SPID) for each secure process Implements new instructions enter_aegis set_aegis_mode random sign_msg Maintains a secure table Even operating system cannot modify Standard Processor SCM Processor Core Regs SPID … L1 Instruction cache L1 Data cache On-Chip L2 Cache Off-Chip Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

SCM: Program Start-Up ‘enter_aegis’: TE mode ‘set_aegis_mode’
Start protecting the integrity of a program Compute and store the hash of the stub code: H(Prog)  Tampering of a program results in a different hash Stub code verifies the rest of the code and data ‘set_aegis_mode’ Start PTR mode on top of the TE mode enter_aegis code_end Program .text enter_aegis EKey1 = 0xA4523BC2E435D; EKey2 = 0xB034D2C654F32; E1Msg = … Secret=GetSecret(Challenge); Key1=Decrypt(EKey1, Secret); Key2=Decrypt(EKey2, Secret); CheckMAC(Key1, Key2, MAC); Msg = Decrypt(E1Msg, Key1); E2Msg = Encrypt(Msg, Key2); Output(E2Msg); H(Prog) Protected Table SHA-1 Stub Segment G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

SCM: On-Chip Protection
Registers on interrupts SCM saves Regs on interrupts, and restore on resume On-chip caches Need to protect against software attacks Use SPID tags and virtual memory address Allow accesses from the cache only if both SPID and the virtual address match Standard Processor SCM Processor Core Interrupt Regs SPID … H(Prog) Regs Resume L1 Instruction cache SPID Tags L1 Data cache On-Chip L2 Cache Off-Chip Memory G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Applications

Digital Rights Management
Protects digital contents from illegal copying Trusted software (player) on untrusted host Content provider only gives contents to the trusted player Requires the PTR environment Verify - H(Player) - nonce - signature Run Player - enter_aegis - enter PTR Player Authenticated & Encrypted Channel (SSL) Content Signed nonce Random nonce Processor’s Public Key Processor’s Private Key Content Provider Secure Processor G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Run-time Performance Log Hash scheme has very low run-time overhead
Often less than 5% (15% worst case), compared to 25% on average and 50% in the worst case for cached hash trees 1MB L2 Cache with 64B blocks G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Performance Implication: TE processing
Major performance degradation is from off-chip integrity checking Start-up and context switches are infrequent no performance overhead for on-chip tagging Worst case 50% degradation Most cases < 25% degradation L2 Caches with 64B blocks G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Effects of Encryption on Performance
Simulations based on the SimpleScalar tool set 9 SPEC CPU2000 benchmarks 32-bit time stamps (random vectors), 64-B L2 blocks Memory latency: 80/5, decryption latency: 40 Performance degradation by encryption Direct (CBC) One-Time-Pad Worst Case 25% 18% Average BW limited 19% 15% Other 9% 4% All 13% 8% G. Edward Suh — MIT Computer Science and Artificial Intelligence Laboratory MICRO36 — December 3-5, 2003

Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas.

Similar presentations

Presentation on theme: "Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas.

Similar presentations

Presentation on theme: "Efficient Memory Integrity Verification and Encryption for Secure Processors G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas."— Presentation transcript:

Similar presentations

About project

Feedback