1
Architectures for Secure Processing Matt DeVuyst
2
Research Exam - Matt DeVuyst 2 Introduction [Diagram: the CPU, containing the pipeline and functional units, L1 instruction and data caches, L2 and L3 caches, and an encryption/decryption unit (EDU) holding keys, is enclosed by the line of trust; the memory bus and main memory lie outside it, at the points of attack.]
3
Introduction What kind of security? Protection of what? For whom? From whom/what? This work focuses on: Protection of execution (process data and control flow) Protection for users, copyright holders, software companies Protection from all other processes (including OS) and physical attack This work focuses on general purpose security mechanisms for general purpose computers.
4
Introduction This research takes an architecture-centric approach. Cryptographic algorithms may be utilized, but they will not be proven. Focus is given to hardware support; software and the OS reap the benefits.
5
Goals Execution Privacy Process control flow and data exposed only to the CPU Execution Integrity Process control flow and data cannot be tampered with without detection
6
Outline Execution Privacy Execution Integrity Proposed Architectures Conclusions and Open Questions
7
Outline Execution Privacy Naïve Encryption One Time Pad (OTP) Encryption Improved OTP Encryption Execution Integrity Proposed Architectures Conclusions and Open Questions
8
Naïve Encryption [Diagram: plaintext data passes between the CPU and the encryption/decryption unit; ciphertext data travels over the memory bus to and from memory.]
9
A Closer Look At the Encryption/Decryption Unit AES in Cipher Block Chaining (CBC) Mode
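The CBC mode named above can be sketched in a few lines. This is an illustrative sketch only: a toy XOR-based block cipher stands in for AES so the example is self-contained, and the function names are invented for illustration.

```python
def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in for one-block AES encryption (NOT secure; it is its own inverse).
    return bytes(b ^ k for b, k in zip(block, key))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for p in blocks:
        # Chain: XOR the plaintext block with the previous ciphertext
        # block (or the IV for the first block) before encrypting it.
        c = toy_block_cipher(key, xor(p, prev))
        out.append(c)
        prev = c
    return out

def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for c in blocks:
        # Invert: decrypt the block, then XOR with the previous ciphertext.
        out.append(xor(toy_block_cipher(key, c), prev))
        prev = c
    return out
```

Because each plaintext block is mixed with the previous ciphertext block, repeated plaintext blocks within one message no longer encrypt to repeated ciphertext blocks.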
10
Issues With Naïve Encryption On the critical path → Performance suffers Not secure against all attacks
11
Why Naïve Encryption Is Not Secure [Diagram: plaintext and ciphertext write traces over time when only the data is encrypted; the pattern in the ciphertext is identical to the plaintext.]
12
Why Naïve Encryption Is Not Secure [Diagram: even when the address is encrypted along with the data, repeated writes of the same data to the same address still produce identical ciphertext; the pattern is still identical.]
13
Why Naïve Encryption Has Poor Performance Stores are effectively immune to encryption latency, thanks to the store buffer. Loads that miss in the cache pay twice: the time to bring the data in from memory, plus the time to decrypt that data. [Timeline: a load instruction incurs the full memory latency followed, in series, by the full decryption latency.]
14
Outline Execution Privacy Naïve Encryption One Time Pad (OTP) Encryption* Improved OTP Encryption Execution Integrity Proposed Architectures Conclusions and Open Questions * Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT and Yang et al. “Fast Secure Processor for Inhibiting Software Piracy and Tampering” – UC Riverside
15
How OTP Encryption/Decryption Works [Diagram: a pad is generated by encrypting the address and a sequence number under the on-chip key; encryption and decryption each XOR the data with this pad.]
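The scheme can be sketched as follows. This is a hedged approximation: SHA-256 stands in for the AES pad generator used by the actual proposals, and `KEY` is a hypothetical on-chip secret.

```python
import hashlib

KEY = b"hypothetical on-chip device key"

def pad_for(address: int, seq: int) -> bytes:
    # The pad depends only on (address, sequence number) and the secret
    # key; SHA-256 stands in for AES so the sketch is self-contained.
    msg = KEY + address.to_bytes(8, "little") + seq.to_bytes(8, "little")
    return hashlib.sha256(msg).digest()   # one 32-byte "cache block" pad

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Encryption and decryption are the same operation: XOR with the pad.
def encrypt_block(address: int, seq: int, plaintext: bytes) -> bytes:
    return xor(plaintext, pad_for(address, seq))

def decrypt_block(address: int, seq: int, ciphertext: bytes) -> bytes:
    # Crucially, pad_for() never looks at the ciphertext, so pad
    # generation can run while the memory access is in flight.
    return xor(ciphertext, pad_for(address, seq))
```

Because the sequence number changes on every write, two writes of the same data to the same address produce different ciphertexts, which is exactly what defeats the pattern leak of naïve encryption.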
16
Why OTP Encryption is Secure [Diagram: plaintext and ciphertext write traces over time when the pad is generated from the address and a sequence number; repeated writes to the same address produce different ciphertext, and no pattern is expressed.]
17
How OTP Encryption Solves the Performance Problem Decryption is done in parallel with the load, taking it off the critical path. The key to how it works: pad generation cannot depend on the ciphertext. [Timeline: the decryption latency overlaps the memory latency of the load instruction; only a final XOR remains once the data arrives.]
18
The Achilles’ Heel of OTP Encryption The sequence number must be available long before the memory access completes. A sequence number is associated with every cache-block-sized chunk of memory → all sequence numbers cannot be kept on chip. One solution: a sequence number cache. [Timeline: pad generation can begin only once the sequence number is available; if it arrives late, decryption latency is no longer hidden behind memory latency.]
19
Outline Execution Privacy Naïve Encryption One Time Pad (OTP) Encryption Improved OTP Encryption* Execution Integrity Proposed Architectures Conclusions and Open Questions * Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech
20
Solutions To the OTP Problem Prediction and Precomputation Predict sequence number Precompute pad When memory access completes, compare real sequence number with predicted one If they match, use precomputed pad If they don’t match, compute real pad
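The steps above can be sketched as a minimal simulation. SHA-256 again stands in for the AES pad generator, and `predicted_load` and its parameters are hypothetical names for illustration, not the papers' interfaces.

```python
import hashlib

KEY = b"hypothetical on-chip device key"

def pad_for(address: int, seq: int) -> bytes:
    # Stand-in OTP pad generator (SHA-256 in place of AES).
    msg = KEY + address.to_bytes(8, "little") + seq.to_bytes(8, "little")
    return hashlib.sha256(msg).digest()

def predicted_load(address: int, root_seq: int, fetch, depth: int = 4) -> bytes:
    # While the memory access is in flight, precompute pads for the
    # predicted sequence numbers root, root+1, ..., root+depth-1.
    pads = {root_seq + i: pad_for(address, root_seq + i) for i in range(depth)}
    ciphertext, real_seq = fetch()   # memory access completes
    pad = pads.get(real_seq)         # compare real seq # with predictions
    if pad is None:                  # mispredict: compute the real pad now
        pad = pad_for(address, real_seq)
    return bytes(c ^ p for c, p in zip(ciphertext, pad))
```

On a prediction hit the pad is already sitting in the table when the data arrives, leaving only the XOR; on a miss the load pays the full pad-generation latency, which is why prediction accuracy matters.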
21
Prediction and Precomputation [Diagram: the TLB's page table entry holds a root seq # for the page; each cache block in the page of memory carries its own real seq #.]
22
Prediction and Precomputation [Diagram: initially, every cache block's sequence number is set to the page's root sequence number (343923 in the example).]
23
Prediction and Precomputation [Diagram: writes increment the per-block sequence numbers, so blocks drift above the root of 343923 (e.g. 343924, 343925, 343933, 343935).]
24
Prediction and Precomputation [Diagram: on a load, prediction starts with the root sequence number; while the memory latency elapses, pads are generated for seq # 343923, 343924, and 343925.]
25
Better Prediction and Precomputation Problem: Frequently updated data will have a sequence number beyond the prediction depth One solution: Reset the root sequence number Use a prediction history for each page This is called “adaptive prediction” [Diagram: the TLB's page table entry holds the root seq # plus a prediction history.]
26
Better Prediction and Precomputation Problem: Frequently updated data will have a sequence number beyond the prediction depth Another solution: Record the past difference (diff) between the root sequence number and the real sequence number On a subsequent load, make predictions around root sequence number + diff This is called “context-based” prediction [Diagram: a diff register accompanies the root seq # in the TLB's page table entry.]
27
Prediction and Precomputation Accuracy “Adaptive prediction” is reported to be about 80% accurate* “Context-based prediction” is reported to be close to 100% accurate* (though this has not yet been verified by other researchers). Cost Larger TLB Slightly larger memory footprint and bandwidth requirement Conclusion Using OTP with optimizations, decryption latency is almost completely hidden. * Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech
28
Outline Execution Privacy Execution Integrity Basic Execution Integrity Cached Hash Trees Log Hashing Proposed Architectures Conclusions and Open Questions
29
Execution Integrity – Basic Idea On a write… Keyed hash is taken over data and address Data and hash are stored in memory On a read… Data and hash are returned from memory Hash is computed Compare computed hash and returned hash [Diagram: the CPU writes Data and Hash(Key, Data, Address) to memory, and both are returned on a read.]
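The read/write protocol above can be sketched with a standard HMAC as the keyed hash. A Python dict plays the role of untrusted memory; all names here are illustrative, not from the cited designs.

```python
import hashlib
import hmac

KEY = b"hypothetical on-chip key"
memory = {}   # simulated untrusted memory: address -> (data, hash)

def keyed_hash(data: bytes, address: int) -> bytes:
    # Keyed hash over (data, address); including the address binds the
    # data to its location, blocking cross-address substitution.
    return hmac.new(KEY, data + address.to_bytes(8, "little"),
                    hashlib.sha256).digest()

def secure_store(address: int, data: bytes) -> None:
    memory[address] = (data, keyed_hash(data, address))

def secure_load(address: int) -> bytes:
    data, stored = memory[address]
    # Recompute the hash and compare against what memory returned.
    if not hmac.compare_digest(stored, keyed_hash(data, address)):
        raise ValueError("integrity violation")
    return data
```

Note that this sketch, like the basic scheme it illustrates, does not stop replay: re-installing an old, once-valid (data, hash) pair at the same address would still verify.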
30
Security Analysis of Basic Execution Integrity Arbitrary data cannot be introduced because: The hash is keyed and An attacker does not know the key Data stored at one address cannot be substituted for data stored at another address because: Hashing the data along with the address binds the two But a replay attack is possible because: An attacker may replay stale data previously stored at the given address
31
Outline Execution Privacy Execution Integrity Basic Execution Integrity Cached Hash Trees* Log Hashing Proposed Architectures Conclusions and Open Questions * Blum, et al. “Checking the Correctness of Memories” – UC Berkeley Gassend, et al. “Caches and Hash Trees for Efficient Memory Integrity Verification” – MIT Merkle. “Protocols for Public Key Cryptosystems”
32
Cached Hash Trees Fundamental problem with basic hashing: hashes verify data integrity, but nothing verifies the integrity of the hashes. A solution: cached hash trees. Keyed hashes are taken over data; keyed hashes are taken over those hashes, etc. Problem: memory requirement of hashes. Solution: hashes are stored in memory and cached on-chip along with data.
33
Cached Hash Trees How it works A tree is built Leaf nodes contain data Intermediate nodes are hashes The root hash is kept in a special register on-chip Hashes are only updated when necessary [Diagram: data blocks at the leaves, with hash nodes above them up to the root.]
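The tree structure above can be sketched as a plain Merkle tree. For brevity this sketch uses an unkeyed SHA-256 and assumes a power-of-two number of blocks; the root (`levels[-1][0]`) is what a secure processor would keep in an on-chip register.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def build_tree(blocks: list) -> list:
    # Level 0 holds hashes of the data blocks; each higher level hashes
    # pairs of child hashes, up to a single root.
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def verify(blocks: list, i: int, levels: list) -> bool:
    # Recompute the path from block i up to the root and compare with
    # the trusted root hash.
    node = h(blocks[i])
    for level in levels[:-1]:
        sib = i ^ 1   # index of the sibling hash at this level
        pair = (node, level[sib]) if i % 2 == 0 else (level[sib], node)
        node = h(*pair)
        i //= 2
    return node == levels[-1][0]
```

Verifying one block touches only one node per tree level, which is why the common case (those nodes already cached) is cheap and the uncommon case (loading hash nodes from memory) is what hurts.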
34
Cached Hash Tree Consistency Invariant: If a node is in memory → then its parent hash is consistent with it (whether the hash is in the cache or in memory)
35
Cached Hash Tree Consistency [Diagram: if data in the cache is written, the hashes are not updated; the parent and grandparent hashes become outdated with respect to the data.]
36
Cached Hash Tree Consistency [Diagram: if dirty data is evicted to memory, the parent hash in the cache is updated.]
37
Cached Hash Tree Consistency [Diagram: if a hash block is evicted to memory, the parent hash in the cache is updated.]
38
Cached Hash Tree Consistency [Diagram: if data is loaded and its parent hash is not in the cache, (1) the parent is loaded and verified against the grandparent, then (2) the data is verified against its parent.]
39
Performance Analysis of Cached Hash Trees Common case: Hash nodes are in cache Data evictions only require an update to a cached node Data loads only require one hash check with cached node Uncommon case: Hash nodes are not in the cache Data evictions require hash node loads Data loads require hash node loads Passing hash nodes across the memory bus cuts into the bandwidth of data Hash nodes occupy space in the cache
40
Outline Execution Privacy Execution Integrity Basic Execution Integrity Cached Hash Trees Log Hashing* Proposed Architectures Conclusions and Open Questions * Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT
41
Log Hashing Key insight Verification is not necessary at every load Verification is necessary before application results are produced Implication Relax constraint on constant, vigilant verification
42
Log Hashing – Incremental Multiset Hashes* Incremental Keyed hash is not computed over all data, just additional data Multiset Duplicate items are allowed Multiplicity of items is significant Order of items is not [Diagram: a hash engine maps two multisets containing the same items in different orders to equal hash values.] * Clarke, et al. “Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking” – MIT
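A minimal sketch of an add-based incremental multiset hash, in the spirit of the constructions in the Clarke et al. paper (the exact construction and parameters here are illustrative, not the paper's): element hashes are summed modulo a large constant, so insertion order cannot affect the result but multiplicity does.

```python
import hashlib

MOD = 2 ** 256
KEY = b"hypothetical hash key"

class MultisetHash:
    # Additive incremental multiset hash: elements are added one at a
    # time; addition is commutative, so order does not matter, but each
    # duplicate shifts the sum, so multiplicity does.
    def __init__(self):
        self.value = 0

    def add(self, item: bytes) -> None:
        digest = hashlib.sha256(KEY + item).digest()
        self.value = (self.value + int.from_bytes(digest, "little")) % MOD
```

The incremental property is what makes this usable in hardware: each eviction or fetch costs one element hash and one modular addition, never a rehash of everything seen so far.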
43
Log Hashing 2 incremental multiset hashes WriteHash Hashes everything evicted from cache (written to memory) ReadHash Hashes everything fetched from memory Counters are associated with memory operations and keyed hashes taken over (data, counter, address)
44
Log Hashing 3 phases of operation Initialization All program data written out to memory (hashed into WriteHash) Run-time Hash of every eviction is added to WriteHash Hash of every fetch is added to ReadHash Verification All data not in cache is brought in (hashing into ReadHash) ReadHash compared to WriteHash. If equal, integrity maintained. Else, integrity violated.
45
Log Hashing - Initialization [Diagram: all program data is written from the cache out to memory, with each block hashed into WriteHash.]
46
Log Hashing – Run-time [Diagram: a block evicted from the cache to memory is hashed into WriteHash.]
47
Log Hashing – Run-time [Diagram: a block fetched from memory into the cache is hashed into ReadHash.]
48
Log Hashing – Verification [Diagram: all data not in the cache is fetched and hashed into ReadHash, then ReadHash is compared against WriteHash.]
49
Log Hashing – Performance Analysis Initialization and verification are very costly. We assume initialization and verification are rare occurrences. Run-time hashing has no overhead. Loading/storing sequence numbers in memory incurs a small performance overhead and a small memory overhead.
50
Log Hashing – Security Analysis If data is tampered with in memory: ReadHash will be different from WriteHash. If data was returned from memory more times than it was written (as in a replay attack): The multiplicity of hashed items will not match → hashes will not match. If data is returned from memory out of order: The hashes won’t match because different counter values would have been hashed in with the data.
51
Outline Execution Privacy Execution Integrity Proposed Architectures XOM SP AEGIS SENSS Conclusions and Open Questions
52
Proposed Architectures XOM* First of its kind Uses naïve privacy and integrity mechanisms Slow and vulnerable to attack Keys for encryption and hashing burned on chip * Lie, et al. “Architectural Support for Copy and Tamper Resistant Software” – Stanford
53
Proposed Architectures Secret-Protected* Based on XOM Uses naïve privacy and integrity mechanisms Decouples secret from device Key stored on chip only during user session User keys are separate from device secret (hardware key) and are transferable * Lee, et al. “Architecture for Protecting Critical Secrets in Microprocessors” – Princeton
54
Proposed Architectures AEGIS* Uses OTP encryption for privacy without performance optimizations like prediction and precomputation Uses cached hash trees for integrity Hides device keys using Physical Random Functions (PUFs) The circuit timing characteristics of a particular chip are unique and practically impossible to duplicate; PUFs exploit this to create device secrets * Suh, et al. “Design and Implementation of the AEGIS Single-Chip Secure Processor Using Physical Random Functions” – MIT
55
Proposed Architectures SENSS* Uses simple OTP encryption scheme like AEGIS Uses cached hash tree scheme like AEGIS Adds support for multiprocessor systems Each device has its own key Combination Cipher Block Chaining and One Time Pad mode encryption is used for cache-to-cache transfers * Zhang, et al. “SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors” - UTD
56
Outline Execution Privacy Execution Integrity Proposed Architectures Conclusions and Open Questions
57
Conclusions – OTP Execution privacy is solved by OTP encryption (with optimizations) Secure against all system-level attacks and physical attacks (outside the processor) Almost no performance cost
58
Conclusions – Cached Hash Trees Cached hash trees are secure against all known attacks But they have potentially poor performance No research has been done to stress test them Performance is bad when the hash tree is not in cache → a large working set or pathological access pattern may result in poor performance
59
Conclusions – Log Hashing Log hashing is secure as long as verification is done before results are used How do you ensure that results are not consumed by users or other applications (e.g. disk writes, network writes, shared memory, screen refresh, OS interrupts)? Log hashing has good performance if verification is infrequent But what if it’s not? How many applications require frequent verification?
60
Conclusions – Keys Execution privacy and integrity require keys Keys must be protected, even if the OS is compromised or the device is under physical attack How should keys be protected? Are Physical Random Functions really resistant to physical attack? How should device public keys be used? Should the manufacturer publish them? How should revocation work? What happens if ownership of the device is transferred?
61
Architectures for Secure Processing Matt DeVuyst
62
Cached Hash Tree Consistency [Diagram: if dirty data is evicted and its parent hash is not in the cache, (1) the parent is loaded and verified against the grandparent, then (2) the parent is updated.]