1
Secure In-Cache Execution
20th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2017) Secure In-Cache Execution Yue Chen, Mustakimur Khandaker, Zhi Wang Florida State University The presenter’s view has enough information (narratives) for you to understand the work. Feel free to use and modify it at your research seminars. -Yue
2
Cold Boot Attack Dump memory by freezing and transplanting
Steal sensitive information First, let me introduce the cold boot attack. It is a powerful physical attack that can extract sensitive data from a computer's physical memory. When the temperature is low enough, RAM content can survive much longer after power loss. An attacker can dump the memory content by transplanting the memory to another computer. He can boot that computer with his own operating system and read the secret content. Even if the victim's disk is encrypted, the key is still in memory in plaintext, so the attacker can extract the key and decrypt the hard disk. For mobile devices, the attack works even if their memory units are integrated/soldered onto the motherboard.
3
Cold Boot Attack Sensitive memory content in plaintext
“Secret Message” Memory “Secret Message” One cause of the attack is that the sensitive memory content is stored as plaintext. If an attacker transplants the memory to another machine, he can extract the sensitive information directly.
4
Our Solution Sensitive memory content cannot be read with encryption
? ? ? Memory “XXXXXXXXXXXX” However, if we encrypt the memory content, even if the memory is transplanted to another machine, the attacker still cannot read it. This is the high-level idea of our solution.
5
EncExec: Design Goals
Data secrecy: plaintext view only in cache; key protected as well
Performance isolation: performance impact isolated from other processes
Application transparency: user program runs unmodified under EncExec
We propose EncExec, which has three design goals. First, data secrecy should be protected: the plaintext view of the secret data should exist only in cache, and the memory should be encrypted at all times. The key should be protected as well. Second, the protected process should not have a big performance impact on other running processes. Last but not least, applications can run unmodified under our system.
6
Threat Model Able to perform cold boot attacks
No malware installed (e.g., kernel rootkit) Typical use scenario: Laptops lost in public places, even protected by encrypted hard disks and screen locks Next, let me introduce the threat model of EncExec. First, the attacker can perform cold boot attacks and other similar physical attacks, and no malware is installed on the victim machine (e.g., a kernel rootkit). Here is a typical use scenario: your laptop is lost in a public place or even in your car, protected by an encrypted hard disk and a screen lock. The attacker can still get the sensitive information, or decrypt the hard disk, with a cold boot attack.
7
Design Overview Data in memory always encrypted; decrypted into the L3 cache only when accessed Use reserved cache as a window over protected data Use L3 (instead of L1 or L2) cache to minimize performance impact Next, let's dig into the design overview of our system. The basic idea is: if the data is in memory, it is encrypted; if it is in cache, it is decrypted. So an attacker performing a cold boot attack can only get encrypted memory without the key. We reserve a piece of the cache to store protected data, and we use the L3 cache instead of L1 or L2 to minimize the performance impact, because the L3 cache is large, unified, and inclusive in most designs: it includes all the data cached by the L1 and L2 caches. It is also physically indexed and physically tagged. To enable split views of the data, our system uses the reserved cache as a window over the protected data. The CPU accesses the cache before memory, so it directly accesses the decrypted data in the cache rather than fetching it from memory.
8
Design Overview Decrypted data will never be evicted to memory (no cache conflict) Extend kernel's virtual memory management to strictly control access Only data in the window are mapped in the address space If more data than window size -> page replacement Data outside the window remain encrypted in memory and are not accessible to the CPU. The kernel's virtual memory management (the VMM) is extended to *strictly* control the process' data access, so NO plaintext data is evicted to memory due to cache conflicts. Specifically, only the data in the window are mapped. If more protected data are used than the window size, EncExec selects a page in the window for replacement, similar to demand paging in the OS. The data are encrypted with hardware-accelerated block ciphers. Both the key and the initialization vector are randomly generated, and the key and sub-keys are also stored in the reserved cache to protect them from cold boot attacks.
9
Design Overview Two modes:
Given a block of secure memory for storing critical data Need to (slightly) modify the program Use reserved cache to protect all the data of the process Use the reserved cache as a moving window EncExec has two modes. In the first mode, it provides a block of secure memory; any data stored in this memory is guaranteed to be protected. The programmer decides when and how to use this memory, so the program needs to be slightly modified. For example, we can modify a cryptography library so that its stack and session data are allocated from the secure memory. In the second mode, all the data of a process is protected. In this mode, EncExec is transparent to the protected process, which means no program change is necessary. Since we use demand paging, like a moving window over the data, the amount of protected data can be much larger than the reserved cache size in both modes. It is similar to how virtual memory can be larger than physical memory.
10
Design: Key Techniques
Spatial cache reservation Reserves a small part of the L3 cache for its use Secure in-cache execution Data encrypted in memory, plaintext view only in cache EncExec has two key techniques. The first is spatial cache reservation, which reserves a small part of the L3 cache for EncExec's use. The second is secure in-cache execution, which ensures that the sensitive data in memory is encrypted and its plaintext view exists only in cache.
11
CPU Cache Intel Core i7 4790 cache architecture
Before going further, let's first review CPU caches. This is a typical Intel i7 CPU architecture. There are three levels of caches. Each core has dedicated L1 and L2 caches, but all four cores share a single large L3 cache. The L1 cache is split into an instruction cache and a data cache, each 32 KB in size. The L2 and L3 caches are unified, so they cache both code and data. The L1 and L2 caches are relatively small, but the L3 is large at 8 MB. Here, inclusive means the L3 cache contains everything cached by the L1 and L2 caches.
12
CPU Cache 2-way set-associative cache, 8 cache lines in 4 sets. Each cache line has 16 bytes. Cache has several design architectures. Here is a two-way set associative cache with a cache line of 16 bytes. This cache is organized into eight cache lines, and each two neighbor lines are grouped into a set. The 16-bit memory address is divided into three fields: the offset field specifies an offset into a cache line. The set field is an index into the cache sets, and one memory line can be cached by either line in the indexed set. Lastly, the tag field uniquely identifies the line of the memory stored in cache. During cache lookup or fill, CPU first locates the set, and then compare tags to find the cache line and the address.
13
Challenges: Spatial Cache Reservation
Fine-grained cache control x86 transparently manages cache assignment and replacement Countermeasures: Rule 1: Protected data are only cached by the reserved cache No other memory is cached by the reserved cache Rule 2: Accessible (decrypted) protected data is less than the reserved cache size Thus, reserved cache content will not be evicted The design of EncExec faces the challenge that x86 *transparently* manages cache assignment and replacement; it lacks fine-grained cache control. We give two rules to overcome this challenge. First, the protected data should be cached only by the reserved cache, and no other memory is cached by it, so neither the kernel itself nor other processes can cause the reserved cache to be evicted. Second, the amount of accessible protected data must be less than the size of the reserved cache, so the protected data themselves cannot cause the reserved cache to be evicted. Consequently, it is guaranteed that NO reserved cache content will be evicted.
14
Design: Spatial Cache Reservation
Use page table to control reserved memory usage Page table can only map page-sized and page-aligned memory Reserve at least a whole page on the L3 cache Reserve a smallest page of the cache (4KB) How much space in total do we need to reserve? Since we leverage the page table to control the use of reserved memory, and a page table can only map page-sized and page-aligned memory, we have to reserve at least a whole page on the L3 cache. Although the processor supports several page sizes, such as 4 KB, 2 MB, or 1 GB, we reserve only a smallest page to minimize the performance overhead. Next, let's ask the question: how much space *in total* do we need to reserve?
15
Design: Spatial Cache Reservation
[Figure: a two-way set-associative cache (Sets 0–3) over memory lines 00–4C] Here we give a simple example to illustrate the idea. This is a typical two-way set-associative cache with four sets.
16
Design: Spatial Cache Reservation
[Figure: two memory lines are marked as needing to be reserved] If we would like to reserve these two lines in memory…
17
Design: Spatial Cache Reservation
[Figure: the corresponding four cache lines are marked as needing to be reserved] …we have to reserve these four lines in cache, because those memory lines can be cached by either line in each set.
18
Design: Spatial Cache Reservation
[Figure: all memory lines that map to the four reserved cache lines are marked] Next, all the memory lines marked in red rectangles can be mapped into these four cache lines. To fully reserve the cache space, all the marked memory must be reserved as well.
19
Example: Spatial Cache Reservation
Intel Core i7 L3 cache 16-way set-associative; physically indexed and physically tagged Cache line size: 64 bytes = 2^6 bytes (offset field: 6 bits) Cache size: 8 MB Set number: 8M/(64*16) = 8192 = 2^13 (set field: 13 bits) If the machine has 16GB (2^34) of physical memory, the tag field has 15 bits (34 − 6 − 13 = 15). Let's look at a concrete example of cache reservation. Due to time constraints, I will skip these calculations.
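The field widths on this slide follow mechanically from the cache geometry. A quick sketch of the arithmetic (plain Python, not from the paper):

```python
# Derive the address-field widths for the Core i7 L3 cache on this slide.
line_size = 64                 # bytes per cache line
cache_size = 8 * 1024 * 1024   # 8 MB L3 cache
ways = 16                      # 16-way set-associative
phys_bits = 34                 # 16 GB of physical memory = 2^34 bytes

offset_bits = (line_size - 1).bit_length()     # 6
num_sets = cache_size // (line_size * ways)    # 8192
set_bits = (num_sets - 1).bit_length()         # 13
tag_bits = phys_bits - set_bits - offset_bits  # 15

print(offset_bits, num_sets, set_bits, tag_bits)  # 6 8192 13 15
```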
20
Example: Spatial Cache Reservation
Reserve one page (4KB) 64 cache lines in one page Page_size/cache_line_size = 4K/64 = 64 Need to reserve 64 cache sets All the cache lines in the same set reserved together (16-way) Reserve 64KB cache in total 64 (set number) * 16 (associativity ways) * 64B (cache line size) = 64KB If we reserve one page (4KB), there are 64 cache lines in it (4K/64 = 64). Because we need to reserve the whole set for each cache line, we reserve 64 cache sets in total. Multiplying the set number, the associativity ways, and the cache line size together, we need to reserve 64 KB of the cache in total.
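The reservation arithmetic above can be checked directly (an illustrative sketch, not from the paper):

```python
# One reserved 4KB page pins 64 cache lines; each line forces its whole
# 16-way set to be reserved, so 64 sets * 16 ways * 64B = 64 KB of cache.
page_size = 4096
line_size = 64
ways = 16

lines_per_page = page_size // line_size   # 64 cache lines per page
reserved_sets = lines_per_page            # one whole set per line
reserved_cache = reserved_sets * ways * line_size

print(reserved_cache // 1024)  # 64  (KB of L3 reserved)
```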
21
Example: Spatial Cache Reservation
Reserve 1/128 of the physical memory 64 (reserved sets) / 8192 (total) = 1/128 Reserve one physical page for every 128 pages If RAM is 16GB, the total reserved memory is: 16GB * 1/128 = 128MB Ensure no cache eviction: Can use 64KB (16 pages) at a time of the 128MB Name these 16 pages plaintext pages Protected data can be larger than 64KB as we use demand paging We reserve one physical page for every 128 pages. To ensure there will not be any cache eviction, we can use only 64KB of the 128MB at a time. We call these 16 pages plaintext pages. If the protected data is larger than 64KB, we use demand paging to deal with this situation.
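The memory fraction on this slide works out the same way (a sketch, not from the paper):

```python
# With 64 of 8192 sets reserved, 1/128 of physical memory maps to the
# reserved cache; for 16 GB of RAM that is 128 MB of reserved pages,
# of which only 16 plaintext pages (64 KB) may be in use at a time.
reserved_sets, total_sets = 64, 8192
ram = 16 * 1024**3                                 # 16 GB

reserved_mem = ram * reserved_sets // total_sets   # bytes reserved
plaintext_pages = (64 * 1024) // 4096              # pages usable at once

print(reserved_mem // 1024**2, plaintext_pages)  # 128 16
```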
22
Design: Secure In-Cache Execution
Desynchronize memory (encrypted) and cache (plaintext) Cache in write-back mode Guaranteed by hardware and existing kernels (in most OS’es) L3 cache is inclusive of L1 and L2 caches Guaranteed by hardware and existing kernels No conflict in the reserved cache No more protected data at a time than the reserved cache size Now let’s get to the second technique: secure in-cache execution. What it does is to desynchronize the memory and the cache. Note that the protected content in memory is encrypted, and in cache it’s decrypted, namely in plaintext. To achieve this goal, there are three requirements. First, the cache must be configured to use the write-back mode, so data modified in the cache will not be written through to memory. Second, the L3 cache is inclusive of the L1 and L2 caches, so the CPU will always access the memory through the L3 cache. Third, there is no conflict in the reserved cache, so CPU will not evict any reserved cache lines. The first two requirements are guaranteed by the hardware and existing kernels; and the third one is fulfilled because there are no more protected data at a time than the reserved cache size.
23
Design: Secure In-Cache Execution
More data to protect? Demand paging Access unmapped data -> page fault Allocate a plaintext page (for securing data) If no page available, select one for replacement Encrypt the plaintext page, copy it back Decrypt faulting page into plaintext, update page table if necessary So what if we have more data to protect? EncExec employs demand paging. When the process tries to access unmapped data, the hardware delivers a page fault exception. EncExec hooks into the page fault handler. If the fault is caused by unmapped protected data, it tries to allocate a plaintext page for the faulting page. If none is available, it selects one for replacement: it first encrypts the plaintext page and copies it back into the original page, then decrypts the faulting page into the plaintext one and updates the page table if necessary. EncExec can adopt the kernel's sophisticated page replacement algorithms, such as LRU, to select pages for replacement.
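The replacement steps above can be sketched as a minimal userspace model. This is not the kernel implementation: `encrypt_page`/`decrypt_page` are placeholder names standing in for the paper's AES-NI routines, and the 15-page window matches one of the evaluated configurations.

```python
from collections import OrderedDict

WINDOW_PAGES = 15   # plaintext pages available at a time

def encrypt_page(data: bytes) -> bytes:
    """Placeholder cipher (the real system uses AES-NI)."""
    return bytes(b ^ 0xAA for b in data)

def decrypt_page(data: bytes) -> bytes:
    return bytes(b ^ 0xAA for b in data)   # XOR is its own inverse

class Pager:
    def __init__(self, encrypted: dict):
        self.encrypted = encrypted    # page number -> ciphertext in "memory"
        self.window = OrderedDict()   # page number -> plaintext, LRU-ordered

    def fault(self, page: int) -> bytes:
        """Handle a fault on a protected page; return its plaintext."""
        if page in self.window:
            self.window.move_to_end(page)        # refresh LRU position
            return self.window[page]
        if len(self.window) >= WINDOW_PAGES:     # no plaintext page free:
            victim, plain = self.window.popitem(last=False)
            self.encrypted[victim] = encrypt_page(plain)  # write back encrypted
        plain = decrypt_page(self.encrypted[page])        # decrypt into window
        self.window[page] = plain
        return plain
```

Faulting in a 16th page evicts the least-recently-used plaintext page, re-encrypting it before its slot is reused, so at most 15 pages are ever in plaintext.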
24
Design: Secure In-Cache Execution
Dedicate one plaintext page to store keys and sub-keys Cannot be evicted or replaced Frequent protected data encryption/decryption Use CPU built-in support to speed up cryptographic algorithms To ensure security, we dedicate one plaintext page to store the key, its derived sub-keys, and other intermediate data. This page cannot be evicted or replaced. In our design, EncExec needs to frequently encrypt and decrypt data, so we use the CPU's built-in hardware support to speed up those crypto algorithms.
25
Implementation: Spatial Cache Reservation
Reserve physical pages in the booting process Modify allocators to skip reserved pages Make sure no reserved pages exist in page table Hook run-time page allocator and kernel's whole-cache flushing function To implement spatial cache reservation, we reserve physical pages during the system boot process by modifying allocators to skip the reserved pages. To make sure no reserved pages exist in the page table, we wrote a small kernel function to scan for them, and the result shows that no such pages are found. We also hook into the run-time page allocator to prevent our reserved pages from being added to the allocator, and into the kernel's whole-cache flushing function, so that we can encrypt the plaintext pages first to prevent leakage of the unencrypted data.
26
Implementation: Secure In-Cache Execution
pmap is used to unmap a page Consists of architecture-specific data and functions to manage the process' page table Maintains a reverse mapping for physical pages Tracks page usage information for page replacement Removes shared protected pages from other processes To implement secure in-cache execution, there are two alternative choices; here we focus on the solution based on the pmap structure. It is a good candidate for unmapping pages. It contains architecture-specific data and functions to manage the process' page table, and it maintains a reverse mapping for each physical page, which keeps track of the virtual addresses at which a physical page is mapped. EncExec uses the reverse mapping to completely disconnect a shared page from all processes: if a page is shared by multiple processes, EncExec removes it from the other processes as well. Pmap also tracks page usage information for page replacement, which EncExec can use to optimize its own page replacement algorithm.
27
Performance Evaluation
Use the hardware-accelerated AES (AES-NI) to encrypt and decrypt data About 3μs on average to encrypt/decrypt 4KB of data using the 128-bit AES algorithm 6μs to replace an existing page EncExec uses hardware-accelerated AES to encrypt and decrypt data. Our measurement shows it takes about 3 microseconds to encrypt or decrypt 4KB of data using 128-bit AES. Therefore, there is an extra delay of about 3 microseconds to load a data page and 6 microseconds to replace an existing page. This delay is the most significant source of overhead, but it is hard to reduce.
28
Performance Evaluation
Mode 1: Choose data to encrypt Mode 2: Encrypt all the data Test with 15 or 31 plaintext pages Overhead of common cryptographic algorithms We use the official benchmarks from mbed TLS to measure the performance overhead. Here, Mode 1 means the programmer chooses what data to protect; we modified the source code of mbed TLS to allocate the session data and the stack from the secure memory. Mode 2 means protecting all the data sections with EncExec. We experimented with both 15 and 31 plaintext pages; in the latter, we double the reserved cache. For relatively simple algorithms like MD5 or AES, there is almost no overhead, as the figure shows, because neither the CPU nor the memory is a performance bottleneck.
29
Performance Evaluation
Mode 1: Choose data to encrypt Mode 2: Encrypt all the data Test with 15 or 31 plaintext pages Overhead of RSA and DH handshakes For more complex algorithms like RSA or Diffie-Hellman, the benchmark in Mode 1 and in Mode 2 with 31 pages only slightly lags behind the baseline. However, for Mode 2 with 15 pages, the performance overhead is relatively large. This can be explained by the working set of the benchmark being larger than 15 pages but smaller than 31 pages: with only 15 plaintext pages, thrashing happens, leading to poor performance. For Mode 1, since we only need to protect selected data, the overhead is trivial. Mode 2 is more suitable for protecting compact programs, such as a crypto service program.
30
Performance Evaluation
Performance of Bonnie while concurrently running the mbed TLS benchmark. The unit on the Y-axis is MB/sec, or thousand seeks/sec for RandomSeeks only. We also measured the impact on other concurrently running processes. We experimented with Bonnie, a file system benchmark. In the figure, the blue bar shows the performance when the Bonnie benchmark runs alone, and the green bar shows the performance when the mbed TLS benchmark is also running. Most of the results are almost the same, except the third one, SeqOutRewrite, where the concurrent run is about 43% slower. This overhead is likely caused not by EncExec but by CPU scheduling, which schedules the two benchmarks onto the *same* core; without EncExec, the result is similar. The details can be found in the paper.
31
Secure In-Cache Execution
Q & A
32
Backup Slides
33
Compared to Intel SGX
SGX is great! But EncExec:
Works on old CPUs
No time-consuming context switch
Supports unmodified programs