TEE-Perf: A Profiler for Trusted Execution Environments
Maurice Bailleu, Pramod Bhatotia, Donald Dragoti, Christof Fetzer
Code available: https://github.com/mbailleu/tee-perf
Trusted Execution Environments
Security in untrusted infrastructure: how can we establish trust in an untrusted computing infrastructure?
Trusted Execution Environment (TEE):
- Hardware extension that provides a secure memory region (or enclave)
- Protects application code and data against a powerful adversary (e.g., a malicious OS or VM)
Figure: a trusted application inside a secure memory region (enclave) of the address space.
Trusted Execution Environments
Different implementations:
- Different ISAs
- Different OSs
- Architectures: Intel SGX, ARM TrustZone, Keystone
A wide range of TEEs is available, supported by different platforms
Performance problems inside TEEs
TEE implementation details:
- Memory encryption overhead
- Switches between untrusted and trusted environments
- Syscalls (I/O operations) are prohibited
- Different characteristics for different TEEs
Code running inside a TEE has surprisingly different performance characteristics
Research gap: Profiling for TEEs
TEE environment:
- No HW counters
- No I/O
- OS cannot inspect processor state
- Architecture- or platform-dependent
This makes it hard to adapt existing profiling tools
Our contribution
TEE-Perf: an architecture- and platform-independent tool to measure performance at the function level for applications running inside a TEE
Properties:
- Generality: architecture- and platform-independent
- Transparency: unmodified multi-threaded applications, easy-to-use interface
- Accuracy: accurate method-level profiling, no instruction sampling
Outline: Motivation, Challenges, Design, Evaluation
Challenge #1: HW counter unavailability
Hardware counters:
- Not available inside TEEs
- Architecture-dependent
Acquiring a counter from the untrusted host:
- Requires a switch between untrusted and trusted environments
Our approach: mapping a counter into the secure memory
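The counter idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not TEE-Perf's implementation: a helper thread spins and increments a plain variable, and the measured code reads that variable instead of a hardware counter. The class name `SoftwareCounter` and its methods are hypothetical, and a Python thread merely stands in for a native thread writing to memory that the enclave can read.

```python
import threading
import time


class SoftwareCounter:
    """Hypothetical software counter: a thread that keeps incrementing a value."""

    def __init__(self):
        self.value = 0
        self._running = False
        self._thread = None

    def _spin(self):
        # Busy loop: the counter advances as long as the thread is scheduled.
        while self._running:
            self.value += 1

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()

    def stop(self):
        self._running = False
        self._thread.join()

    def read(self):
        # A plain memory read: no syscall and no hardware counter involved.
        return self.value


counter = SoftwareCounter()
counter.start()
first = counter.read()
time.sleep(0.05)          # stand-in for the work being measured
second = counter.read()
counter.stop()
```

The difference `second - first` is a relative measure of elapsed time. It has no fixed unit, so values are only comparable within the same run.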
Challenge #2: Application inspection
Sampling by interrupting periodically:
- Interrupts are expensive
- TEEs prevent observing the CPU
Our approach: use function instrumentation to measure the code while it executes
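As a rough illustration of instrumentation versus sampling, the sketch below records one event when a function is entered and one when it returns, so no periodic interrupt is needed. It is a Python analogy rather than the tool's native mechanism: the decorator `instrument`, the `events` list, and the tick source `_ticks` are all hypothetical names introduced for this example.

```python
import functools
import itertools

# Stand-in for a counter: a monotonically increasing tick source.
_ticks = itertools.count()

# In-memory event log; each entry is (kind, function name, counter value).
events = []


def instrument(fn):
    """Record a 'call' event on entry to fn and a 'ret' event on exit."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        events.append(("call", fn.__name__, next(_ticks)))
        try:
            return fn(*args, **kwargs)
        finally:
            # The 'ret' event is recorded even if fn raises.
            events.append(("ret", fn.__name__, next(_ticks)))
    return wrapper


@instrument
def inner(x):
    return x * 2


@instrument
def outer(x):
    return inner(x) + 1


result = outer(20)
```

Because every call and return is recorded, the event stream is exact rather than statistical, at the cost of extra work on each function boundary.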
Challenge #3: Getting measurement data
Communication over channels:
- Channels require leaving the TEE
- TEE exit operations are expensive since they require TLB flushing, security checks, etc.
Figure: for an I/O call, the trusted enclave must exit to issue the syscall.
Our approach: introduce a shared-memory log in the host memory
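A minimal sketch of the shared-memory idea, assuming a single writer and a reader that runs after the measurement: records are stored with plain memory writes into a mapped region, so appending a record needs no I/O call. The anonymous `mmap` region stands in for host memory visible to both sides, and the layout (an 8-byte record count followed by 8-byte values) is an assumption made for this example only.

```python
import mmap
import struct

HEADER = struct.Struct("<Q")   # number of records written so far (assumed)
RECORD = struct.Struct("<Q")   # one counter value per record (assumed)
LOG_SIZE = 4096

# Anonymous mapping; stands in for memory shared between enclave and host.
log = mmap.mmap(-1, LOG_SIZE)
HEADER.pack_into(log, 0, 0)


def append(value):
    """Writer side: store a record with memory writes only."""
    (count,) = HEADER.unpack_from(log, 0)
    offset = HEADER.size + count * RECORD.size
    if offset + RECORD.size > LOG_SIZE:
        raise MemoryError("log is full")
    RECORD.pack_into(log, offset, value)
    HEADER.pack_into(log, 0, count + 1)


def read_all():
    """Reader side: walk the records once the run has finished."""
    (count,) = HEADER.unpack_from(log, 0)
    return [RECORD.unpack_from(log, HEADER.size + i * RECORD.size)[0]
            for i in range(count)]


for value in (100, 250, 975):
    append(value)
recorded = read_all()
```

This version is not safe for concurrent writers. It only shows that the writer never has to leave its own execution context to hand data to the reader.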
Challenge #4: Log format
- Measurement information is not human-readable
- Tools do not understand the format
Our approach: an offline analyzer that allows queries on the measurements and exports data to other tools
Outline: Motivation, Challenges, Design, Evaluation
System overview
#1 Compiler
#2 Recorder
#3 Analyzer
#4 Visualizer
Stage 1: Compiler
The compiler takes unmodified code and produces a binary for measurements
- Inject code: function instrumentation, call/ret
- Map code: communication, recorder
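The key property of this stage is that the application source stays unmodified while hooks run on every call and return. Native toolchains offer this through compiler options such as GCC's `-finstrument-functions`; the slides do not say which mechanism TEE-Perf relies on. The Python sketch below is only an analogy that uses the interpreter's `sys.setprofile` hook to get the same effect. The names `run_profiled`, `_hook`, and `trace` are hypothetical.

```python
import sys

trace = []  # (event, function name) pairs collected by the hook


def _hook(frame, event, arg):
    # Keep only Python-level calls and returns; ignore C-function events.
    if event in ("call", "return"):
        trace.append((event, frame.f_code.co_name))


def run_profiled(fn, *args):
    """Run fn with the hook installed, then remove the hook again."""
    sys.setprofile(_hook)
    try:
        return fn(*args)
    finally:
        sys.setprofile(None)


# Unmodified application code: no decorators and no profiling calls.
def leaf(n):
    return n + 1


def parent(n):
    return leaf(n) * 2


value = run_profiled(parent, 3)
```

Note that `sys.setprofile` only affects the calling thread, so a multi-threaded program would need the hook installed in every thread.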
Stage 2: Recorder
The recorder uses the instrumented binary to measure the execution and writes the profiled information to the shared-memory log
Figure: Fn(A) calls Fn(B) inside the enclave; on each call and ret the recorder reads the software counter and writes to the log. The log (header, record 1, record 2, …) resides in host memory.
Stage 2: Log format
An append-only log allows lock-free appends, and small entries reduce the log size
Log layout: log header, log entry #1, log entry #2, …
Each log entry contains:
- Call/Ret
- Counter value
- Instruction address
- Thread ID
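To make the entry layout concrete, here is a small sketch that packs the four fields named on the slide into a fixed-size binary record. The field widths (1 byte for call/ret, 8 bytes each for counter, instruction address, and thread ID) and the byte order are assumptions chosen for illustration; the slide does not specify them. How appends are made lock-free is also not shown here.

```python
import struct

# Assumed layout: 1-byte kind, 8-byte counter, 8-byte address, 8-byte thread ID.
ENTRY = struct.Struct("<BQQQ")
CALL, RET = 0, 1


def encode(kind, counter, address, thread_id):
    """Serialize one log entry into a fixed-size byte string."""
    return ENTRY.pack(kind, counter, address, thread_id)


def decode(log):
    """Split an append-only byte log into (kind, counter, address, tid) tuples."""
    return [ENTRY.unpack_from(log, off)
            for off in range(0, len(log), ENTRY.size)]


log = bytearray()
log += encode(CALL, 10, 0x401000, 7)   # entering the function at 0x401000
log += encode(RET, 42, 0x401000, 7)    # returning from it 32 ticks later
entries = decode(log)
```

Fixed-size entries mean the n-th record sits at a computable offset, so a reader can index into the log without parsing earlier entries.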
Stage 3: Analyzer
The analyzer takes the log and presents the retrieved information to the user
- Reconstructs the call stack for each thread
- Calculates the time spent per method
- Human-readable output
- Declarative query interface
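The first two analyzer steps can be sketched as a single pass over the call/ret events: keep one stack per thread, and on every return attribute the elapsed counter ticks to the returning function and to its caller. This is a simplified illustration, not the tool's analyzer. The event tuple shape and the function name `analyze` are assumptions, and function names are used in place of instruction addresses for readability.

```python
from collections import defaultdict


def analyze(events):
    """Return {function: (inclusive, self)} counter ticks from call/ret events.

    Each event is (kind, counter, function, thread_id), ordered by counter
    within a thread.
    """
    stacks = defaultdict(list)             # thread_id -> open frames
    totals = defaultdict(lambda: [0, 0])   # function -> [inclusive, self]
    for kind, counter, func, tid in events:
        stack = stacks[tid]
        if kind == "call":
            # Frame: name, entry tick, ticks spent in callees so far.
            stack.append([func, counter, 0])
        else:
            name, start, in_callees = stack.pop()
            elapsed = counter - start
            totals[name][0] += elapsed
            totals[name][1] += elapsed - in_callees
            if stack:
                stack[-1][2] += elapsed   # charge the caller's callee time
    return {name: tuple(v) for name, v in totals.items()}


events = [
    ("call", 0, "main", 1),
    ("call", 10, "parse", 1),
    ("ret", 40, "parse", 1),
    ("call", 50, "compute", 1),
    ("ret", 90, "compute", 1),
    ("ret", 100, "main", 1),
]
profile = analyze(events)
```

In this example `main` has 100 inclusive ticks but only 30 of its own, since 70 ticks were spent in `parse` and `compute`.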
Stage 4: Visualizer
The visualizer takes an analyzer run and produces a flame graph
Figure: an example flame graph produced by TEE-Perf
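Flame graph generators commonly consume a "folded stack" text format: one line per distinct call stack, with frames separated by semicolons and followed by a count. The sketch below converts call/ret events of a single thread into such lines, using self time in counter ticks as the count. Whether TEE-Perf's visualizer uses this intermediate format is not stated on the slide, so treat it as an assumption. The function name `fold` and the event tuple shape are hypothetical.

```python
from collections import defaultdict


def fold(events):
    """Collapse call/ret events of one thread into folded-stack lines."""
    stack = []                  # open frames: [name, entry tick, callee ticks]
    folded = defaultdict(int)   # "a;b;c" -> self ticks
    for kind, counter, func in events:
        if kind == "call":
            stack.append([func, counter, 0])
        else:
            # The stack path is taken before popping the returning frame.
            path = ";".join(frame[0] for frame in stack)
            _, start, in_callees = stack.pop()
            elapsed = counter - start
            folded[path] += elapsed - in_callees
            if stack:
                stack[-1][2] += elapsed
    return [f"{path} {ticks}" for path, ticks in sorted(folded.items())]


lines = fold([
    ("call", 0, "main"),
    ("call", 10, "parse"),
    ("ret", 40, "parse"),
    ("call", 50, "compute"),
    ("ret", 90, "compute"),
    ("ret", 100, "main"),
])
```

Each resulting line describes one box in the flame graph: the path gives its position in the stack and the number gives its width.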
Outline: Motivation, Challenges, Design, Evaluation
Evaluation
Questions:
- What are the profiling overheads of TEE-Perf?
- Does TEE-Perf detect performance optimization opportunities?
Experimental setup:
- Intel Xeon E3-1270 v5 (3.60 GHz, 4 cores, 8 hyper-threads), Skylake with SGX
- 64 GiB RAM
See the paper for more results
Q1: Overhead of TEE-Perf
Takeaway: TEE-Perf has a mean overhead of 1.9x compared to perf
Q2: Detecting optimization opportunities
Case study: porting SPDK to Intel SGX
- The naively ported version showed a 14.4x slowdown
- TEE-Perf showed that:
  - 72% of the time was spent in the getpid syscall
  - 20% of the time was spent in getting a timestamp
- After optimization, SPDK performance is on par with native
Takeaway: TEE-Perf is able to detect performance-critical sections
Summary
TEE-Perf: an architecture- and platform-independent profiling tool for trusted execution environments (TEEs)
Our tool is:
- General: architecture- and platform-independent
- Transparent: supports unmodified multi-threaded applications
- Accurate: provides method-level profiles without instruction sampling
Code available: https://github.com/mbailleu/tee-perf