TEE-Perf: A Profiler for Trusted Execution Environments

Presentation transcript:

TEE-Perf: A Profiler for Trusted Execution Environments
Maurice Bailleu, Pramod Bhatotia, Donald Dragoti, Christof Fetzer
Code available: https://github.com/mbailleu/tee-perf

Trusted Execution Environments
Security in untrusted infrastructure: how can trust be established in an untrusted computing infrastructure? A Trusted Execution Environment (TEE) is a hardware extension that provides a secure memory region (an enclave) inside the application's address space. It protects the trusted application's code and data against a powerful adversary (e.g. a malicious OS or VM).

Trusted Execution Environments
Different implementations: different ISAs, different OSs. Architectures: Intel SGX, ARM TrustZone, Keystone. A wide range of TEEs is available, supported by different platforms.

Performance problems inside TEEs
TEE implementation details: memory encryption overhead; switches between the untrusted and trusted environments; syscalls (I/O operations) are prohibited inside the TEE; different characteristics for different TEEs. Takeaway: code running inside a TEE has surprisingly different performance characteristics than the same code running outside it.

Research gap: Profiling for TEEs
TEE environment: no HW counters; no I/O; the OS cannot inspect processor state; architecture- or platform-dependent. These restrictions make it hard to adapt existing profiling tools.

Our contribution
TEE-Perf: an architecture- and platform-independent tool to measure function-level performance of applications running inside a TEE. Properties: Generality (architecture- and platform-independent); Transparency (unmodified multi-threaded applications, easy-to-use interface); Accuracy (accurate method-level profiling, no instruction sampling).

Outline Motivation Challenges Design Evaluation

Challenge #1: HW counter unavailability
Hardware counters are not available inside TEEs and are architecture-dependent. Acquiring them from the untrusted host requires a switch between the untrusted and trusted environments. Approach: map a counter into the secure memory.
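The counter idea can be illustrated with a minimal Python sketch. It is only an analogy: a background thread stands in for a host-side thread that keeps incrementing a counter, and an attribute read stands in for reading a mapped memory word. The class and function names are hypothetical and are not taken from TEE-Perf.

```python
import threading
import time


class SoftwareCounter:
    """Hypothetical stand-in for a counter that a host thread keeps incrementing."""

    def __init__(self):
        self.value = 0  # plays the role of the counter word in mapped memory
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Host side: spin and bump the counter; no hardware counter is involved.
        while not self._stop.is_set():
            self.value += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def measure(counter, fn):
    """Return (result, ticks), where ticks is the counter delta across fn()."""
    before = counter.value  # a plain read, no switch between environments
    result = fn()
    after = counter.value
    return result, after - before


counter = SoftwareCounter()
counter.start()
result, ticks = measure(counter, lambda: time.sleep(0.05) or "done")
counter.stop()
print(result, ticks)
```

The tick count is a relative unit, not wall-clock time, so it is only meaningful when comparing measurements taken against the same counter.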

Challenge #2: Application inspection
Sampling by interrupting periodically is a poor fit: interrupts are expensive, and TEEs prevent observing the CPU. Approach: use function instrumentation to measure the code while it executes.
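Function instrumentation, as opposed to sampling, can be sketched in Python with a decorator that records every entry and exit. This is an illustration of the idea only: in TEE-Perf the instrumentation is injected by the compiler, and the names `events` and `instrumented`, as well as the tick source, are hypothetical.

```python
import functools
import itertools

events = []                 # hypothetical in-memory event list
_ticks = itertools.count()  # stand-in for a monotonically increasing counter


def instrumented(fn):
    """Wrap fn so that every entry and exit is recorded, with no sampling."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        events.append(("call", fn.__name__, next(_ticks)))
        try:
            return fn(*args, **kwargs)
        finally:
            # Recorded even if fn raises, so calls and returns stay paired.
            events.append(("ret", fn.__name__, next(_ticks)))
    return wrapper


@instrumented
def b():
    return 1


@instrumented
def a():
    return b() + 1


print(a())
print([(kind, name) for kind, name, _ in events])
```

Because every call and return is captured, the profile is exact at function granularity, at the price of paying the recording cost on every call.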

Challenge #3: Getting measurement data
Communication over channels requires leaving the TEE: for an I/O call, the trusted code must exit the enclave to issue the syscall. TEE exit operations are expensive since they require TLB flushing, security checks, etc. Approach: introduce a shared-memory log in the host memory.

Challenge #4: Log format
The measurement information is not human-readable, and existing tools do not understand the format. Approach: an offline analyzer that allows queries on the measurements and exports data to other tools.

Outline Motivation Challenges Design Evaluation

System overview
Four stages: #1 Compiler, #2 Recorder, #3 Analyzer, #4 Visualizer.

Stage 1: Compiler
The compiler takes unmodified code and produces a binary for measurements. Inject code: function instrumentation (call/ret). Map code: communication with the recorder.

Stage 2: Recorder
The recorder uses the instrumented binary to measure the execution and writes the profiled information to the shared-memory log.
Diagram: inside the enclave, Fn(A) calls Fn(B); on each call and return the recorder reads the software counter and writes a record to the log in host memory (log header, record 1, record 2, ...).

Stage 2: Log format
The append-only log allows lock-free appends, and small entries reduce the log size. The log consists of a log header followed by log entries (#1, #2, ...). Each entry holds: call/ret, counter value, instruction address, thread ID.
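A log with a header and small fixed-size entries might be serialized as in the following Python sketch. The four entry fields come from the slide; the field widths, byte order, magic bytes, and the helper names `write_log` and `read_log` are assumptions made for illustration and do not describe TEE-Perf's actual on-disk format.

```python
import struct

# Assumed layout (not taken from TEE-Perf): little-endian, fixed-width fields.
HEADER = struct.Struct("<4sI")  # magic bytes, number of entries
ENTRY = struct.Struct("<BQQQ")  # call/ret, counter value, instruction address, thread ID
CALL, RET = 0, 1


def write_log(entries):
    """Serialize the header followed by the entries, in append order."""
    out = bytearray(HEADER.pack(b"TLOG", len(entries)))
    for entry in entries:
        out += ENTRY.pack(*entry)
    return bytes(out)


def read_log(blob):
    """Parse the bytes produced by write_log back into tuples."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != b"TLOG":
        raise ValueError("not a log produced by write_log")
    return [
        ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        for i in range(count)
    ]


entries = [(CALL, 10, 0x401000, 1), (RET, 25, 0x401000, 1)]
blob = write_log(entries)
print(len(blob), read_log(blob))
```

Fixed-size entries mean the position of entry i can be computed directly, which is what makes appending and offline parsing cheap in a layout like this.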

Stage 3: Analyzer
The analyzer takes the log and presents the retrieved information to the user. It reconstructs the call stack for each thread, calculates the time spent per method, produces human-readable output, and offers a declarative query interface.
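The two analysis steps named on this slide, rebuilding a call stack per thread and computing the time spent per method, can be sketched in Python as follows. The record layout (kind, counter value, function, thread ID) and the function name `time_per_function` are hypothetical; a real analyzer would also resolve instruction addresses to symbol names, which is omitted here.

```python
from collections import defaultdict


def time_per_function(records):
    """Return {function: (inclusive_ticks, self_ticks)} from call/ret records."""
    stacks = defaultdict(list)  # thread ID -> stack of [function, start, child_ticks]
    inclusive = defaultdict(int)
    exclusive = defaultdict(int)
    for kind, counter, function, tid in records:
        stack = stacks[tid]
        if kind == "call":
            stack.append([function, counter, 0])
            continue
        if not stack:
            raise ValueError(f"unbalanced log: return from {function} with empty stack")
        name, start, child_ticks = stack.pop()
        if name != function:
            raise ValueError(f"unbalanced log: return from {function}, expected {name}")
        elapsed = counter - start
        inclusive[name] += elapsed
        exclusive[name] += elapsed - child_ticks
        if stack:
            stack[-1][2] += elapsed  # charge this call to its caller's children
    return {name: (inclusive[name], exclusive[name]) for name in inclusive}


records = [
    ("call", 0, "main", 1),
    ("call", 2, "work", 1),
    ("call", 1, "io", 2),   # a second thread, interleaved in the log
    ("ret", 4, "io", 2),
    ("ret", 7, "work", 1),
    ("ret", 10, "main", 1),
]
print(time_per_function(records))
```

Keeping one stack per thread ID is what lets interleaved records from several threads be untangled. Note that in this sketch recursive calls are counted more than once in the inclusive total.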

Stage 4: Visualizer
The visualizer takes an analyzer run and produces a flame graph.
Figure: an example flame graph produced by TEE-Perf.
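Flame graph tools commonly consume "folded stacks": one line per call path, with frames joined by semicolons and followed by a weight. The Python sketch below derives such lines from call/ret records of a single thread. The record layout and the function name `folded_stacks` are hypothetical, and whether TEE-Perf's visualizer uses this exact intermediate format is not stated in the slides.

```python
from collections import defaultdict


def folded_stacks(records):
    """Turn (kind, counter, function) records of one thread into folded-stack lines."""
    weights = defaultdict(int)
    stack = []
    last = None  # counter value of the previous record
    for kind, counter, function in records:
        if stack:
            # Ticks since the previous record belong to the function on top.
            weights[";".join(stack)] += counter - last
        if kind == "call":
            stack.append(function)
        else:
            stack.pop()
        last = counter
    return [f"{path} {ticks}" for path, ticks in sorted(weights.items()) if ticks]


records = [
    ("call", 0, "main"),
    ("call", 2, "work"),
    ("ret", 7, "work"),
    ("ret", 10, "main"),
]
for line in folded_stacks(records):
    print(line)
```

Each line carries the self time of the innermost frame, so the width of a frame in the rendered graph is the sum over all lines whose path passes through it.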

Outline Motivation Challenges Design Evaluation

Evaluation
Questions: What are the profiling overheads of TEE-Perf? Does TEE-Perf detect performance optimization opportunities? Experimental setup: Intel Xeon E3-1270 v5 (3.60 GHz, 4 cores, 8 hyper-threads), Skylake with SGX; 64 GiB RAM. See the paper for more results.

Q1: Overhead of TEE-Perf
Plot: not recoverable from the transcript. Takeaway: TEE-Perf has a mean overhead of 1.9x compared to perf.

Q2: Detecting optimization opportunities
Case study: porting SPDK to Intel SGX. The naively ported version showed a 14.4x slowdown. TEE-Perf showed that 72% of the time was spent in the getpid syscall and 20% in getting a timestamp. After optimization, SPDK performance is on par with native. Takeaway: TEE-Perf is able to detect performance-critical sections.
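A back-of-the-envelope check shows why the two hotspots are consistent with the "on par with native" result. The calculation below rests on assumptions that the slides do not confirm: that both percentages refer to the run time of the naive port, and that both costs can be removed entirely while the remaining work stays unchanged.

```python
slowdown_naive = 14.4   # naive port vs. native, from the slide
share_getpid = 0.72     # fraction of time in the getpid syscall
share_timestamp = 0.20  # fraction of time spent getting a timestamp

# Assumption: both shares refer to the naive port's run time, and both costs
# can be removed entirely while the remaining work is unchanged.
remaining_share = 1.0 - share_getpid - share_timestamp
estimated_slowdown = slowdown_naive * remaining_share

print(f"remaining share: {remaining_share:.2f}")
print(f"estimated slowdown after optimization: {estimated_slowdown:.2f}x")
```

Under these assumptions only 8% of the naive run time remains, which puts the estimate at roughly 1.15x of native.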

Summary
TEE-Perf: an architecture- and platform-independent profiling tool for trusted execution environments (TEEs). Our tool is: General (architecture- and platform-independent); Transparent (supports unmodified multi-threaded applications); Accurate (provides method-level profiles without instruction sampling).
Code available: https://github.com/mbailleu/tee-perf