Practical Data Confinement
Andrey Ermolinsky, Lisa Fowler, Sachin Katti, and Scott Shenker

DATA COMPROMISE
Controlling the flow of sensitive electronic information remains a major challenge. Problems range from deliberate theft to accidental violations of policy.

[Figure: Chronology of Personal Data Breaches in the US (figure label: "Our focus"). These figures exclude several breaches for which the number of victims is unknown. They include physical theft and hacking, but cover only the compromise of electronic data.]

KEY PROBLEM
It is difficult to secure sensitive data:
- Modern software is rife with security holes that can be exploited for exfiltration, theft, or leakage.
- Users must be trained and trusted to remember, understand, and obey policies on data dissemination and handling.

A recent ISACA survey of corporate employees found that:
- 35% have knowingly violated corporate information flow policies at least once.
- 22% have transferred sensitive internal information using a USB storage device.

“When security gets in the way, sensible, well-meaning, dedicated people develop hacks and workarounds that defeat the security.” - Don Norman

Data from: Privacy Clearing House, retrieved Jan; Information Systems Audit and Control Association (ISACA), August 2007.

OUR GOALS AND CONSTRAINTS
- Develop a practical mechanism for information flow control in enterprise environments.
- Protect sensitive data against external attacks.
- Enforce high-level information flow policies end to end, for example:
  - “do not disseminate the attached file”
  - “do not copy X to USB storage devices”
- Key constraints: compatibility with existing software (OS and applications) and with existing patterns of use.

OUR APPROACH
- Fine-grained information flow control (IFC) and policy enforcement in virtual hardware:
  - Interpose a thin virtualization layer (hypervisor) between the OS kernel and the hardware.
  - The hypervisor emulates hardware-level IFC. It:
    - associates a sensitivity tag with each byte of virtual machine state (registers, memory, disk);
    - tracks the propagation of tags at the level of machine instructions;
    - intercepts output (network transmission, writes to removable storage) and enforces policies.
- Coarse-grained VM-level partitioning.

[Figure: PDC Implementation: Overview]
[Figure: Red/Green VM Partitioning]

PRELIMINARY PERFORMANCE STUDY
- Worst-case 10x slowdown for compute-intensive tasks (e.g., text searching).
- Overhead depends on the amount of sensitive data and the degree of tag fragmentation.

MAIN CHALLENGES
- Tag storage overhead (addressed by exploiting spatial locality)
- Computational overhead of tag tracking (addressed by “on-demand” emulation and asynchronous tracking)
- Tag explosion and erosion
- Semantic gap between application-level data units and machine state

PDC IMPLEMENTATION
Prototype: hypervisor (Xen); paravirtualized Linux guest kernel; x86 emulator/tag tracker (QEMU); tag-aware file system (ext3).

Information flow tracking:
- Hypervisor
  - Dynamically switches the guest VM between native virtualized execution and emulated execution.
  - Manipulates guest page tables to intercept the initial access to sensitive data.
- Emulator/tag tracker
  - Recompiles guest machine code and generates a corresponding set of tag tracking instructions.
  - Executes the tag tracking instruction stream asynchronously in a separate thread.

[Figure: Tag tracking code generation (example)]
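To make the per-byte tagging described under OUR APPROACH concrete, here is a minimal Python sketch on a toy machine. It is not PDC's x86 implementation. Everything in it is a hypothetical assumption for illustration: tags are bitmasks of policy labels (NO_USB, NO_NETWORK), the machine supports only 32-bit load, store, and add, and addition conservatively gives every result byte the union of all input tags. The output method stands in for the hypervisor's interception of network transmission and removable-storage writes.

```python
# Hypothetical policy labels, combined as a bitmask per byte.
PUBLIC = 0
NO_USB = 1 << 0      # "do not copy X to USB storage devices"
NO_NETWORK = 1 << 1  # "do not disseminate" over the network


class TaggedMachine:
    """Toy machine in which every byte of registers and memory carries a tag."""

    def __init__(self, mem_size):
        self.mem = bytearray(mem_size)
        self.mem_tags = [PUBLIC] * mem_size  # one tag per memory byte
        self.regs = {}       # register name -> 32-bit value
        self.reg_tags = {}   # register name -> list of 4 per-byte tags

    def load32(self, dst, addr):
        # Each register byte inherits the tag of the memory byte it came from.
        self.regs[dst] = int.from_bytes(self.mem[addr:addr + 4], "little")
        self.reg_tags[dst] = self.mem_tags[addr:addr + 4]

    def store32(self, addr, src):
        # Each memory byte inherits the tag of the register byte written to it.
        self.mem[addr:addr + 4] = self.regs[src].to_bytes(4, "little")
        self.mem_tags[addr:addr + 4] = list(self.reg_tags[src])

    def add32(self, dst, src):
        # Carries propagate across bytes, so this sketch conservatively gives
        # every result byte the union of all input tags (a rule that can
        # contribute to "tag explosion").
        merged = PUBLIC
        for tag in self.reg_tags[dst] + self.reg_tags[src]:
            merged |= tag
        self.regs[dst] = (self.regs[dst] + self.regs[src]) & 0xFFFFFFFF
        self.reg_tags[dst] = [merged] * 4

    def output(self, device, addr, length):
        # Interposed on every output: refuse if any outgoing byte forbids it.
        forbidden = NO_USB if device == "usb" else NO_NETWORK
        if any(tag & forbidden for tag in self.mem_tags[addr:addr + length]):
            raise PermissionError(f"policy forbids sending tagged data to {device}")
        return bytes(self.mem[addr:addr + length])


# Usage: a sensitive value (NO_USB) flows through an add into a new buffer.
m = TaggedMachine(16)
m.mem[0:4] = (5).to_bytes(4, "little")
m.mem_tags[0:4] = [NO_USB] * 4
m.mem[4:8] = (7).to_bytes(4, "little")
m.load32("eax", 0)
m.load32("ebx", 4)
m.add32("eax", "ebx")
m.store32(8, "eax")
print(m.output("network", 8, 4))  # allowed: the tag only forbids USB
```

The derived bytes at offset 8 carry the NO_USB tag even though they were never copied directly from the sensitive buffer, so `m.output("usb", 8, 4)` raises `PermissionError`.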
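MAIN CHALLENGES lists tag storage overhead and suggests exploiting spatial locality. One way to do that, sketched here under our own assumptions rather than as PDC's design, is a page-granular shadow map. A page whose bytes all share one tag is stored as a single integer, and only "fragmented" pages with mixed tags get a full per-byte array. This also shows why overhead grows with tag fragmentation. The 4096-byte page size, one-byte tags, and the names ShadowTagMap and page_is_sensitive are hypothetical. The idea that a hypervisor would use page_is_sensitive to pick pages to unmap, so that the first access traps into the emulator, is our guess at the page-table technique and is not confirmed by the poster.

```python
PAGE_SIZE = 4096  # assumed guest page size


class ShadowTagMap:
    """Per-byte tags stored page by page, exploiting spatial locality.

    A page whose bytes all share one tag is stored as a single int; only
    "fragmented" pages with mixed tags get a full per-byte bytearray.
    Pages never written are implicitly tag 0 (public). Tags must fit in a byte.
    """

    def __init__(self):
        self.pages = {}  # page number -> int (uniform tag) or bytearray

    def get(self, addr):
        entry = self.pages.get(addr // PAGE_SIZE, 0)
        return entry if isinstance(entry, int) else entry[addr % PAGE_SIZE]

    def set_range(self, addr, length, tag):
        end = addr + length
        while addr < end:
            pno, off = divmod(addr, PAGE_SIZE)
            n = min(PAGE_SIZE - off, end - addr)  # bytes of this page to tag
            entry = self.pages.get(pno, 0)
            if n == PAGE_SIZE:
                self.pages[pno] = tag  # whole page overwritten: stays compact
            elif entry != tag:  # a bytearray never equals an int tag
                if isinstance(entry, int):  # split a uniform page
                    entry = bytearray([entry]) * PAGE_SIZE
                entry[off:off + n] = bytes([tag]) * n
                # Re-compact a page that has become uniform again.
                uniform = entry.count(entry[0]) == PAGE_SIZE
                self.pages[pno] = entry[0] if uniform else entry
            addr += n

    def page_is_sensitive(self, pno):
        # Hypothetical hook: a hypervisor could keep such pages unmapped in the
        # guest so the first access faults and execution switches to emulation.
        entry = self.pages.get(pno, 0)
        return any(entry) if isinstance(entry, bytearray) else entry != 0

    def fragmented_pages(self):
        return sum(isinstance(e, bytearray) for e in self.pages.values())


# Usage: a large sensitive buffer stays compact; a small tagged field does not.
tags = ShadowTagMap()
tags.set_range(0, 3 * PAGE_SIZE, 1)        # three fully tagged pages
tags.set_range(3 * PAGE_SIZE + 100, 8, 2)  # 8 tagged bytes inside page 3
print(tags.fragmented_pages())  # 1: only page 3 needs per-byte storage
```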
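Under PDC IMPLEMENTATION, the emulator recompiles machine code into a stream of tag tracking instructions and executes that stream asynchronously in a separate thread. The sketch below imitates this split, assuming a made-up five-opcode toy ISA instead of x86, word-granular tags, and Python's standard `queue.Queue` as the channel between the two threads. The instruction format, `gen_tag_ops`, and `AsyncTagTracker` are hypothetical names. Before a policy decision, the caller waits for the queue to drain so the check sees up-to-date tags.

```python
import queue
import threading

# Hypothetical toy ISA (not x86): each instruction is (opcode, a, b).
# Locations are register names like "eax" or memory cells like ("mem", 0x10).


def gen_tag_ops(block):
    """'Recompile' a basic block into a list of tag-tracking operations."""
    ops = []
    for opcode, a, b in block:
        if opcode in ("mov", "load", "store"):
            ops.append(("copy", a, b))      # tag(a) = tag(b)
        elif opcode == "add":
            ops.append(("union", a, b))     # tag(a) = tag(a) | tag(b)
        elif opcode == "movi":
            ops.append(("clear", a, None))  # immediates carry no tag
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return ops


class AsyncTagTracker:
    """Applies tag-tracking ops on a background thread, off the critical path."""

    def __init__(self, initial_tags):
        self.tags = dict(initial_tags)  # location -> tag bitmask (0 = public)
        self.work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            ops = self.work.get()
            for kind, dst, src in ops:
                if kind == "copy":
                    self.tags[dst] = self.tags.get(src, 0)
                elif kind == "union":
                    self.tags[dst] = self.tags.get(dst, 0) | self.tags.get(src, 0)
                else:  # "clear"
                    self.tags[dst] = 0
            self.work.task_done()

    def submit(self, ops):
        # Called by the execution thread; returns without waiting.
        self.work.put(ops)

    def tag_of(self, loc):
        # Before a policy decision (e.g. at an intercepted output), wait until
        # every queued op has been applied so the tag is up to date.
        self.work.join()
        return self.tags.get(loc, 0)


# Usage: ("mem", 0x10) holds sensitive data with tag 1.
tracker = AsyncTagTracker({("mem", 0x10): 1})
block = [
    ("load", "eax", ("mem", 0x10)),
    ("movi", "ebx", 7),
    ("add", "ebx", "eax"),
    ("store", ("mem", 0x20), "ebx"),
]
tracker.submit(gen_tag_ops(block))
print(tracker.tag_of(("mem", 0x20)))  # 1: the sensitive tag reached ("mem", 0x20)
```

Deferring tag work to another thread keeps the execution path short. The cost is that any check which depends on tags must first synchronize with the tracker, as `tag_of` does here.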