Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert.

Slides:



Advertisements
Similar presentations
Virtualisation From the Bottom Up From storage to application.
Advertisements

Differentiated I/O services in virtualized environments
OLIVE Open Library of Images for Virtualized Execution Mahadev Satyanarayanan, Gloria St Clair, Benjamin Gilbert, Jan Harkes, Dan Ryan, Erika Linke, Keith.
Efficient VM Introspection in KVM and Performance Comparison with Xen
Virtualization and Cloud Computing
Network Implementation for Xen and KVM Class project for E : Network System Design and Implantation 12 Apr 2010 Kangkook Jee (kj2181)
Virtual Machines Measure Up John Staton Karsten Steinhaeuser University of Notre Dame December 15, 2005 Graduate Operating Systems, Fall 2005 Final Project.
Virtualization and the Cloud
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
Virtualization for Cloud Computing
Virtual Machine Monitors CSE451 Andrew Whitaker. Hardware Virtualization Running multiple operating systems on a single physical machine Examples:  VMWare,
Tanenbaum 8.3 See references
VMM Based Rootkit Detection on Android Class Presentation Pete Bohman, Adam Kunk, Erik Shaw.
SymCall: Symbiotic Virtualization Through VMM-to-Guest Upcalls John R. Lange and Peter Dinda University of Pittsburgh (CS) Northwestern University (EECS)
Making the Virtualization Decision. Agenda The Virtualization Umbrella Server Virtualization Architectures The Players Getting Started.
Windows Azure Conference 2014 Running Docker on Windows Azure.
Eric Keller, Evan Green Princeton University PRESTO /22/08 Virtualizing the Data Plane Through Source Code Merging.
Virtualization Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is licensed.
Improving Network I/O Virtualization for Cloud Computing.
The Best of Both Worlds with On-Demand Virtualization Thawan Kooburat and Michael M. Swift On-Demand Virtualization allows systems to benefit from virtualization.
Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology, Fall 2010 Performance.
COMS E Cloud Computing and Data Center Networking Sambit Sahu
Our work on virtualization Chen Haogang, Wang Xiaolin {hchen, Institute of Network and Information Systems School of Electrical Engineering.
Hakim Weatherspoon Robbert van Renesse SUPERCLOUD: GOING BEYOND FEDERATED CLOUDS 1.
02/09/2010 Industrial Project Course (234313) Virtualization-aware database engine Final Presentation Industrial Project Course (234313) Virtualization-aware.
CS 127 Introduction to Computer Science. What is a computer?  “A machine that stores and manipulates information under the control of a changeable program”
Swap Space and Other Memory Management Issues Operating Systems: Internals and Design Principles.
Docker and Container Technology
THAWAN KOOBURAT MICHAEL SWIFT UNIVERSITY OF WISCONSIN - MADISON 1 The Best of Both Worlds with On-Demand Virtualization.
CSE 451: Operating Systems Winter 2015 Module 25 Virtual Machine Monitors Mark Zbikowski Allen Center 476 © 2013 Gribble, Lazowska,
VMM Based Rootkit Detection on Android
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Course 03 Basic Concepts assist. eng. Jánó Rajmond, PhD
Intro To Virtualization Mohammed Morsi
Privacy-Sensitive VM Retrospection Wolfgang Richter 1, Glenn Ammons 3, Jan Harkes 1, Adam Goode 4, Nilton Bila 2, Eyal de Lara 2, Vasanth Bala 3, Mahadev.
1/1/ The Case for Content Search of VM Clouds Mahadev Satyanarayan, Wolfgang Richter, Glenn Ammons †, Jan Harkes and Adam Goode Carnegie Mellon University.
Virtualization Neependra Khare
OPENSTACK Presented by Jordan Howell and Katie Woods.
Lecture 13: Virtual Machines
Introduction to Operating Systems Concepts
Virtualization for Cloud Computing
Virtualization.
Virtual Machine Monitors
Section 4 Block Storage with SES
Fundamentals Sunny Sharma Microsoft
Business Continuity & Disaster Recovery
Presented by Mike Marty
Microsoft Virtual Academy
Docker Birthday #3.
Reusing old features to build new ones
Lecture 24 Virtual Machine Monitors
Virtualization overview
Group 8 Virtualization of the Cloud
Business Continuity & Disaster Recovery
OS Virtualization.
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
Virtualization Layer Virtual Hardware Virtual Networking
Virtualization Techniques
If you knew what I know or CloudWave - Improving services in the Cloud through collaborative adaptation Eliot Salant IBM Haifa Research.
A Survey on Virtualization Technologies
Virtual Machines (Introduction to Virtual Machines)
Lecture Topics: 11/1 General Operating System Concepts Processes
VSWAPPER: A Memory Swapper for Virtualized Environments
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Docker Some slides from Martin Meyer Vagrant Box:
A Virtual Machine Monitor for Utilizing Non-dedicated Clusters
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
CS295: Modern Systems Virtualization
Interrupt Message Store
Presentation transcript:

Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert

Preamble ● HotCloud '11 → VM Retrospection ● Examining historical state of VMs ● Stored in a central VM image library ● Done in collaboration with IBM Research ● Did a summer internship...and...

Introspection vs Retrospection ● Examine active state of VM during execution ● Examine historical state of VMs and their snapshots VM Instance A Examine live logs A'A1A2 B'B1B2... Examine all historic logs A*

Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them?

Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them? Apple's Time Machine over all VM instances including their complete snapshotted history

Example: Forensics - Before Compromised VMQuarantined VM 1. Crawl live or frozen logs 2. Pull backups if available 3. Examine differences 4. Determine root causes 5. Redeploy

Example: Forensics - After Compromised VMQuarantined VM 1. Crawl live or frozen logs 2. Retrospect entire history 3. Examine differences 4. Determine root causes 5. Redeploy

Introspection Technical Report: CMU-CS What?

Why? Because state-of-the-art snapshotting isn't good enough. (1) High IO overhead (2) High latency to file mutations

file-level mutations createdeletemodify

Goals Low IO Overhead Low Latency Scalability

Goals Low IO Overhead Low Latency Scalability 0 Guest Modifications

1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching storage systems

1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching file systems ● File versioning Simplifying Assumption: Understanding doesn't matter

2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference

2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference Simplifying Assumption: Omniscience doesn't matter

3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency to file mutations ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud

3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency event notification ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud Simplifying Assumption: Performance doesn't matter

What do we want? Performance of storage Semantic understanding of smart disks Virtual expertise of VMI

Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications

Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications Simplifying Assumption: Privacy doesn't matter

File-Level Mutation Stream Δ

(1) Fidelity better than state-of- the-art snapshotting (2) Applications working with snapshotting can work on a stream of file-level mutations

Cloud VMI ● Low Overhead: asynchronous by design ● Low Latency: near-real-time file-level mutation inference ● No guest modifications ● VM instances → near-real-time data sources

We're going full stack...

KVM + QEMU

The Inference Engine

Current Status: Project xray ● Coding C since 2/9/12 ● ext2 file system (read) driver complete ● Static disk analysis complete ● Converting to serialized format underway ● Dynamic virtual disk write stream analysis underway

Integration with Qemu ● Using the tracing system ● Compile-time enabled ● Run-time configured ● Added a single trace event 'bdrv_write' ● Outputs a binary format we interpret struct qemu_bdrv_write_header { int64_t sector_num; int nb_sectors; }__attribute__((packed)); struct qemu_bdrv_write { struct qemu_bdrv_write_header header; const uint8_t* data; };

Demo VM Instance ● Tinycore Linux ● 9 MB Virtual Drive ● 1 ext2 partition ● Full Linux 3.0 Kernel with virtio etc. ● We will first ● Operate on a known write stream capture ● Then, operate on the live running instance

Demo with Questions?