Download presentation
Presentation is loading. Please wait.
Published byTimothy King Modified over 8 years ago
1
Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert
2
Preamble ● HotCloud '11 → VM Retrospection ● Examining historical state of VMs ● Stored in a central VM image library ● Done in collaboration with IBM Research ● Did a summer internship...and...
3
Introspection vs Retrospection ● Examine active state of VM during execution ● Examine historical state of VMs and their snapshots VM Instance A Examine live logs A'A1A2 B'B1B2... Examine all historic logs A*
4
Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them?
5
Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them? Apple's Time Machine over all VM instances including their complete snapshotted history
6
Example: Forensics - Before Compromised VMQuarantined VM 1. Crawl live or frozen logs 2. Pull backups if available 3. Examine differences 4. Determine root causes 5. Redeploy
7
Example: Forensics - After Compromised VMQuarantined VM 1. Crawl live or frozen logs 2. Retrospect entire history 3. Examine differences 4. Determine root causes 5. Redeploy
8
Introspection Technical Report: CMU-CS-12-103 What?
9
Why? Because state-of-the-art snapshotting isn't good enough. (1) High IO overhead (2) High latency to file mutations
10
file-level mutations createdeletemodify
11
Goals Low IO Overhead Low Latency Scalability
12
Goals Low IO Overhead Low Latency Scalability 0 Guest Modifications
13
1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching storage systems
14
1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching file systems ● File versioning Simplifying Assumption: Understanding doesn't matter
15
2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference
16
2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference Simplifying Assumption: Omniscience doesn't matter
17
3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency to file mutations ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud
18
3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency event notification ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud Simplifying Assumption: Performance doesn't matter
19
What do we want? Performance of storage Semantic understanding of smart disks Virtual expertise of VMI
20
Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications
21
Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications Simplifying Assumption: Privacy doesn't matter
22
File-Level Mutation Stream Δ
23
(1) Fidelity better than state-of- the-art snapshotting (2) Applications working with snapshotting can work on a stream of file-level mutations
24
Cloud VMI ● Low Overhead: asynchronous by design ● Low Latency: near-real-time file-level mutation inference ● No guest modifications ● VM instances → near-real-time data sources
25
We're going full stack...
26
KVM + QEMU
27
The Inference Engine
28
Current Status: Project xray ● Coding C since 2/9/12 ● ext2 file system (read) driver complete ● Static disk analysis complete ● Converting to serialized format underway ● Dynamic virtual disk write stream analysis underway
29
Integration with Qemu ● Using the tracing system ● Compile-time enabled ● Run-time configured ● Added a single trace event 'bdrv_write' ● Outputs a binary format we interpret struct qemu_bdrv_write_header { int64_t sector_num; int nb_sectors; }__attribute__((packed)); struct qemu_bdrv_write { struct qemu_bdrv_write_header header; const uint8_t* data; };
30
Demo VM Instance ● Tinycore Linux ● 9 MB Virtual Drive ● 1 ext2 partition ● Full Linux 3.0 Kernel with virtio etc. ● We will first ● Operate on a known write stream capture ● Then, operate on the live running instance
31
Demo with Questions?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.