SnowFlock: Virtual Machine Cloning as a First- Class Cloud Primitive Lagar-Cavilla, et al. Reading Group Presentation 15 December 2011 Adriaan Middelkoop.

2 Article
● Published: EuroSys '09; extended version in ACM Transactions on Computer Systems
● Authors:
  ● H. Andrés Lagar-Cavilla – AT&T
  ● Joseph A. Whitney, Roy Bryant, Philip Patchin, Michael Brudno, Eyal de Lara – University of Toronto
  ● Stephen M. Rumble – Stanford University
  ● M. Satyanarayanan – Carnegie Mellon University
  ● Adin Scannell – GridCentric Inc.

3 Motivation
● Starting up a virtual machine is expensive:
  ● initialize virtual hardware
  ● start the kernel / create kernel tables
  ● start and initialize applications
  ● clone a VM instead?
● No clear API for creating VM instances:
  ● some semi-automatic configuration
  ● startup scripts
  ● create VMs programmatically, like processes?

4 What is SnowFlock?

5 In a Nutshell
● Spawning VM instances
  ● Traditional: boot the guest, run a startup script
  ● Article: the application calls fork() on whole VMs
● Contributions of SnowFlock:
  ● Fast spawning of instances
  ● Only the active working set is transferred to the spawned instance, plus more bandwidth-saving tricks
  ● Works with conventional hardware, but requires changes to the VMM and the guest/host OS: file system driver, network driver, memory manager
:(){ :|:&};:

6 General Idea
● Parent-child parallelism using process cloning
● The child starts from a clone of the state of the parent process
● A different return value distinguishes the child from the parent, so each can execute different code:

pid_t pid = fork();
if (pid == 0) {
  // child code
} else {
  // parent code
}

● Efficient with copy-on-write and page sharing
● Simple API, but:
  ● too little isolation: only the process's memory is isolated
  ● too much isolation: no direct support for exchanging results

7 General Idea
● The article presents: parallelism through VM cloning
● The child starts in a clone of the parent VM: an independent copy of memory, operating system, and disk
● A different return value distinguishes the child from the parent:

long vid = vm_fork();
if (vid == 0) {
  // child code
} else {
  // parent code
}

● Similar optimizations as with fork() make cloning the VM efficient

8 API (Table 1)
● Ticket sf_request(int n, bool same_node)
● Id sf_clone(Ticket t)
● void sf_exit()
● void sf_join(Ticket t)
● void sf_kill(Ticket t)
● CheckPoint sf_checkpoint_parent()
● Id sf_create_clones(CheckPoint c, Ticket t)

9 Caveats
● A VM may run multiple processes, each of which could trigger a VM fork
  ● Guideline: one main process per VM
  ● Use separate VMs for different kinds of processes
● Parent and child cannot communicate directly
  ● The child receives a copy of the parent's memory and disk
  ● Use sockets, or files on a network disk
  ● Why not shared pipes?

10 Typical Fork Exploitation
● Sandboxing (Figure 1a)
● Load handling (Figure 1c)
● Parallel task pool (Figure 1b, Figure 1d)
● These uses require small overhead and low latency

11 Achievements
● SnowFlock replication: 0.8 s wall-clock time
● Independent of the number of clones:
  ● if each clone gets its own physical node
  ● if multicast is used
● Conventional replication (Figure 2):
  ● 90 s with multicast
  ● ×n without multicast
  ● not only ~100× slower, but also too high a latency

12 Implementation

13 Four Insights
● Children can resume execution with only a small initially replicated state
  => replicate only ~0.1% of the state: low latency
● Children access only a small part of the parent's memory
  => memory on demand: don't replicate all state (swap files are based on a similar insight)
● Children allocate memory after forking, which is overwritten without accessing the original contents
  => don't fetch pages that are allocated by the child
● Children execute similar code and use common data structures
  => use multicast: a double-edged sword, latency vs. caching/prefetching

14 VM Descriptors ● Minimal description of the VM in order to recreate it “on-demand” (approx 1 MB) ● Not the full state of the VM, instead: ● virtual CPU registers ● page tables (= biggest part) ● segmentation tables ● device specs ● some special memory pages

15 Clone Creation
● Parent: SnowFlock save (100 ms)
● Parent: Xen save (100 ms)
● Start clones (10 ms)
● Multicast descriptors (100 ms)
● Child: SnowFlock restore (200 ms)
● Child: Xen restore (200 ms)
● Setup: SnowFlock on Xen, with Linux guest and Linux host
● Benchmarks on a 32-node cluster: 4-core 3.2 GHz Xeons with 4 GB RAM per node

16 Memory on Demand
● Parent: copy-on-write
● Child: on demand, with avoidance heuristics:
  ● don't request pages that are allocated by the child
  ● don't request pages that are written by an I/O device of the child
● Benchmarks of on-demand memory (Figure 4a):
  ● page fetch: 275 microseconds, 85% of the time spent in the network
● Benchmarks of the heuristics (Figure 4b):
  ● 40× reduction in page requests
  ● unicast: 4× faster with heuristics
  ● multicast: 2× faster with heuristics (and slightly faster than unicast)
● Benchmarks of multicast (Figure 4c):
  ● scales when a significant portion of the parent's state is needed

17 Virtual Disk
● Copy-on-write implementation
● Lazy fetching of blocks
● Similar heuristics as for memory: don't fetch blocks that are overwritten by a child
● A disk is less volatile than main memory, so it may be worthwhile to cache fetched blocks
● Spawned processes usually perform only a little I/O

18 Conclusions

19 Benchmarks
● See Section 5
● Comparison against a zero-cost fork: pre-cloned VMs waiting only for the job data; the differences in speedup and total time are within 5%
● NCBI BLAST – DNA queries
● SHRiMP – DNA queries (more memory-intensive)
● ClustalW – more DNA queries (more CPU-intensive, highly parallel)
● QuantLib – quantitative finance library
● Aqsis – renderer of animation movies
● distcc – distributed make for C programs

20 Discussion Items
● For which applications would 800 ms be too high a latency?
● Should operating systems offer to run every process in its own virtual machine? That is, is using a fully-fledged guest OS a hack to overcome a deficiency in current OS implementations?
● Processes need to communicate: they cannot be fully isolated from each other. => What are proper synchronization primitives?
  ● What to do with shared memory, shared files?
  ● Transactions?
● The paper claims seamless integration with MPI. => Why not a task pool?
● What about applications that use garbage collection?