We are living in a New Virtualized World
Sorav Bansal
IIT Delhi
Feb 26, 2011
Old Virtualized World: IBM Mainframes (circa 1960)
[Figure: stack of IBM Mainframe, VMM, OS, App]
New Virtualized World
[Figure: stack of “Cloud-OS”, OS, App]
“Cloud-OS” (stuff that you have heard many times before… uh yawn…)
– Infrastructure Layer (slave) + Management Layer (master)
– Divide hardware into resource pools
– Unit of abstraction = VM
– Efficient
– Effective Isolation
– Dynamic
– Fault-Tolerant
“Cloud-OS” (more exciting stuff)
Dynamic Performance Optimizations
    – Compiler Optimizations
    – OS-level Optimizations
Providing Determinism
    – Efficient Para-virtual Record/Replay
Improving Reliability
    – Micro-Replays
“Cloud-OS” (more exciting stuff, continued)
Security
    – VMM-level security checks
Efficient Thin Clients
    – Remote Desktopping using VM Record/Replay
Performance Optimizations
Dynamic Binary Translation (Compiler Optimizations)
    – Translation Blocks
    – Direct Jump Chaining
    – Peephole Optimizations
    – Trace Optimizations
    – Exception rollbacks
    – Interrupt delays
Performance Optimizations
Dynamic Binary Translation (OS-level Optimizations)
    – Eliminate traps from system calls
    – Better TLB/cache locality by using dedicated OS cores
Traditional Picture
[Figure: Application 1 and Application 2 on the OS, on the Hardware]
Virtualized Picture
[Figure: Application 1 and Application 2 on the OS, on an Optimizing VMM]
Some Initial Results
[Chart; lower is better]
Providing Determinism: Record/Replay
Uniprocessor
    – Non-determinism is quite low. It can be recorded efficiently.
Multiprocessor
    – Non-determinism is high due to shared memory.
    – Recording overhead scales poorly with the number of processors.
    – Assuming we can patch the guest in some way, can we improve this situation?
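The uniprocessor case can be illustrated with a toy model. This is a minimal sketch, not the system described in the talk: it assumes the only non-determinism is an external event tagged with the instruction count at which it arrived. The names `run` and `noisy_device` and the state-update arithmetic are hypothetical.

```python
import random

def run(program_len, event_source, log=None):
    """Execute a toy guest whose state is one integer.

    Record mode (log is None): events come from event_source and are logged
    as (instruction_count, value). Replay mode: events come from the log.
    """
    recording = log is None
    recorded = []
    pending = {} if recording else dict(log)
    state = 0
    for icount in range(program_len):
        if recording:
            value = event_source(icount)      # None means "no event now"
            if value is not None:
                recorded.append((icount, value))
        else:
            value = pending.get(icount)       # re-inject at the same point
        if value is not None:
            state = state * 31 + value        # the event perturbs guest state
        state = (state + icount) % 1_000_003  # a deterministic "instruction"
    return state, recorded

rng = random.Random()                         # deliberately unseeded

def noisy_device(icount):
    # Hypothetical device: raises an event on roughly 1% of instructions.
    return rng.randrange(256) if rng.random() < 0.01 else None

final_recorded, event_log = run(10_000, noisy_device)
final_replayed, _ = run(10_000, None, log=event_log)
```

The log holds only the rare events, not every instruction, which is why recording is cheap when non-determinism is low.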
Micro-Replays
[Figure: execution timeline]
– Snapshot
– Recording non-determinism
– Hit a bug (e.g., assertion failure)
– Choose a rollback point. Also guess the bug-inducing non-deterministic choice.
– Replay
– Potentially bug-free execution
Tolerating Non-deterministic Bugs using Record/Replay
    debit = 0; credit = total;            /* shared vars */
    void transfer(void) {
        for (i = 0; i < 1000; i++) {
            debit--; credit++;            /* unprotected critical section */
            assert(debit + credit == total);
        }
    }
    for (t = 0; t < max_threads; t++) {
        thread_create(transfer);
    }
– VMM records an execution
– On assert failure, the VMM interposes and rolls back the execution a few milliseconds
– VMM guesses the non-deterministic choices that could have caused the failure (e.g., instruction at timer interrupt)
– VMM replays the execution avoiding the previous non-deterministic choices
– In this example, VMM infers the critical section after a few runs and avoids interrupting it
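The loop above can be simulated in a few lines. This is a rough sketch under strong simplifying assumptions, not the VMM from the talk:
– threads are interleaved by a toy scheduler;
– a "pc" is just the index of the next step within one loop iteration;
– the rollback point is the start of the run;
– a replay re-executes with fresh timer interrupts instead of the recorded ones;
– every name (`execute`, `micro_replay`, `avoid_pcs`) is hypothetical.

```python
import random

STEPS = ("dec_debit", "inc_credit", "check")    # one iteration of transfer()
TOTAL = 100

def execute(n_threads, iters, rng, avoid_pcs):
    """Run the toy guest once.

    Returns None if every assert passed, otherwise the pc of the most recent
    timer preemption (the guessed bug-inducing choice). Timer interrupts
    landing on a pc in avoid_pcs are delayed.
    """
    shared = {"debit": 0, "credit": TOTAL}
    done = [0] * n_threads                       # steps completed per thread
    limit = iters * len(STEPS)
    current, last_preempt_pc = 0, None
    while True:
        runnable = [t for t in range(n_threads) if done[t] < limit]
        if not runnable:
            return None
        if current not in runnable:
            current = runnable[0]
        step = STEPS[done[current] % len(STEPS)]
        if step == "dec_debit":
            shared["debit"] -= 1
        elif step == "inc_credit":
            shared["credit"] += 1
        else:                                    # "check": the guest's assert
            if shared["debit"] + shared["credit"] != TOTAL:
                return last_preempt_pc
        done[current] += 1
        next_pc = done[current] % len(STEPS)
        timer_fired = rng.random() < 0.3
        if timer_fired and next_pc not in avoid_pcs and len(runnable) > 1:
            last_preempt_pc = next_pc            # record the choice
            current = rng.choice([t for t in runnable if t != current])

def micro_replay(n_threads=2, iters=1000, seed=0, max_replays=10):
    """Re-run until the assert stops firing; return (replays, avoided pcs)."""
    rng = random.Random(seed)
    avoid_pcs = set()
    for replays in range(max_replays):
        suspect = execute(n_threads, iters, rng, avoid_pcs)
        if suspect is None:
            return replays, avoid_pcs
        avoid_pcs.add(suspect)                   # never interrupt there again
    raise RuntimeError("bug not tolerated within max_replays")

replays, avoided = micro_replay()
```

In this model pc 1 sits between `debit--` and `credit++`, so avoiding it amounts to treating the two updates as an inferred critical section.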
Number of Replays Required? Technical Report: Micro-Replays: Improving Reliability in Presence of Non-deterministic Software Bugs
Security Example: A Simple Scheme to Prevent Stack-Overflows
On call:
    …
    push ra, shadow
    …
On ret:
    pop ra1
    pop shadow
    if (ra != ra1) error
    …
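One reading of the scheme above: each call saves the return address in a second, protected location, and each ret compares the two copies. The sketch below assumes that reading. The names `ToyMachine`, `StackSmashed` and `overflow_is_detected` are hypothetical.

```python
class StackSmashed(Exception):
    """Raised when the guest return address disagrees with the shadow copy."""

class ToyMachine:
    def __init__(self):
        self.stack = []    # guest-visible stack; a buggy guest can overwrite it
        self.shadow = []   # shadow stack, assumed out of the guest's reach

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        ra1 = self.stack.pop()     # what the guest is about to return to
        ra = self.shadow.pop()     # what was saved at call time
        if ra != ra1:
            raise StackSmashed(f"expected {ra:#x}, found {ra1:#x}")
        return ra

def overflow_is_detected():
    machine = ToyMachine()
    machine.call(0x400)
    machine.stack[-1] = 0xDEAD     # simulate a buffer overflow
    try:
        machine.ret()
    except StackSmashed:
        return True
    return False
```

The check costs one extra push per call and one pop plus a compare per ret.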
Remote Desktop Using Streaming VM Record/Replay
[Figure: Typical Remote Desktop]
Remote Desktop Using Streaming VM Record/Replay
[Figure: Remote Desktop using Streaming VM Rec/Rep, with a Record side and a Replay side]
Bandwidth Comparison
[Chart: Cumulative Data Transfer as a function of time]
Steady-state Bandwidth Comparison
[Chart: Steady State Bandwidth Requirement; axis: Rate (MiB/s)]
Conclusions
We are living in a new virtualized world
    – Many implications in different application areas
Backup Slides
Translation Blocks
Divide code into “translation blocks”
– A translation block ends if
    – a control-flow instruction is reached
    – or MAX_INSNS instructions have been translated
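The two termination rules can be sketched as follows. This sketch splits a straight-line listing up front, which is a simplification (a dynamic translator would normally form blocks lazily from an entry address). The opcode names, the value of `MAX_INSNS` and the sample listing are illustrative assumptions.

```python
MAX_INSNS = 4                                   # assumed limit, for illustration
CONTROL_FLOW = {"jmp", "jcc", "call", "ret"}    # assumed opcode names

def split_into_blocks(insns, max_insns=MAX_INSNS):
    """Group instructions into translation blocks."""
    blocks, current = [], []
    for insn in insns:
        current.append(insn)
        opcode = insn.split()[0]
        # A block ends at a control-flow instruction or at the size limit.
        if opcode in CONTROL_FLOW or len(current) == max_insns:
            blocks.append(current)
            current = []
    if current:                                 # trailing partial block
        blocks.append(current)
    return blocks

code = [
    "ld M, r0", "add r0, r1", "jmp L1",
    "mov r1, r2", "mov r2, r3", "mov r3, r4", "mov r4, r5",
    "mov r5, r6", "ret",
]
blocks = split_into_blocks(code)
```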
A Simple Scheme
[Figure: original code fragment (x:) → Binary Translator → translated code fragment (tx:)]
Use a Cache
[Figure: original code fragment (x:) → Binary Translator → translated code fragment (tx:); Translation Cache with lookup using x (found / not-found) and save]
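A minimal sketch of the lookup / translate / save flow, assuming the cache is keyed by the guest address x. The `Translator` class and its string-based "translation" are placeholders, not the real translator.

```python
class Translator:
    def __init__(self, guest_code):
        self.guest_code = guest_code   # guest address x -> list of instructions
        self.cache = {}                # translation cache: x -> tx
        self.translations = 0          # how often the slow path ran

    def translate(self, x):
        # Placeholder translation: tag each guest instruction.
        self.translations += 1
        return tuple("t:" + insn for insn in self.guest_code[x])

    def lookup(self, x):
        tx = self.cache.get(x)         # lookup using x
        if tx is None:                 # not-found: translate, then save
            tx = self.translate(x)
            self.cache[x] = tx
        return tx                      # found: reuse the saved translation

translator = Translator({0x1000: ["ld M, r0", "ret"]})
first = translator.lookup(0x1000)
second = translator.lookup(0x1000)
```

A fragment executed many times, such as a loop body, is translated only once.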
Direct Jump Chaining
[Figure: guest blocks a, b, c, d and their translations Ta, Tb, Tc, Td; exits of the translated blocks are labeled lookup(b), lookup(c), lookup(d)]
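A sketch of the chaining idea: the first time a translated block exits through a direct jump, the successor is found via lookup and the exit is patched to point straight at it, so later executions skip the lookup. The `TBlock` and `Engine` names and the a → b → c → d → a control flow are assumptions for illustration. The figure's actual control-flow graph is not recoverable from the text.

```python
class TBlock:
    def __init__(self, addr, target):
        self.addr = addr
        self.target = target      # guest address of the direct-jump successor
        self.chained = None       # patched to the successor TBlock on first use

class Engine:
    def __init__(self, guest):
        self.guest = guest        # guest addr -> successor addr (None = halt)
        self.cache = {}
        self.lookups = 0

    def lookup(self, addr):
        self.lookups += 1
        if addr not in self.cache:
            self.cache[addr] = TBlock(addr, self.guest[addr])
        return self.cache[addr]

    def run(self, entry, max_blocks):
        block = self.lookup(entry)
        trace = []
        for _ in range(max_blocks):
            trace.append(block.addr)
            if block.target is None:
                break
            if block.chained is None:          # first exit: resolve and patch
                block.chained = self.lookup(block.target)
            block = block.chained              # later exits: direct jump
        return trace

engine = Engine({"a": "b", "b": "c", "c": "d", "d": "a"})
trace = engine.run("a", max_blocks=12)
```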
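Indirect Jumps
[Figure: guest blocks a, b, f linked by call and ret; translations Ta, Tf, Tb; the ret goes through lookup(retaddr)]
Translated call:
    push b
    jmp Tf
Translated ret:
    pop retaddr
    tmp = JTABLE[retaddr & MASK]
    if (tmp.src == retaddr) goto tmp.dst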
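The JTABLE check behaves like a small direct-mapped cache in front of the full lookup. The sketch below assumes a miss falls back to lookup(retaddr) and then refills the slot, which the slide does not spell out. The `IndirectJumps` class, the table size and the sample addresses are hypothetical.

```python
MASK = 0xF                                      # assumed: 16-entry table

class IndirectJumps:
    def __init__(self, translation_of):
        self.translation_of = translation_of    # guest addr -> translated addr
        self.jtable = [None] * (MASK + 1)       # each slot holds (src, dst)
        self.slow_lookups = 0

    def lookup(self, retaddr):
        # Slow path: full translation-cache lookup.
        self.slow_lookups += 1
        return self.translation_of[retaddr]

    def jump_target(self, retaddr):
        tmp = self.jtable[retaddr & MASK]
        if tmp is not None and tmp[0] == retaddr:
            return tmp[1]                       # fast path: tag matches
        dst = self.lookup(retaddr)              # miss or collision
        self.jtable[retaddr & MASK] = (retaddr, dst)
        return dst

jumps = IndirectJumps({0x10: "T10", 0x20: "T20"})
```

Comparing `tmp.src` with `retaddr` is what keeps collisions safe: 0x10 and 0x20 share slot 0 here, and the tag check stops one from being mistaken for the other.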
printf Overheads
[Chart; logarithmic scale]
Effect of Maximum Size of Translation Block
[Chart; axis: Max Size of Translation Block]
Effect of Translation Cache Size
[Chart; axis: Number of 4k pages in Translation Cache; series: clock, random]
Optimizations
– Peephole Optimizations
– Trace Optimizations
– Cross-layer optimizations
An Example
    ld M, r1
    ld M, r0
    mov r0, r1
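The fragments above look like a peephole rewrite in which a second load from the same address M becomes a register-to-register move. That reading is an assumption, since the slide's layout is lost. A minimal sketch of such a rule follows, with instructions encoded as hypothetical (opcode, source, destination) tuples. The rule is only valid if M is ordinary memory that nothing else writes between the two loads.

```python
def peephole(insns):
    """Rewrite `ld M, rA; ld M, rB` into `ld M, rA; mov rA, rB`."""
    out = []
    for insn in insns:
        prev = out[-1] if out else None
        redundant_load = (
            prev is not None
            and prev[0] == "ld" and insn[0] == "ld"
            and prev[1] == insn[1]       # same memory operand
            and prev[2] != insn[2]       # different destination register
        )
        if redundant_load:
            out.append(("mov", prev[2], insn[2]))   # reuse the loaded value
        else:
            out.append(insn)
    return out

optimized = peephole([("ld", "M", "r0"), ("ld", "M", "r1")])
```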
Interrupts
    ld M, r1
    ld M, r0
    mov r0, r1
Delay interrupt delivery till the end of the current translation
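Delaying delivery means the guest never observes an interrupt in the middle of an optimized sequence. A toy sketch, assuming interrupts are identified by the global instruction index at which they arrive. The `run_blocks` name and the block sizes are hypothetical.

```python
def run_blocks(blocks, interrupt_at):
    """Return the instruction counts at which interrupts are delivered.

    blocks: list of translation blocks (each a list of instructions).
    interrupt_at: set of instruction indices at which an interrupt arrives.
    """
    pending = False
    delivered = []
    icount = 0
    for block in blocks:
        for _insn in block:
            if icount in interrupt_at:
                pending = True        # note the interrupt, do not deliver yet
            icount += 1
        if pending:                   # block boundary: state is consistent
            delivered.append(icount)
            pending = False
    return delivered

blocks = [["i0", "i1", "i2"], ["i3", "i4"], ["i5", "i6", "i7", "i8"]]
delivered = run_blocks(blocks, interrupt_at={1, 6})
```

Delivery latency is bounded by the block length, one reason to cap blocks at MAX_INSNS instructions.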
Precise Exceptions
ret is translated to:
    ld (sp), t0
    add $4, sp
    …
    jmp t0
On a page fault, rollback code runs before the page fault handler:
    sub $4, sp
    restore t0