Published by Adele Spencer. Modified over 9 years ago.
1
We are living in a New Virtualized World
Sorav Bansal, IIT Delhi
Feb 26, 2011
2
Old Virtualized World
IBM Mainframes (circa 1960)
Diagram labels: IBM Mainframe, VMM, OS, App
3
New Virtualized World
Diagram labels: “Cloud-OS”, OS, App
4
“Cloud-OS” (stuff that you have heard many times before… uh yawn…)
Infrastructure Layer (slave) + Management layer (master)
Divide hardware into resource pools
Unit of abstraction = VM
Efficient
Effective Isolation
Dynamic
Fault-Tolerant
5
“Cloud-OS” (more exciting stuff)
Dynamic Performance Optimizations
  – Compiler Optimizations
  – OS-level Optimizations
Providing Determinism
  – Efficient Para-virtual Record/Replay
Improving Reliability
  – Micro-Replays
6
“Cloud-OS” (more exciting stuff)
Security
  – VMM-level security checks
Efficient Thin Clients
  – Remote Desktopping using VM Record/Replay
7
Performance Optimizations
Dynamic Binary Translation (Compiler Optimizations)
  – Translation Blocks
  – Direct Jump Chaining
  – Peephole Optimizations
  – Trace Optimizations
  – Exception rollbacks
  – Interrupt delays
8
Performance Optimizations
Dynamic Binary Translation (OS-level Optimizations)
  – Eliminate traps from system calls
  – Better TLB/cache locality by using dedicated OS cores
9
Traditional Picture
Diagram labels: Application 1, Application 2, OS, Hardware
10
Virtualized Picture
Diagram labels: Application 1, Application 2, OS, Optimizing VMM
11
Some Initial Results
[Chart; lower is better]
12
Providing Determinism: Record/Replay
Uniprocessor
  – Non-determinism is quite low. Can be efficiently recorded.
Multiprocessor
  – Non-determinism is high due to shared memory.
  – Recording overhead scales poorly with multiple processors.
  – Assuming we can patch the guest in some way, can we improve this situation?
13
Micro-Replays
Diagram (execution timeline) labels:
  – Snapshot
  – Recording non-determinism
  – Hit a bug (e.g., assertion failure)
  – Replay
  – Choose a rollback point. Also guess the bug-inducing non-deterministic choice
  – Potentially bug-free execution
14
Tolerating Non-deterministic Bugs using Record/Replay
    debit = 0; credit = total;            /* shared vars */
    void transfer(void) {
      for (i = 0; i < 1000; i++) {
        debit--; credit++;                /* unprotected critical section */
        assert(debit + credit == total);
      }
    }
    for (t = 0; t < max_threads; t++) {
      thread_create(transfer);
    }
VMM records an execution.
On assert failure, the VMM interposes and rolls back the execution a few milliseconds.
VMM guesses the non-deterministic choices that could have caused the failure (e.g., instruction at timer interrupt).
VMM replays the execution, avoiding the previous non-deterministic choices.
In this example, the VMM infers the critical section after a few runs and avoids interrupting it.
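The replay loop described on this slide can be sketched as a small deterministic simulation. This is only an illustration under stated assumptions, not the mechanism from the technical report: the two "threads" are simulated in one process, each iteration is split into three steps (debit--, credit++, check), timer interrupts are given as global step numbers, and the names run, micro_replay, ITERS, and TOTAL are hypothetical. The guess used here is the simplest one: blame the last interrupt delivered before the failure and delay it by one step on the next replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ITERS = 4, STEPS_PER_ITER = 3, TOTAL = 100 };

/* Simulates two threads running transfer(). irq[] holds the global step
 * numbers at which a timer interrupt forces a context switch (assumed
 * input format). Returns true if the invariant held throughout; otherwise
 * returns false with *culprit set to the index of the last interrupt
 * delivered before the failing check. */
static bool run(const int *irq, size_t nirq, int *culprit)
{
    int debit = 0, credit = TOTAL;
    int pc[2] = {0, 0};            /* steps completed by each thread */
    int cur = 0;                   /* currently scheduled thread */
    size_t next = 0;               /* next interrupt to deliver */
    const int limit = ITERS * STEPS_PER_ITER;

    *culprit = -1;
    for (int step = 0; pc[0] < limit || pc[1] < limit; step++) {
        while (next < nirq && irq[next] <= step) {
            *culprit = (int)next++;
            cur = 1 - cur;         /* interrupt: switch threads */
        }
        if (pc[cur] >= limit)
            cur = 1 - cur;         /* this thread is done, run the other */
        switch (pc[cur]++ % STEPS_PER_ITER) {
        case 0: debit--; break;
        case 1: credit++; break;
        default:
            if (debit + credit != TOTAL)
                return false;      /* the assert on the slide would fire */
        }
    }
    return true;
}

/* Replays until the run is clean, delaying the suspected interrupt by one
 * step each time. Returns the number of replays used, or -1 if the bug
 * still shows after max_replays replays. */
static int micro_replay(int *irq, size_t nirq, int max_replays)
{
    int culprit;
    for (int n = 0; n <= max_replays; n++) {
        if (run(irq, nirq, &culprit))
            return n;
        assert(culprit >= 0);      /* a failure needs a mid-iteration switch */
        irq[culprit]++;            /* avoid the previous choice */
    }
    return -1;
}
```

With a single interrupt at step 1 (right after the first debit--), the first run fails and one replay with the interrupt moved to step 2 completes cleanly.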
15
Number of Replays Required?
Technical Report: Micro-Replays: Improving Reliability in Presence of Non-deterministic Software Bugs
http://www.cse.iitd.ac.in/~sbansal/pubs/micro_replays.pdf
16
Security Example
A Simple Scheme to Prevent Stack-Overflows
Translated call: … push ra, shadow …
Translated ret: … pop ra; pop ra1, shadow; if (ra != ra1) error …
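A minimal sketch of the shadow-stack check, assuming the VMM keeps a separate fixed-size array of return addresses that guest code cannot overwrite. The struct layout, SHADOW_DEPTH, and the function names on_call and on_ret are hypothetical; only the compare of ra against ra1 comes from the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SHADOW_DEPTH = 64 };   /* illustrative capacity */

struct shadow_stack {
    uintptr_t slot[SHADOW_DEPTH];
    int top;
};

/* Translated call: the return address ra goes on the guest stack as usual
 * and a copy is pushed on the shadow stack. Returns false on overflow. */
static bool on_call(struct shadow_stack *s, uintptr_t ra)
{
    assert(s != NULL);
    if (s->top == SHADOW_DEPTH)
        return false;
    s->slot[s->top++] = ra;
    return true;
}

/* Translated ret: ra is the value popped from the guest stack; ra1 is
 * popped from the shadow stack. A mismatch means the guest copy was
 * overwritten, so the check reports an error by returning false. */
static bool on_ret(struct shadow_stack *s, uintptr_t ra)
{
    assert(s != NULL);
    if (s->top == 0)
        return false;
    uintptr_t ra1 = s->slot[--s->top];
    return ra == ra1;
}
```

A matching call and ret pair returns true from on_ret; a return address that was clobbered on the guest stack makes it return false.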
17
Remote Desktop Using Streaming VM Record/Replay
Diagram caption: Typical Remote Desktop
18
Remote Desktop Using Streaming VM Record/Replay
Diagram labels: Record, Replay
Diagram caption: Remote Desktop using Streaming VM Rec/Rep
19
Bandwidth Comparison
[Chart: Cumulative Data Transfer as a function of time]
20
Steady-state Bandwidth Comparison
[Chart: Steady State Bandwidth Requirement; axis: Rate (MiB/s)]
21
Conclusions
We are living in a new virtualized world
  – Many implications in different application areas
22
Backup Slides
23
Translation Blocks
Divide code into “translation blocks”
  – A translation block ends if
      we reach a control-flow instruction
      or MAX_INSNS instructions have been translated
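The two termination rules can be sketched as follows. This is an illustration only: instructions are reduced to a hypothetical insn_kind tag, and the value of MAX_INSNS is chosen small for the example (the slide names the constant but gives no value).

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_INSNS = 4 };   /* illustrative value, not from the slides */

enum insn_kind { INSN_OTHER, INSN_JUMP, INSN_CALL, INSN_RET };

static int is_control_flow(enum insn_kind k)
{
    return k != INSN_OTHER;
}

/* Returns the number of instructions in the translation block that starts
 * at code[start]. The block ends after the first control-flow instruction
 * or after MAX_INSNS instructions, whichever comes first. */
static size_t block_length(const enum insn_kind *code, size_t n, size_t start)
{
    assert(start <= n);
    size_t len = 0;
    while (start + len < n && len < MAX_INSNS) {
        len++;
        if (is_control_flow(code[start + len - 1]))
            break;   /* the control-flow instruction is the last one in the block */
    }
    return len;
}
```

Calling block_length repeatedly, each time starting where the previous block ended, partitions a straight-line region into translation blocks.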
24
A Simple Scheme
Diagram: Original code fragment (x:) → Binary Translator → Translated code fragment (tx:)
25
Use a Cache
Diagram: Original code fragment (x:) → Binary Translator → Translated code fragment (tx:)
Translation Cache: lookup using x; if found, use the cached translation; if not-found, translate and save
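A minimal sketch of the found / not-found path, assuming a direct-mapped cache indexed by the original address x. The cache organization, CACHE_SLOTS, and the stand-in translate function (which just tags the address) are assumptions for illustration; the slides do not describe the cache layout.

```c
#include <assert.h>
#include <stdint.h>

enum { CACHE_SLOTS = 256 };   /* illustrative size */

struct tc_entry {
    uint32_t x;    /* original code address */
    uint32_t tx;   /* address of the translated fragment */
    int valid;
};

static struct tc_entry cache[CACHE_SLOTS];
static unsigned translations;   /* counts calls into the translator */

/* Stand-in for the binary translator: a real one would emit code and
 * return its address. Here the "translated address" is x with the top
 * bit set, so x must fit in 31 bits. */
static uint32_t translate(uint32_t x)
{
    assert(x < 0x80000000u);
    translations++;
    return 0x80000000u | x;
}

/* Lookup using x: return the cached tx if found, otherwise translate
 * and save the result in the cache. */
static uint32_t lookup(uint32_t x)
{
    struct tc_entry *e = &cache[x % CACHE_SLOTS];
    if (e->valid && e->x == x)
        return e->tx;            /* found */
    e->x = x;                    /* not-found: translate, then save */
    e->tx = translate(x);
    e->valid = 1;
    return e->tx;
}
```

A second lookup of the same x skips the translator; an address that maps to the same slot evicts the earlier entry in this direct-mapped variant.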
26
Direct Jump Chaining
Diagram: original blocks a, b, c, d; translated blocks Ta, Tb, Tc, Td; edges labeled lookup(b), lookup(c), lookup(d)
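The idea behind chaining can be sketched as follows: the first time a translated block jumps to its successor it goes through lookup, and the jump is then patched so later executions go directly to the translated successor. The linear a → b → c → d control flow, the tblock struct, and the lookups counter are assumptions for illustration; the slide's diagram may show a different graph.

```c
#include <assert.h>
#include <stddef.h>

enum { NBLOCKS = 4 };   /* blocks a, b, c, d */

struct tblock {
    int succ;               /* index of the direct-jump target, or -1 for none */
    struct tblock *chain;   /* patched link to the translated successor */
};

static struct tblock tb[NBLOCKS] = {
    { 1, NULL }, { 2, NULL }, { 3, NULL }, { -1, NULL },
};
static unsigned lookups;    /* counts slow-path lookups */

/* Slow path: find the translated block for an original block index. */
static struct tblock *lookup(int index)
{
    assert(index >= 0 && index < NBLOCKS);
    lookups++;
    return &tb[index];
}

/* Executes the chain a -> b -> c -> d once. */
static void run_chain(void)
{
    struct tblock *cur = &tb[0];
    while (cur->succ >= 0) {
        if (cur->chain == NULL)
            cur->chain = lookup(cur->succ);   /* look up once, then patch the jump */
        cur = cur->chain;                     /* chained: direct jump, no lookup */
    }
}
```

The first pass over the four blocks performs three lookups; later passes perform none, because every direct jump has been patched.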
27
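Indirect Jumps
Diagram: original blocks a, b, f (a calls f, f returns to b); translated blocks Ta, Tf, Tb
Translated call: push b; jmp Tf
Translated ret: pop retaddr; tmp = JTABLE[retaddr & MASK]; if (tmp.src == retaddr) goto tmp.dst; otherwise lookup(retaddr)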
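A minimal sketch of the jump-table fast path for a translated ret. JTABLE, MASK, src, dst, and lookup are the names on the slide; the table size, the fill-on-miss policy, the use of dst == 0 to mark an empty slot, and the stand-in lookup are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { JTABLE_SIZE = 16, MASK = JTABLE_SIZE - 1 };   /* illustrative size */

struct jentry {
    uint32_t src;   /* original return address */
    uint32_t dst;   /* translated address; 0 marks an empty slot */
};

static struct jentry JTABLE[JTABLE_SIZE];
static unsigned slow_lookups;   /* counts fall-backs to lookup() */

/* Stand-in for the full translation-cache lookup: tags the address with
 * the top bit, so retaddr must fit in 31 bits. */
static uint32_t lookup(uint32_t retaddr)
{
    assert(retaddr < 0x80000000u);
    slow_lookups++;
    return 0x80000000u | retaddr;
}

/* Resolves the target of a translated ret: try the jump table first and
 * fall back to lookup() on a miss, filling the slot for next time. */
static uint32_t resolve_ret(uint32_t retaddr)
{
    struct jentry *tmp = &JTABLE[retaddr & MASK];
    if (tmp->src == retaddr && tmp->dst != 0)
        return tmp->dst;             /* fast path: goto tmp.dst */
    tmp->src = retaddr;
    tmp->dst = lookup(retaddr);      /* slow path */
    return tmp->dst;
}
```

Repeated returns to the same address hit the table after the first miss; two return addresses that share a slot keep evicting each other.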
28
Lower is Better
30
printf Overheads
[Chart; logarithmic scale]
31
Effect of Maximum Size of Translation Block
[Chart; axis: Max Size of Translation Block]
32
Effect of Translation Cache Size
[Chart; axis: Number of 4k pages in Translation Cache; series: clock, random]
33
Optimizations
Peephole Optimizations
Trace Optimizations
Cross-layer optimizations
34
An Example
ld M, r1
ld M, r0
mov r0, r1
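One plausible reading of this example is a peephole rule that replaces the second of two back-to-back loads from the same address M with a register-to-register move. The sketch below implements that rule under this assumption; the insn struct, the opcode names, and the operand order (source first, destination second) are hypothetical. The rewrite is only valid if nothing can change M between the two loads.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_LD, OP_MOV };

struct insn {
    enum opcode op;
    int src;   /* memory address for OP_LD, source register for OP_MOV */
    int dst;   /* destination register */
};

/* Rewrites code[] in place: a load that immediately follows a load from
 * the same address into a different register becomes a move from that
 * register. Returns the number of instructions rewritten. */
static size_t peephole(struct insn *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t rewritten = 0;
    for (size_t i = 1; i < n; i++) {
        const struct insn *prev = &code[i - 1];
        if (prev->op == OP_LD && code[i].op == OP_LD &&
            prev->src == code[i].src && prev->dst != code[i].dst) {
            code[i].op = OP_MOV;       /* ld M, rB  becomes  mov rA, rB */
            code[i].src = prev->dst;
            rewritten++;
        }
    }
    return rewritten;
}
```

For the pair "ld M, r0; ld M, r1" the second instruction becomes "mov r0, r1", which saves one memory access.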
35
Interrupts
ld M, r1
ld M, r0
mov r0, r1
Delay interrupt delivery till the end of the current translation
36
Precise Exceptions
Translated ret: ld (sp), t0; add $4, sp; …; jmp t0
On a page fault, rollback code runs before the page fault handler: sub $4, sp; restore t0