November 2004 J. E. Smith Virtual Machines: An Architecture Perspective.


1 November 2004 J. E. Smith Virtual Machines: An Architecture Perspective

2 Introduction
Why are virtual machines interesting?
- They involve computer architecture in a pure sense
- They allow transcending of interfaces (which often seem to be an obstacle to innovation)
- They enable innovation in flexible, adaptive hardware, security, fault-tolerance, support for network computing (and others)
VMs (c) 2004, J. E. Smith

3 Performance Isn’t Everything
- The BIG ideas are all at least 20 years old, and they have been very thoroughly explored
- Focus research on other important areas
  - Power efficiency
  - Performance efficiency
  - Security
  - Ease of design
  - Software compatibility / interoperability
- Virtual Machines can be important enablers for all the above

4 Outline
- Virtualization
- The Family of Virtual Machines
- Process VMs and Code Caching
- High Level Language VMs
- Co-Designed VMs
- Research in Co-Designed VMs

5 Abstraction
- Computer systems are built on levels of abstraction
- Higher levels of abstraction hide details at lower levels
- Example: files are an abstraction of a disk
- Instruction Set Architecture
  - Major division between hardware and software
- Application Binary Interface
  - Observed by user processes
  - User ISA + OS calls
[Figure: layered system diagram. Software: application programs, libraries, operating system (drivers, memory manager, scheduler). Hardware: execution hardware, memory translation, system interconnect (bus), controllers, I/O devices and networking, main memory. Interfaces between components are numbered 1 to 14.]

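6 Virtualization
- An isomorphism from guest to host
  - Map guest state to host state
  - Implement “equivalent” functions
[Figure: guest states S_i and S_j with guest operation e(S_i) = S_j; host states S_i' and S_j' with host operation e'(S_i') = S_j'; the mapping V takes guest state to host state, V(S_i) = S_i' and V(S_j) = S_j'.]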
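Read as a commuting diagram, the isomorphism can be stated in one line. This is a sketch in the slide's own notation, where V maps guest state to host state, e is a guest operation and e' is the corresponding host operation:

```latex
\[
  e'\bigl(V(S_i)\bigr) \;=\; V\bigl(e(S_i)\bigr)
  \qquad\text{that is,}\qquad
  S_i \xrightarrow{\;e\;} S_j
  \;\Longrightarrow\;
  V(S_i) \xrightarrow{\;e'\;} V(S_j).
\]
```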

7 Virtualization
- Similar to abstraction, except details are not necessarily hidden
- Construct virtual disks
  - As files on a larger disk
  - Map state
  - Implement functions
- Now do the same thing with the whole “machine”
[Figure: virtual disks stored as files on a larger disk]
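The virtual-disk example can be sketched in a few lines. This is a minimal illustration, not anything from the talk: the names `HostFile` and `VirtualDisk`, the 16-byte block size and the offsets are all assumptions chosen to keep the example small. The `_map` method plays the role of the state mapping, and `read_block`/`write_block` are the "equivalent" functions implemented on host state.

```python
class HostFile:
    """Stand-in for a file on the larger host disk (hypothetical)."""

    def __init__(self, size):
        self.data = bytearray(size)


class VirtualDisk:
    """Guest disk whose blocks are mapped into a region of a host file."""

    BLOCK = 16  # bytes per block; deliberately tiny for illustration

    def __init__(self, host, offset, nblocks):
        self.host = host
        self.offset = offset
        self.nblocks = nblocks

    def _map(self, block):
        # State mapping: guest block number -> byte offset in the host file.
        if not 0 <= block < self.nblocks:
            raise IndexError("block outside virtual disk")
        return self.offset + block * self.BLOCK

    def write_block(self, block, payload):
        # Host-side implementation of the guest's "write block" operation.
        if len(payload) != self.BLOCK:
            raise ValueError("payload must be exactly one block")
        start = self._map(block)
        self.host.data[start:start + self.BLOCK] = payload

    def read_block(self, block):
        start = self._map(block)
        return bytes(self.host.data[start:start + self.BLOCK])


# Two virtual disks share one host file without seeing each other's blocks.
host = HostFile(1024)
disk_a = VirtualDisk(host, offset=0, nblocks=8)
disk_b = VirtualDisk(host, offset=8 * VirtualDisk.BLOCK, nblocks=8)
disk_a.write_block(0, b"A" * 16)
disk_b.write_block(0, b"B" * 16)
```

Each guest sees block 0 of its own disk, while the host sees two disjoint byte ranges of one file; the details of the mapping are available to the host, which is the sense in which they are "not necessarily hidden".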

8 The Family of Virtual Machines
- Lots of things are called “virtual machines”
  - IBM VM/370
  - Java
  - VMware
- Some things not called “virtual machines” are virtual machines
  - IA-32 EL
  - Dynamo
  - Transmeta Crusoe

9 System Virtual Machines
- Provide a system environment
- Constructed at ISA level
- Persistent
- Examples: IBM VM/370, VMware, Transmeta Crusoe
[Figure: a host platform running VMMs; each VMM supports a guest OS with its guest processes, and the guest systems are connected by virtual network communication.]

10 System Virtual Machines
- Native VM system
  - VMM in privileged mode
  - Guest OS in user mode
  - Example: classic IBM VMs
- User-mode hosted VM
  - VMM runs as a user application
- Dual-mode hosted VM
  - Parts of VMM privileged, parts non-privileged
  - Example: VMware
[Figure: three layerings over the hardware, split into privileged and non-privileged modes: virtual machine on VMM on hardware (native); virtual machine on VMM on host OS on hardware (user-mode hosted); virtual machine on a VMM that sits partly beside the host OS on hardware (dual-mode hosted).]

11 Process Virtual Machines
- Constructed at ABI level
- Runtime manages guest process
- Guest processes may intermingle with host processes
- Not persistent
- As a practical matter, guest and host OSes are often the same
- Dynamic optimizers are a special case
- Examples: IA-32 EL, FX!32, Dynamo
[Figure: a host OS running host processes alongside guest processes, each guest process wrapped by a runtime; processes can create one another, share disk files and communicate over the network.]

12 The Virtual Machine Space
- Process VMs
  - Same ISA: multiprogrammed systems, dynamic binary optimizers
  - Different ISA: dynamic translators, HLL VMs
- System VMs
  - Same ISA: classic OS VMs, hosted VMs
  - Different ISA: whole system VMs, co-designed VMs

13 Architecture Issues: System VMs
- Why System VMs are of interest today
  - Security & fault tolerance (isolation)
  - Platform consolidation
  - Application/environment portability
- “Efficiently Virtualizable” instruction sets
  - Popek and Goldberg (1974) should still be required reading
  - (An architecture paper with theorems and proofs!)
- Virtual Machine Assists
  - Compensate for inefficiencies due to privilege level “compression”
  - Fast emulation of system functions
  - Many developed for IBM mainframe VMs

14 System Virtualization
- Traps and interrupts (& sys calls)
  - Transfer to VMM
  - VMM determines appropriate Guest OS
  - VMM transfers to Guest OS
- Guest performs privileged operation
  - Trap to VMM
  - VMM reads/modifies guest state
  - May modify shadow state
  - Returns to Guest
- Guest OS “return” to user app.
  - Transfer to VMM
  - VMM bounces return back to Guest app.
[Figure: control flow among application, guest OS and VMM. A system call or trap enters the VMM at the vector location and is forwarded to the guest OS at its virtual vector location; a privileged operation in the guest OS traps to the VMM, which checks privileges, performs the operation and returns to the next instruction.]

15 Popek and Goldberg (in brief)
- Control sensitive instructions
  - All instructions that change hardware resource allocation (or mapping)
  - Example: write TLB
- Behavior sensitive instructions
  - All instructions whose outcome depends on hardware resource allocation
  - Example: read processor mode
- Theorem (paraphrase)
  - Efficiently virtualizable if all sensitive instructions trap in user mode
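The paraphrased theorem amounts to a simple check over an instruction set. The sketch below uses a toy, hypothetical ISA built from the slide's two examples plus an innocuous `add`; the `Insn` record and the trap behavior assigned to each instruction are assumptions for illustration, not properties of any real processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    control_sensitive: bool    # changes hardware resource allocation/mapping
    behavior_sensitive: bool   # outcome depends on resource allocation
    traps_in_user_mode: bool   # i.e. the instruction is privileged


def offenders(isa):
    """Sensitive instructions that execute in user mode without trapping."""
    return [
        i.name
        for i in isa
        if (i.control_sensitive or i.behavior_sensitive)
        and not i.traps_in_user_mode
    ]


def efficiently_virtualizable(isa):
    # Paraphrased condition: every sensitive instruction traps in user mode.
    return not offenders(isa)


# Toy ISA (hypothetical): read_mode is sensitive but does not trap.
toy_isa = [
    Insn("add", False, False, False),
    Insn("write_tlb", True, False, True),
    Insn("read_mode", False, True, False),
]

# The same ISA with read_mode made privileged.
fixed_isa = [
    Insn("add", False, False, False),
    Insn("write_tlb", True, False, True),
    Insn("read_mode", False, True, True),
]
```

An instruction listed by `offenders` is one the VMM never gets a chance to intercept, so a guest OS running in user mode would observe or change real machine state directly.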

16 System VM Research
- Architecture challenge: make IA-32 efficiently virtualizable
- Virtual Machine Assists
  - Compensate for inefficiencies due to privilege level “compression”
  - Fast emulation of system functions
  - Many developed for IBM mainframe VMs
- Applications to chip multiprocessors
  - Technology changes often require innovation and “re-invention”

18 Architecture Issues: Process VMs
- Generally to allow application migration
  - Or to run popular software on a less popular platform
  - Goal is generally to minimize performance loss
- Same-ISA dynamic optimizers are a special case
  - HP Dynamo
- Architecture problems
  - Efficient code caching
  - Indirect jump problem
  - Protecting runtime from guest process

19 Staged Emulation with Code Caching
- An important part of many VM implementations
- Start interpreting
- Profile to find “hot” code regions
- Translate, optimize & cache frequent code sequences
[Figure: the runtime contains an interpreter and a translator/optimizer; it reads the binary memory image, records profile data and fills the code cache.]
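The staging described on this slide can be sketched as a small emulation loop. Everything concrete here is an assumption made for illustration: the toy guest format (blocks of `add` operations ending in `jnz` or `halt`), the threshold of 3, and the use of a Python closure as a stand-in for translated native code. The "optimization" is only the folding of repeated adds to the same register.

```python
HOT_THRESHOLD = 3  # assumed value; real systems tune this


def interpret_block(block, regs):
    """Execute one basic block op by op; return the next SPC (None = halt)."""
    for op in block:
        if op[0] == "add":
            _, reg, imm = op
            regs[reg] += imm
        elif op[0] == "jnz":
            _, reg, taken, fallthrough = op
            return taken if regs[reg] != 0 else fallthrough
        elif op[0] == "halt":
            return None
    raise ValueError("block has no terminator")


def translate_block(block):
    """Build a host closure for a block of adds followed by one terminator."""
    *adds, term = block
    folded = {}
    for _, reg, imm in adds:  # fold all adds to a register into one
        folded[reg] = folded.get(reg, 0) + imm

    def translated(regs):
        for reg, total in folded.items():
            regs[reg] += total
        if term[0] == "halt":
            return None
        _, reg, taken, fallthrough = term
        return taken if regs[reg] != 0 else fallthrough

    return translated


def run(program, entry, regs):
    code_cache, profile = {}, {}
    stats = {"interpreted": 0, "cached": 0}
    spc = entry
    while spc is not None:
        if spc in code_cache:  # hot code: run the cached translation
            stats["cached"] += 1
            spc = code_cache[spc](regs)
            continue
        profile[spc] = profile.get(spc, 0) + 1  # cold code: profile it
        if profile[spc] >= HOT_THRESHOLD:
            code_cache[spc] = translate_block(program[spc])
        stats["interpreted"] += 1
        spc = interpret_block(program[spc], regs)
    return stats


# A ten-iteration loop at SPC 0x10 followed by a halt at SPC 0x20.
program = {
    0x10: [("add", "sum", 5), ("add", "n", -1), ("jnz", "n", 0x10, 0x20)],
    0x20: [("halt",)],
}
regs = {"sum": 0, "n": 10}
stats = run(program, 0x10, regs)
```

With these inputs the loop body is interpreted three times, becomes hot, and runs from the code cache for the remaining seven iterations; the halt block is interpreted once and never translated.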

20 Superblocks
- Based on “hot” paths
- One entry, multiple exits
- May contain redundant blocks (tail duplication)
[Figure: a control flow graph over basic blocks A to G, shown before and after superblock formation; block G is duplicated in the superblock version.]

21 Binary Translation Example
x86 Binary
    4FD0: addl %edx,(%eax)    ;load and accumulate sum
          movl (%eax),%edx    ;store to memory
          sub  %ebx,1         ;decrement loop count
          jz   51C8           ;branch if at loop end
    4FDC: add  %eax,4         ;increment %eax
          jmp  4FD0           ;jump to loop top
    51C8: movl (%ecx),%edx    ;store last value of %edx
          xorl %edx,%edx      ;clear %edx
          jmp  6200           ;jump elsewhere
PowerPC Translation
    9AC0: lwz   r16,0(r4)     ;load value from memory
          add   r7,r7,r16     ;accumulate sum
          stw   0(r5),r7      ;store to memory
          subi. r5,r5,1       ;decrement loop count, set cr0
          bez   cr0,pc+12     ;branch if loop exit
          bl    F000          ;branch & link to EM
          4FDC                ;save source PC in link register
    9AE4: bl    F000          ;branch & link to EM
          51C8                ;save source PC in link register
    9C08: stw   0(r6),r7      ;store last value of %edx
          subi  r7,r7,r7      ;clear %edx
          bl    F000          ;branch & link to EM
          6200                ;save source PC in link register

22 Code Caches
- Contain
  - Basic blocks
  - Superblocks (one entrance, multiple exits)
  - Optimized superblocks
- A base technology for many VMs
  - Dynamic binary translators: Intel IA-32 EL, Compaq FX!32
  - Dynamic binary optimizers: Dynamo family
  - Co-designed virtual machines: Transmeta, IBM DAISY
  - High performance Java virtual machines
  - System VMs with “inefficiently virtualizable” ISAs
  - “Sandboxing” secure VMs (x86 DynamoRIO)

23 Indirect Jumps
- Translated code cache PC (TPC) differs from source binary PC (SPC)
  - Need branch/jump target address translation
  - (Direct) branches are easier; target address is fixed
- Chaining can be used
[Figure: without chaining, each superblock exits through the dispatch table lookup code to reach the next superblock; with chaining, superblocks are linked directly to one another.]

24 The Indirect Jump Problem
- Target addresses (SPCs) can change
  - SPC needs to be translated at run-time, not translation time
- Conventional solution: superblock construction-time software prediction (aka inline caching)
    If Rx == #addr_1 goto #target_1
    Else if Rx == #addr_2 goto #target_2
    Else dispatch_table_lookup(Rx); do it the slow way
- The biggest overhead in code caches
  - Compare-and-branch: 6 instructions
  - Hash table lookup: 15 instructions in Dynamo x86
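The compare-and-branch chain above can be modeled directly. In this sketch the function name `make_indirect_jump` and the counters are hypothetical; the SPC/TPC pairs for 4FD0 and 51C8 are taken from the translation example on slide 21, while the TPC given for SPC 6200 is an invented value.

```python
def make_indirect_jump(predictions, dispatch_table, stats):
    """Build the stub emitted for one indirect jump site.

    predictions: (SPC, TPC) pairs chosen when the superblock was constructed.
    dispatch_table: full SPC -> TPC map maintained by the runtime.
    """

    def jump(spc):
        for predicted_spc, tpc in predictions:  # inline compare-and-branch
            if spc == predicted_spc:
                stats["inline_hits"] += 1
                return tpc
        stats["table_lookups"] += 1             # slow path: table lookup
        return dispatch_table[spc]

    return jump


stats = {"inline_hits": 0, "table_lookups": 0}
dispatch_table = {0x4FD0: 0x9AC0, 0x51C8: 0x9C08, 0x6200: 0xA000}
jump = make_indirect_jump(
    predictions=[(0x4FD0, 0x9AC0), (0x51C8, 0x9C08)],
    dispatch_table=dispatch_table,
    stats=stats,
)

first = jump(0x51C8)   # matches the second inline prediction
second = jump(0x6200)  # not predicted, so it falls through to the table
```

The cost asymmetry the slide quotes comes from the two return paths: a hit leaves after a few compares, while a miss pays for every failed compare and then the hash lookup as well.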

25 Protecting the Runtime
- The runtime shares process memory space with the application
  - Must protect runtime from application
  - Expensive memory protection changes on switches between runtime and code cache
- If guest registers are mapped to host memory
  - How are memory mapped registers protected?
[Figure: memory protection settings (N, R, R/W, Ex) for runtime code, runtime data, code cache, guest data and guest code, shown once for runtime mode and once for emulation mode.]

26 Process VM Research
- Same-ISA dynamic binary optimizers are probably not a winning proposition
  - Indirect jumps lead to performance losses on modern processors (optimizers with patching are better)
  - Complete (intrinsic) compatibility is extremely difficult
    - May have to rely on extrinsic assurances
    - Topic of architecture research similar to Popek and Goldberg
- For general process VMs, some primitive support in the ISA will be useful / necessary
  - Indirect jumps (more later)
  - Code caching
  - Protection

27 Computer Architecture Innovation
- HLL VMs: software people invent ISA to solve SW problems
- Co-Designed VMs: hardware people invent ISA to solve HW problems
- These two are the most interesting VMs from an architecture perspective and provide the biggest opportunities.

29 High Level Language Virtual Machines
- Raise the “ABI” level of abstraction
  - Use higher level virtual ISA
  - OS abstracted as standard libraries
- A form of process VM
[Figure, traditional: HLL program, compiler front-end, intermediate code, compiler back-end, object code (ISA), loader, memory image.]
[Figure, HLL VM: HLL program, compiler, portable code (virtual ISA), VM loader, virtual memory image, VM interpreter/translator, host instructions.]

30 Architecture Issues: High Level VMs
- Examples:
  - Sun Java
  - Microsoft .NET Framework and MSIL
- Why are HLL VMs important?
  - Microsoft says so.
  - It’s a good idea.
  - Combines object oriented programming and network computing

31 HLL VMs: Architecture Perspective
- Here, architects were deprived (or let themselves be deprived) of some interesting architecture work
- Don’t look at it bottom-up, i.e.
  - Take existing software for supporting HLL VMs,
  - Generate traces for standard ISAs,
  - Analyze traces
  - Conclude it’s “just like C”… problem solved!
- Look top-down: start with features of MSIL and look for computer architecture opportunities
  - Will require a mix of hardware and software innovation (else just continue to ignore real architecture in favor of implementation)

32 HLL VM Research
- Metadata: an interesting concept
  - Data Set Architecture
  - Don’t have to discover data structures (compare with C programs)
[Figure: a machine independent program file holding metadata and code is read by the loader of the virtual machine implementation; metadata becomes internal data structures, and code goes to the interpreter or to the translator, which produces native code.]

33 HLL VM Research
- Precise trap model
- Problems in conventional processors:
  - All state precise
  - Many instructions can trap
  - Enable/disable “remote” and at any time
- HLL VMs
  - Not all state must be precise
    - PC: not needed
    - Operand stack: never
    - Local variables: only if trap is handled locally
  - Trap enable explicit and locally specified

34 HLL VM Research
- Stack tracking
  - At any given point, operand stack must have same number of elements and types regardless of control flow path
  - This property could simplify exploitation of control independence
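The stack-tracking property can be checked with a small dataflow pass over the bytecode. The sketch below is an illustration under stated assumptions: the six-opcode stack ISA and its `EFFECT` table are invented, and only stack depth is tracked, not operand types, which a real verifier would also compare.

```python
# (pops, pushes) for each opcode of a hypothetical toy stack ISA
EFFECT = {
    "push": (0, 1),
    "pop": (1, 0),
    "add": (2, 1),
    "br": (1, 0),    # conditional branch: pops the condition
    "jmp": (0, 0),
    "ret": (0, 0),
}


def stack_depths(code):
    """Return {pc: operand stack depth on entry}, or raise ValueError if
    two control flow paths reach the same pc with different depths."""
    depth_at = {0: 0}
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        op = code[pc]
        pops, pushes = EFFECT[op[0]]
        depth = depth_at[pc]
        if depth < pops:
            raise ValueError(f"stack underflow at {pc}")
        after = depth - pops + pushes
        if op[0] == "ret":
            successors = []
        elif op[0] == "jmp":
            successors = [op[1]]
        elif op[0] == "br":
            successors = [op[1], pc + 1]
        else:
            successors = [pc + 1]
        for s in successors:
            if s not in depth_at:
                depth_at[s] = after
                worklist.append(s)
            elif depth_at[s] != after:
                raise ValueError(f"inconsistent stack depth at {s}")
    return depth_at


# Both arms of the branch push one value before merging at pc 5.
good = [("push",), ("br", 4), ("push",), ("jmp", 5), ("push",), ("pop",), ("ret",)]

# Here only the fall-through arm pushes before the merge at pc 3.
bad = [("push",), ("br", 3), ("push",), ("pop",), ("ret",)]
```

Because the depth at every program point is a single known number, an implementation can treat stack slots like named registers, which is one way to read the slide's remark about control independence.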

35 HLL VMs Summary
- Claim: slow-downs are due to OO programming, probably not dynamic compilation, and not the stack-based ISA
- Research opportunities abound
  - For VM implementation
  - For speeding up OO programs (look beyond C/C++)
  - Use co-designed HW/SW
  - Base design on MSIL/Java and implement conventional ISA as the uncommon case

37 Co-Designed Virtual Machines
- Separate the hardware/software interface from the ISA level of abstraction
- Restore the ISA to its “natural” place
  - as an Implementation ISA (I-ISA) that reflects actual hardware
- Support existing ISAs
  - as a Virtual ISA (V-ISA)
- Let processor designers use both hardware and software
- A form of system VM
[Figure: conventional stack (user applications, OS, libs. above the ISA, hardware below) beside the co-designed stack (user applications, OS, libs. above the V-ISA, then software, then the I-ISA, then hardware).]

38 Co-Designed VMs
- Should be of interest to both architects and micro-architects
  - Offers opportunities for performance, power saving, fault tolerance and other implementation-dependent features
  - Allows transcending conventional ISAs
- Don’t confuse them with VLIW!

39 Architecture Issues: Concealed Memory
- VM software resides in memory concealed from all conventional software
[Figure: the processor core reaches memory through the ICache and DCache hierarchies; concealed memory holds the VM code, VM data and code cache, while conventional memory holds the source ISA code and data.]

40 Another Way of Doing Things
[Figure: two designs side by side. Conventional: main memory and cache hierarchy feed a processor pipeline whose hardware translation unit forms uops for the functional units. Dynamic translation: a software translator fills a code cache in main memory, and the processor pipeline feeds the functional units from it.]

41 Jump Target-address Lookup Table
- A hardware cache of dispatch table entries
- Similar to software-managed TLB in virtual memory
[Figure: the jump instruction's TPC indexes the BTB, which supplies a predicted next-fetch TPC. The jump's register identifier reads the jump target SPC from the register file, and the SPC indexes the JTLT (SPC tag, TPC). JTLT hit and TPC matches the BTB prediction: prediction correct. JTLT hit but no match: BTB misprediction, redirect fetch to the jump target TPC from the JTLT. JTLT miss: redirect fetch to the dispatch code.]
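The three outcomes in the JTLT flowchart can be written as a small decision function. This is a behavioral sketch only: the JTLT is modeled as a Python dictionary from SPC to TPC, and the function name, outcome labels and example addresses are all hypothetical.

```python
def resolve_indirect_jump(spc, btb_predicted_tpc, jtlt, dispatch_tpc):
    """Return (next fetch TPC, outcome label) for one indirect jump.

    spc: jump target read from the guest register.
    btb_predicted_tpc: next-fetch address the BTB guessed for this jump.
    jtlt: SPC -> TPC entries currently cached in the lookup table.
    dispatch_tpc: address of the software dispatch code.
    """
    tpc = jtlt.get(spc)
    if tpc is None:
        # JTLT miss: software must look the target up the slow way.
        return dispatch_tpc, "jtlt_miss"
    if tpc == btb_predicted_tpc:
        # Fetch was already heading to the right place.
        return tpc, "btb_correct"
    # JTLT knows the right target; redirect fetch away from the BTB guess.
    return tpc, "btb_mispredict"


jtlt = {0x51C8: 0x9C08}   # one cached translation (addresses illustrative)
DISPATCH = 0xF000         # hypothetical entry point of the dispatch code

hit_correct = resolve_indirect_jump(0x51C8, 0x9C08, jtlt, DISPATCH)
hit_wrong = resolve_indirect_jump(0x51C8, 0x9AC0, jtlt, DISPATCH)
miss = resolve_indirect_jump(0x6200, 0x9AC0, jtlt, DISPATCH)
```

The comparison with a software-managed TLB shows up in the miss case: hardware only caches entries, and it is the dispatch code that finds the translation and would refill the table.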

42 Dual-address RAS
- Problem: function call instruction saves return SPC, not TPC
  - Conventional software-based chaining cannot utilize a RAS
- Solution: save both SPC and TPC
[Figure: a push-dual-address-RAS instruction pushes an (SPC, TPC) pair onto the dual-address RAS, which is shown alongside the JTLT.]
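A dual-address return address stack can be sketched as a stack of (SPC, TPC) pairs. The behavior below is an assumption about how such a structure would be used, not a description of the hardware in the talk: on a return, the popped TPC is trusted only if the popped SPC matches the return address the guest actually holds. The class name, depth and addresses are hypothetical.

```python
class DualAddressRAS:
    """Return address stack holding (return SPC, return TPC) pairs."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def push(self, return_spc, return_tpc):
        # Executed at a translated call: remember both addresses.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # full: discard the oldest entry
        self.stack.append((return_spc, return_tpc))

    def predict_return(self, actual_return_spc):
        """Pop one entry; return its TPC if the saved SPC matches the
        guest's actual return address, else None (use JTLT or dispatch)."""
        if not self.stack:
            return None
        spc, tpc = self.stack.pop()
        return tpc if spc == actual_return_spc else None


ras = DualAddressRAS(depth=2)
ras.push(0x4010, 0x9100)   # outer call
ras.push(0x4050, 0x9200)   # nested call

inner = ras.predict_return(0x4050)   # well-nested return
outer = ras.predict_return(0x9999)   # guest changed its return address
empty = ras.predict_return(0x4010)   # nothing left on the stack
```

Keeping the SPC next to the TPC is what lets translated code keep the guest-visible return address unchanged while still fetching from the code cache on a correct prediction.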

43 IPC performance
- “Translate” Alpha to Alpha; start with highly optimized code
- Conventional method (à la Dynamo) results in 14% IPC loss
- Dual-address RAS provides the most benefit
- Using both JTLT & RAS, 7.7% IPC improvement
  - Due to superblock re-layout

44 Research: Efficient Microarchitectures
- Wide pipelines are at odds with fast pipelines
  - Fast pipeline => low complexity per stage
  - More instructions per stage => high complexity per stage
- Process larger atomic units in pipeline stages
- Narrower “effective” width
- Reduce decoding stages
  - Do more in software
- Pipeline the issue stage

45 Fused Instruction Set
- Co-designed VM x86 implementation
  - Shorten and simplify pipeline front-end
- Combine pairs of dependent instructions
  - Form a single “unit” for pipeline processing
- Use VM software to
  - “Crack” x86 instructions into RISC-ops
  - Re-order RISC-ops
  - Reassemble into (new) fused pairs
- Related: Pentium-M fuses in front-end
  - Using original x86 instructions

46 Conventional Issue Logic
- Select and issue instructions free of data dependences
- Based on the selection, clear dependences
  - And “wake up” newly independent instructions
- Single cycle select-wakeup important for good performance
[Figure: an issue buffer holding entries such as “OP R1 Imm. R2” and “OP R6 R7 R1”, with a select path and a fanout/wakeup path.]

47 Pipelined Issue Logic
- Fuse dependent instructions into single slot
- Fused instructions traverse entire pipeline
- Make single issue decision for the pair

48 Instruction Set

49 Translation Algorithm
Two Pass Algorithm:
1. Form superblocks using Dynamo MRET method
2. Crack x86 instructions into RISC-like micro-ops
3. Attempt to fuse ALU ops only
4. Fuse LD/ST instructions as tails and ALU ops as heads
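Steps 3 and 4 can be read as two fusing passes over the cracked micro-ops, and the sketch below follows that reading. It is a simplified illustration under several assumptions: micro-ops are plain `(kind, dest, srcs)` tuples, a head looks only a fixed `window` of ops ahead for a dependent tail, and the legality checks a real translator needs before moving a tail next to its head are ignored.

```python
def fuse(uops, window=4):
    """Pair dependent micro-ops; return a sorted list of (head, tail) indices.

    uops: list of (kind, dest, srcs), kind in {"alu", "ld", "st"}.
    """
    partner = {}  # index -> index of the op it is fused with

    def fusing_pass(tail_kinds):
        for h, (kind, dest, _) in enumerate(uops):
            if kind != "alu" or h in partner or dest is None:
                continue  # heads are unfused ALU ops that produce a value
            for t in range(h + 1, min(h + 1 + window, len(uops))):
                t_kind, t_dest, t_srcs = uops[t]
                if t not in partner and t_kind in tail_kinds and dest in t_srcs:
                    partner[h] = t
                    partner[t] = h
                    break
                if t_dest == dest:
                    break  # head's value is overwritten; stop searching

    fusing_pass({"alu"})        # first pass: ALU tails only
    fusing_pass({"ld", "st"})   # second pass: LD/ST tails, ALU heads
    return sorted((h, t) for h, t in partner.items() if h < t)


# Micro-ops for a hypothetical cracked x86 sequence.
uops = [
    ("alu", "t1", ("eax",)),        # 0: t1 = eax + 4
    ("ld", "t2", ("t1",)),          # 1: t2 = mem[t1]
    ("alu", "t3", ("t2", "ebx")),   # 2: t3 = t2 + ebx
    ("alu", "t4", ("t3",)),         # 3: t4 = t3 - 1
    ("st", None, ("t4", "ecx")),    # 4: mem[ecx] = t4
]
pairs = fuse(uops)
```

On this input the first pass pairs the two ALU ops at indices 2 and 3, and the second pass pairs the address computation at index 0 with the load at index 1; the store stays unfused because its producer is already a tail.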

50 Fusing Profile
- About 50% of operations are fused
- Only 5-10% of non-fused are single-cycle ALU ops

51 Distance Between Fused Operations
- Most fused operations close together
  - 70% of fused ops from different x86 instructions
  - 60% contain two ALU operations

52 Performance (Normalized IPC)
- Baseline: generic superscalar
- Macro-op: fused macro-ops with pipelined issue logic
- Baseline Pipelined: superscalar with pipelined issue logic
[Figure: relative IPC performance (y-axis, 0.6 to 1.3) versus issue window size (x-axis, 16 to 64) for 4-wide Macro-op, 4-wide Baseline, 4-wide Baseline Pipelined and 2-wide Macro-op.]

53 VM Research
- Architecture support for VMs
  - Enable spectrum of VMs (process, system, HLL, co-designed)
  - Support for dynamic translation and optimization
  - Primitives: code caches & indirect jumps; concealed memory
  - Pays for itself: helps get rid of obsolete ISA baggage
- VM applications
  - Security
  - Fault tolerance
- Co-Designed VMs
  - Efficient microarchitecture
  - Adaptive microarchitecture
    - For power efficiency
    - For performance
- New ISAs
  - Application-area specific ISAs
  - Support for Java/MSIL
  - “Convergence” architectures
- Computer Architects can do Computer Architecture!

