QEMU, a Fast and Portable Dynamic Translator Fabrice Bellard (affiliation?) CMSC 691 talk by Charles Nicholas
Overview QEMU is a system for creating emulators, i.e. programs that allow programs for Target machine A to run on Host machine B For example, to allow a MIPS program (an OS plus apps) to run on an i386 or x64 host By pre-compiling micro operations that make up guest machine instructions, QEMU gains speed over other approaches
What’s the Problem? We want to run a binary for CPU A, but we only have machine B, so we need to emulate A Testing a cross-compiler which has A as a target We need to keep a lower profile than VirtualBox
QEMU Has Many Parts DYNGEN – “performs runtime conversion of the target CPU instructions into the host instruction set” – focus of this paper debugger UNIX user mode emulated devices (many more may exist now) and a GUI!
Example: PowerPC to x86 PowerPC instruction addi r1, r1, -16# r1=r1-16 Generated micro operations mov1_T0_r1# T0=r1 add1_T0_im -16# T0=T0-16 mov1_r1_T0# r1=T0 These micro ops are implemented in C, as follows...
Implement Each Micro Op To implement the micro op mov1_T0_r1# T0=r1 A short C function is written void op_mov1_T0_r1(void){ T0 = env->regs[1]; } The various target CPU registers are part of the environment data structure, and T0 is a variable
How Are These Functions Used? The various micro op functions are compiled, and object code stored When a target instruction is first found, the micro op object code sequences are concatenated in memory, and cached System calls such as memcpy are used to handle target machine memory operations Each target machine opcode needs to be handled (one would expect)
How Does DYNGEN Work? The compiled C code (somefile.obj) is parsed, and the location of various micro ops noted. The number of parameters of each micro op is noted, too. GCC flags are used to simplify the parsing, and to assure that the micro op code is position- independent. (Do you know what that means? Am I right on this point?)
Implementation Aspects Target operations are grouped into Translated Blocks, delimited by branching ops Translated Blocks are cached (to avoid translating the same code repeatedly) Target CPU registers (including condition code flags) are just C variables The Translated Blocks may have extra code to branch to each other, if that’s what would happen in target code. (Direct Block Chaining)
Other Issues Memory Management Self-modifying code (which malware may use!) Exceptions Interrupts
Evaluation The BYTEmark benchmark for Linux was run on a native host, and then on a QEMU virtual machine. – integer code 4x – floating point code 10x – faster than Bochs by 30x – faster than valgrind by 1.2x, even though valgrind is hand coded for x86 to x86 emulation
Conclusions It’s possible to make a versatile emulator with decent performance – versatile: many (target, host) combinations Lots of options regarding debugging or instrumenting code There seems to be an active QEMU community For more info, see