Published by Lesley O’Connor. Modified over 9 years ago.
1
Transmeta and Dynamic Code Optimization
Ashwin Bharambe, Mahim Mishra, Matthew Rosencrantz
2
Stuff Compilers Don’t (Can’t?) Do
Instruction reordering
Common case detection and optimization
– Branch prediction
– Traces (pre-fetching)
– Optimizing traces
Why can’t compilers do these optimizations?
– No runtime statistics
– Legacy code (inertia to recompile)
3
Therefore: Dynamic Code Optimization
Optimize on the fly (at runtime)
Current processors do it to some extent
– Instruction reordering
– Branch prediction
You can do much better…
4
How Do You Implement This?
“Hardware Intensive” approach
– Pentium Pro instruction translator: part of the critical path of the main processor
– I-COP instruction-block optimizer: off the critical path
“Non-Hardware Intensive” approach
– Transmeta, DAISY, Java HotSpot
Trade-offs?
5
I-COP (Instruction Path Coprocessors)
What?
– Add another processor that watches the instructions retire and can perform operations on them
Why?
– Performance!
Principles
– Keep the optimizations out of the critical path
– Avoid slowdown due to software
6
Structure
Multiple VLIW processor “slices” keep the I-COP simple while still letting it keep up
Each I-COP slice has 10 special instructions for pattern matching in addition to 12 normal RISC-type instructions
7
Applications of I-COP
Trace cache fill
– Find long strings of instructions that are executed frequently
Pre-fetching
– Find a load whose result is later used as the address of another load
Instruction trace optimizations
– Register move optimization
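The pre-fetching and register move bullets above can be sketched in a few lines. This is only an illustration, not the I-COP's actual algorithm: the `Instr` tuple, the opcode names, and both function names are hypothetical, and the trace is assumed to be a straight-line list of retired instructions.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str       # hypothetical opcode name: "load", "mov", "add", ...
    dst: str      # destination register
    srcs: tuple   # source registers; for "load", srcs[0] is the address register

def find_dependent_loads(trace):
    """Return (producer, consumer) index pairs where a load's result is
    later used, unmodified, as the address register of another load."""
    pairs = []
    last_load_writer = {}  # register -> index of the load that last wrote it
    for i, ins in enumerate(trace):
        if ins.op == "load" and ins.srcs[0] in last_load_writer:
            pairs.append((last_load_writer[ins.srcs[0]], i))
        # Any write to dst overrides older information about that register.
        last_load_writer.pop(ins.dst, None)
        if ins.op == "load":
            last_load_writer[ins.dst] = i
    return pairs

def forward_moves(trace):
    """Register move optimization as simple copy propagation: later uses of
    a copied register are rewritten to read the original register."""
    alias = {}  # register -> register it currently holds a copy of
    out = []
    for ins in trace:
        srcs = tuple(alias.get(s, s) for s in ins.srcs)
        # Writing dst invalidates every alias that mentions dst.
        alias = {d: s for d, s in alias.items() if ins.dst not in (d, s)}
        if ins.op == "mov" and srcs[0] != ins.dst:
            alias[ins.dst] = srcs[0]
        out.append(Instr(ins.op, ins.dst, srcs))
    return out

trace = [
    Instr("load", "r1", ("r0",)),       # r1 = [r0]
    Instr("mov",  "r2", ("r1",)),       # r2 = r1
    Instr("load", "r3", ("r2",)),       # r3 = [r2], address came from a load
    Instr("add",  "r4", ("r3", "r2")),
]
optimized = forward_moves(trace)
```

In this example the `mov` hides the load-to-load dependence from the simple detector; once the move is forwarded, the second load reads `r1` directly and the pair becomes visible as a pre-fetch candidate.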
8
The I-COP Processor
Multiple VLIW slices allow multi-level, statically scheduled, and explicitly encoded parallelism
Predication and delay slots obviate branch prediction
32 integer registers, 8 predicate registers
22 instructions: 12 RISC type and 10 special
– Pattern matching, bit manipulation, instrumentation
Fill buffer collects instructions for analysis
Task queue acts as FIFO scheduler
9
The I-COP Processor Cont.
10
Examples of Special Instructions
SearchReplace
– Finds a given pattern and replaces it with another given pattern; returns the number of replacements made
Subset
– Tests whether the bits set in a given register are a subset of those set in a second register
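As a rough software model of these two instructions: the slide does not give operand formats, so the sketch below assumes SearchReplace works on a buffer of instruction words and Subset on plain integers. The function names and signatures are hypothetical.

```python
def subset(a: int, b: int) -> bool:
    """True if every bit set in a is also set in b."""
    return (a & ~b) == 0

def search_replace(buf, pattern, replacement):
    """Replace non-overlapping occurrences of `pattern` in `buf`, scanning
    left to right. Returns (new_buffer, number_of_replacements)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    out, count, i = [], 0, 0
    n = len(pattern)
    while i < len(buf):
        if buf[i:i + n] == pattern:
            out.extend(replacement)
            count += 1
            i += n          # skip past the matched words
        else:
            out.append(buf[i])
            i += 1
    return out, count

new_buf, replaced = search_replace([1, 2, 3, 1, 2], [1, 2], [9])
```

Here `new_buf` is `[9, 3, 9]` and `replaced` is `2`; `subset(0b0101, 0b1101)` holds, while `subset(0b0110, 0b0101)` does not.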
11
Transmeta Crusoe
The best example of a “non-hardware-intensive” approach
New (and fast!) 128-bit VLIW processor
Aimed at systems where power efficiency is important
– Mobile systems
– “Dense” servers
Therefore, small gate count
BUT, need x86 compatibility
AND, at reasonable performance too
12
So how do they do it?
Have a “Code-Morphing” (CM) software layer that runs on the processor
All x86 software (BIOS, OS, apps) runs above this
CM software translates x86 code at runtime into the VLIW processor’s native instruction set
Also optimizes the translations!
So the processor is fast and simple
13
Cheesy Marketing Image
14
Code-Morphing Software
Translates an entire basic block at once
Also does instruction reordering, branch prediction, register renaming
The translations are stored in a translation cache (part of main memory)
Instruments code to help with branch prediction and to detect candidates for heavy optimization
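The translate-once, count, then re-optimize flow described above might look roughly like the sketch below. It is a toy model under stated assumptions: the class, the `hot_threshold` value, and the string stand-ins for native code are all hypothetical and say nothing about how Transmeta's software is actually structured.

```python
class TranslationCache:
    """Toy model: cache one translation per block address, count executions,
    and re-optimize a block once it becomes hot."""

    def __init__(self, translate, optimize, hot_threshold=3):
        self.translate = translate          # x86 block address -> native code
        self.optimize = optimize            # native code -> better native code
        self.hot_threshold = hot_threshold  # hypothetical tuning knob
        self.code = {}                      # address -> cached translation
        self.counts = {}                    # address -> execution count
        self.optimized = set()              # addresses already re-optimized

    def lookup(self, pc):
        if pc not in self.code:             # miss: translate the block once
            self.code[pc] = self.translate(pc)
            self.counts[pc] = 0
        self.counts[pc] += 1                # instrumentation counter
        if self.counts[pc] >= self.hot_threshold and pc not in self.optimized:
            self.code[pc] = self.optimize(self.code[pc])
            self.optimized.add(pc)
        return self.code[pc]

translate_calls = []

def translate(pc):
    translate_calls.append(pc)
    return f"native@{pc:#x}"                # placeholder for VLIW code

def optimize(code):
    return code + "+opt"                    # placeholder for heavy optimization

cache = TranslationCache(translate, optimize, hot_threshold=3)
results = [cache.lookup(0x1000) for _ in range(4)]
```

The block is translated only on the first lookup; later lookups reuse the cached translation, and the third execution triggers the one-time heavier optimization.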
15
Code-Morphing Software (cont.)
Also has some help from the hardware
Shadowed and working register sets
Alias hardware (load-and-protect operations)
“Translated” bit for each page table entry
Performance of systems with Crusoe: 2-3 times longer battery life, performance “comparable” to Intel mobile processors
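One way to picture the shadowed and working register sets is as a commit/rollback pair: a translation updates the working copy, which is committed to the shadow copy only if the translation finishes without a fault. The sketch below is a simplified software model under that assumption; the class and function names are hypothetical, and the real mechanism is implemented in hardware.

```python
class RegisterFile:
    def __init__(self, n=4):
        self.working = [0] * n   # speculative state used while a translation runs
        self.shadow = [0] * n    # last committed, known-good state

    def commit(self):
        self.shadow = list(self.working)

    def rollback(self):
        self.working = list(self.shadow)

def run_translation(regs, body):
    """Run `body` against the working registers. Commit on success;
    on any exception, discard the partial updates and report failure."""
    try:
        body(regs.working)
    except Exception:
        regs.rollback()
        return False
    regs.commit()
    return True

def good_block(w):
    w[0] = 5

def faulting_block(w):
    w[0] = 99            # partial update before the fault
    raise ZeroDivisionError("simulated fault")

regs = RegisterFile()
ok_first = run_translation(regs, good_block)
ok_second = run_translation(regs, faulting_block)
```

After the faulting block, `regs.working[0]` is back to `5`: the partial write of `99` never becomes visible as committed state, so execution can restart from a precise point.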