1
Dynamic Optimization using the ADORE Framework
10/22/2003
Wei Hsu
Computer Science and Engineering Department, University of Minnesota
2
Background
Compiler Optimization: The phases of compilation that generate good code, using the target machine as efficiently as possible.
Static Optimization: Compile-time optimization – a one-time, fixed optimization that does not change after distribution.
Dynamic Optimization: Optimization performed at program execution time – adaptive to the execution environment.
3
Examples of Compiler Optimizations

Instruction scheduling:
  Before:  Ld R1,(R2);  Add R3,R1,R4;  Ld R5,(R6);  Add R7,R5,R4
  After:   Ld R1,(R2);  Ld R5,(R6);  Add R3,R1,R4;  Add R7,R5,R4

Cache prefetching (the load takes frequent data cache misses!!):
  Before:  Ld R1,(R2);  Addi R2,R2,64;  Add R3,R1,R4
  After:   Ld R1,(R2);  prefetch 256(R2);  Addi R2,R2,64;  Add R3,R1,R4
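At the source level, the prefetching transformation above amounts to issuing a prefetch a fixed distance ahead of the current access. A minimal C sketch, assuming GCC's __builtin_prefetch builtin; the loop and the 256-byte prefetch distance are illustrative choices, not values taken from the slides:

    /* Minimal sketch of software data-cache prefetching, assuming GCC's
     * __builtin_prefetch builtin. The 256-byte (32 longs) prefetch distance
     * is an illustrative choice. */
    #include <stddef.h>

    void sum_array(const long *a, size_t n, long *out)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            /* Prefetch the data ~256 bytes ahead of the element we are about
             * to load, hiding part of the miss latency. Prefetching past the
             * end of the array is harmless on typical hardware, since prefetch
             * instructions do not fault. */
            __builtin_prefetch(&a[i + 32], 0 /* read */, 1 /* low temporal locality */);
            sum += a[i];
        }
        *out = sum;
    }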
4
Is Compiler Optimization Important?
In the last 15 years, computer performance has increased by ~1000 times:
  Clock rate increased by ~100X
  Micro-architecture contributed ~5X (the number of transistors doubles every 18 months)
  Compiler optimization added ~2-3X for single processors
  (There is some overlap between clock rate and micro-architecture, and some overlap between micro-architecture and compiler optimizations.)
5
Speed up from Compiler Optimization
7
Excellent Benchmark Performance
8
Mediocre Application Performance
Many application binaries are not optimized by compilers:
  An ISV releases one binary for all machines of the same architecture (e.g. P5), but that binary may not run efficiently on the user's machine (e.g. P6).
  The ISV might have optimized the code with profiles that exercise different parts of the application than what is actually executed.
  The application is built from many shared libraries, with no cross-library optimization.
Performance is not effectively delivered to end users!!
9
Examples of Compiler Optimizations (revisited)

Instruction scheduling:
  Before:  Ld R1,(R2);  Add R3,R1,R4;  Ld R5,(R6);  Add R7,R5,R4
  After:   Ld R1,(R2);  Ld R5,(R6);  Add R3,R1,R4;  Add R7,R5,R4

Cache prefetching:
  Before:  Ld R1,(R2);  Addi R2,R2,64;  Add R3,R1,R4
  After:   Ld R1,(R2);  prefetch 256(R2);  Addi R2,R2,64;  Add R3,R1,R4

What if the load latency is 4 clocks instead of 2? Does the compiler know where the data cache misses are?
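A dynamic optimizer can answer that question by measuring cache misses while the program runs. The sketch below is not ADORE's mechanism (ADORE samples the Itanium PMU through a kernel sample buffer, as the framework slide later shows); it is only a present-day Linux illustration using the perf_event_open system call, and the event choice and placeholder loop are assumptions made for this example:

    /* Illustrative only: count hardware cache misses around a code region at
     * runtime using Linux perf_event_open. This is NOT the interface ADORE
     * used on Itanium. */
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>

    static long perf_open(struct perf_event_attr *attr)
    {
        return syscall(__NR_perf_event_open, attr, 0 /* this process */,
                       -1 /* any cpu */, -1 /* no group */, 0);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = (int)perf_open(&attr);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* Region of interest: a real memory-bound workload would go here;
         * this trivial loop is just a placeholder. */
        volatile long sink = 0;
        for (long i = 0; i < 10000000; i++) sink += i;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        long long misses = 0;
        if (read(fd, &misses, sizeof(misses)) == sizeof(misses))
            printf("cache misses in region: %lld\n", misses);
        close(fd);
        return 0;
    }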
10
A Case for Dynamic Optimization
The execution environment can be quite different from the assumptions made at compile time:
  Code should be optimized for the machine it runs on.
  Code should be optimized for how it is actually used.
  Code should be optimized when all executables are available.
  Only the part of the code that matters should be optimized.
11
ADORE: ADaptive Object code RE-optimization
The goal of ADORE is to create a system that transparently finds and optimizes performance-critical code at runtime:
  – Adapting to new micro-architectures
  – Adapting to different user environments
  – Adapting to dynamic program behavior
  – Optimizing shared library calls
A prototype of ADORE has been implemented on the Itanium/Linux platform.
12
Framework of ADORE
[Block diagram] Main Thread: Main Program, Optimized Trace Pool. DynOpt Thread: Phase Detector, Trace Selector, Optimizer, Patcher, User Event Buffer (UEB). Kernel Space: System Sample Buffer (SSB).
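To make the diagram concrete, here is a minimal sketch of what the DynOpt thread's main loop could look like. Every type and function name below (sample_t, read_samples, detect_phase_change, select_hot_trace, optimize_trace, patch_original_code) is a hypothetical placeholder invented for illustration, not ADORE's actual interface; the helpers are declared but not implemented.

    /* Hypothetical sketch of the DynOpt thread's main loop; all identifiers
     * are placeholders standing in for the boxes in the diagram. */
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct sample {
        unsigned long ip;       /* sampled instruction pointer */
        unsigned long event;    /* e.g. a cache-miss event id */
        unsigned long latency;  /* observed load latency, if available */
    } sample_t;

    typedef struct trace trace_t;   /* opaque: one selected hot trace */

    extern size_t   read_samples(sample_t *buf, size_t max);          /* fill UEB from the kernel SSB */
    extern bool     detect_phase_change(const sample_t *s, size_t n); /* Phase Detector */
    extern trace_t *select_hot_trace(const sample_t *s, size_t n);    /* Trace Selector */
    extern void     optimize_trace(trace_t *t);                       /* Optimizer: emit into the trace pool */
    extern void     patch_original_code(const trace_t *t);            /* Patcher: redirect original code */
    extern volatile bool program_exiting;

    void dynopt_main_loop(void)
    {
        static sample_t ueb[4096];      /* user event buffer (UEB) */

        while (!program_exiting) {
            /* Drain the hardware samples the kernel collected. */
            size_t n = read_samples(ueb, 4096);
            if (n == 0)
                continue;

            /* Re-optimize only when the program enters a new stable phase. */
            if (!detect_phase_change(ueb, n))
                continue;

            /* Select a hot trace, optimize it into the trace pool, and patch
             * the original code so execution flows into the new version. */
            trace_t *t = select_hot_trace(ueb, n);
            if (t) {
                optimize_trace(t);
                patch_original_code(t);
            }
        }
    }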
13
Current Optimizations in ADORE
We have implemented:
  – Data cache prefetching
  – Trace selection and layout (a rough sketch follows this list)
We are investigating and testing the following optimizations:
  – Instruction scheduling with control and data speculation
  – Instruction cache prefetching
  – Partial dead code elimination
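As a rough illustration of the trace selection step, the sketch below picks "hot" trace starting points from an instruction-pointer sample histogram. The data structure, the 5% threshold, and the function name are all invented for illustration; they are not ADORE's actual heuristics.

    /* Illustrative only: choose hot trace heads from an IP-sample histogram. */
    #include <stddef.h>
    #include <stdint.h>

    struct ip_count {
        uint64_t ip;      /* sampled instruction address */
        uint64_t hits;    /* number of samples that fell on it */
    };

    /* Writes up to max_heads hot addresses into heads[]; returns how many. */
    size_t select_trace_heads(const struct ip_count *hist, size_t n,
                              uint64_t total_samples,
                              uint64_t *heads, size_t max_heads)
    {
        size_t count = 0;
        for (size_t i = 0; i < n && count < max_heads; i++) {
            /* An address is "hot" if it accounts for at least 5% of samples. */
            if (hist[i].hits * 20 >= total_samples)
                heads[count++] = hist[i].ip;
        }
        return count;
    }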
14
Performance Impact of O2/O3 Binary
16
Optimizing BLAST with ADORE
BLAST is the most popular tool used in bioinformatics:
  Several faculty members and research colleagues are using it.
  It is used as a benchmark by companies to test their latest systems and processors.
The performance of BLAST matters.
17
Speedup from BLAST queries
19
Observations from BLAST
  ADORE is robust: it can handle real, large application code.
  ADORE does not speed up all queries, since the code already runs quite efficiently on Itanium systems; it adds about 1-2% profiling and optimization overhead.
  ADORE does speed up one long query by 30%.
  It is difficult to further improve the performance of BLAST with static compilers.
20
Future Directions of ADORE
  Demonstrate more performance gains on more real applications
  Make ADORE more transparent
    – Compiler independent
    – Exception handling
  Study the impact of compiler annotations
  Study architectural/micro-architectural support for ADORE
21
ADORE Group
Professors:
  – Prof. Wei-Chung Hsu
  – Prof. Pen-Chung Yew
  – Dr. Bobbie Othmer
Graduate Students:
  – Howard Chen, Jiwei Lu, Jinpyo Kim, Sagar Dalvi, Rao Fu, WeiChuan Dong, Abhinav Das, Dwarakanath Rajagopal, Ananth Lingamneni, Vijayakrishna Griddaluru, Amruta Inamdar, Aditya Saxena
23
Summary
Dynamic binary optimization customizes performance delivery.
The ADORE project at the U. of Minnesota is a research dynamic binary optimizer, and it demonstrates good performance potential.
With architecture/micro-architecture and static compiler support, a future dynamic optimizer could be more effective, more adaptive, and more applicable.
24
Conclusion
Be Adaptive!! Be Dynamic!!
25
Dynamic Translation
  Fast simulation – SimOS (Stanford), SHADE (Sun)
  Migration – DAISY, BOA (IBM), Virtual PC, ARIES (HP), Crusoe (Transmeta)
  Internet applications – Java HotSpot, Microsoft .NET
  Performance tools (dynamic instrumentation) – Paradyn and EEL (UW), Caliper (HP)
  Optimization – Dynamo, Tinker (NCSU), Morph (Harvard), DyC (UW)