Extended Whole Program Paths. Sriraman Tallam, Rajiv Gupta, and Xiangyu Zhang, University of Arizona. PACT 2005.

Presentation transcript:

1 Extended Whole Program Paths
Sriraman Tallam, Rajiv Gupta, Xiangyu Zhang
University of Arizona

2 Control Flow and Dependence Traces
- Control Flow Traces: sequence of basic blocks. Identification of hot paths.
  - Path-sensitive instruction scheduling and optimization.
  - Path prediction and instruction fetching.
- Dependence Traces: capture data dependences.
  - Flow from a definition to a use.
  - Data speculative optimizations for Itanium.
  - Computation of dynamic slices.

3 Control Flow and Dependence Traces
- Control Flow Traces are smaller than Dependence Traces and can be compressed well.
  - Average size for Spec 2K benchmarks is 179 MB.
  - Compression factor: Sequitur 681, VPC 442.
- Dependence Traces are large and do not compress as well as Control Flow Traces.
  - Average size for Spec 2K benchmarks is 565 MB.
  - Compression factor: Sequitur 1.31, VPC 5.8.
Is there an alternative trace representation?
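As a rough illustration of what these factors imply, the sketch below divides each average trace size by each reported compression factor. Applying an average factor to an average size is an assumption made only for illustration, so the results are indicative rather than measured; the names `AVG_SIZE_MB`, `FACTOR`, and `compressed_size_mb` are hypothetical.

```python
# Average uncompressed trace sizes (MB) and compression factors, as quoted on the slide.
AVG_SIZE_MB = {"control_flow": 179.0, "dependence": 565.0}
FACTOR = {
    "control_flow": {"Sequitur": 681.0, "VPC": 442.0},
    "dependence": {"Sequitur": 1.31, "VPC": 5.8},
}


def compressed_size_mb(trace, compressor):
    """Estimate: compressed size = uncompressed size / compression factor."""
    return AVG_SIZE_MB[trace] / FACTOR[trace][compressor]


if __name__ == "__main__":
    for trace in AVG_SIZE_MB:
        for compressor in FACTOR[trace]:
            size = compressed_size_mb(trace, compressor)
            print(f"{trace:12s} {compressor:8s} ~{size:.2f} MB")
```

Under this assumption the compressed control flow trace is well under 1 MB, while the compressed dependence trace remains tens to hundreds of MB, which is the gap the question on the slide points at.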

4 Our Approach
- Extended Control Flow Trace: a unified trace representation.
  - Captures both control flow and dependence information.
  - The data dependences are embedded as control flow.
- The unified trace is smaller than the control flow + dependence traces.
- Our compressed unified trace is also smaller than the compressed control flow + compressed dependence traces.

5 Goals in Designing the eCF
- With a definition "X = _" followed by a use "_ = X", the dependence can be recovered from the control flow.
- With an additional store "*p = _", the dependence can no longer be recovered, due to possible aliasing.
- Additional control flow, such as a check "if p == &X", can capture the dependence.
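The idea of turning an aliasing question into control flow can be sketched as follows. This is a minimal simulation, not the authors' instrumentation: the addresses, the block labels `S1`, `S2`, `S3`, `CHK_ALIAS`, and the functions `run` and `reaching_def_of_x` are all hypothetical names chosen for illustration.

```python
ADDR_X, ADDR_Y = 0x10, 0x20  # hypothetical addresses of X and of some other variable


def run(p_points_to_x):
    """Execute the three-statement example and return its extended control flow trace."""
    memory = {ADDR_X: 0, ADDR_Y: 0}
    trace = []

    memory[ADDR_X] = 1              # S1: X = _
    trace.append("S1")

    p = ADDR_X if p_points_to_x else ADDR_Y
    memory[p] = 2                   # S2: *p = _
    trace.append("S2")

    # Disambiguation check: it changes no program state, but its outcome
    # shows up in the trace as an extra block.
    if p == ADDR_X:
        trace.append("CHK_ALIAS")

    _ = memory[ADDR_X]              # S3: _ = X
    trace.append("S3")
    return trace


def reaching_def_of_x(trace):
    """Recover which statement defined the value of X read at S3, from the trace alone."""
    return "S2" if "CHK_ALIAS" in trace else "S1"
```

With the check in place, the two executions produce different traces, so the dependence is readable from control flow without logging any addresses.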

6 Cost of Capturing Dependences
- No-cost capture: for these dependences, no disambiguation checks are needed.
- Fixed-cost capture: the number of disambiguation checks needed is a constant.
- Variable-cost capture: the number of disambiguation checks varies.

7 No Cost Capture
- All instances of the dependence can be recovered from the control flow trace.
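One way to picture no-cost recovery is to replay the block trace against a static summary of what each block defines and uses. The sketch below assumes unaliased variables and assumes that, within a block, uses come before definitions; the `BLOCKS` table and `recover_dependences` are hypothetical and not taken from the talk.

```python
# Hypothetical static summary: variables each basic block defines and uses.
BLOCKS = {
    "B1": {"defs": ["x"], "uses": []},
    "B2": {"defs": ["y"], "uses": ["x"]},
    "B3": {"defs": ["x"], "uses": ["y"]},
    "B4": {"defs": [], "uses": ["x", "y"]},
}


def recover_dependences(trace):
    """Replay a block trace and return ((def_block, def_inst), (use_block, use_inst), var) tuples."""
    last_def = {}   # variable -> (block, instance) of its most recent definition
    counts = {}     # block -> number of times executed so far
    deps = []
    for block in trace:
        counts[block] = counts.get(block, 0) + 1
        here = (block, counts[block])
        # Assumption: uses in a block read values defined before the block runs.
        for var in BLOCKS[block]["uses"]:
            if var in last_def:
                deps.append((last_def[var], here, var))
        for var in BLOCKS[block]["defs"]:
            last_def[var] = here
    return deps
```

For the trace `["B1", "B2", "B3", "B2", "B4"]`, the second execution of `B2` reads the `x` written by `B3`, not by `B1`, and the replay recovers that without any extra instrumentation.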

8 Fixed Cost Capture
- A single disambiguation check is sufficient to capture this dependence.
[Figure label: Single Check]

9 Variable Cost Capture
- The instances of the dependence can be caused by any instance of the definition statement.
[Figure label: Multiple Checks]

10 Cost of Instrumentation and Trace Compressibility
- Reducing the number of checks:
  - Reduces the size of the generated trace.
  - Reduces the run-time overhead.
- Improving the compressibility:
  - Similar control flow signatures.

11 Two-Phased Approach
- Static pointer analysis is conservative: too many potential dependences per use.
- Two-phased approach:
  - Filtering phase: find all dependences exercised.
  - Profiling phase: add disambiguation checks only for those dependences exercised.
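The two phases can be sketched as a filter over the statically possible definition/use pairs. Everything here is illustrative: `STATIC_CANDIDATES` stands in for the output of a conservative pointer analysis, and `filtering_phase` and `checks_to_insert` are hypothetical helpers, not the tool's interface.

```python
# Hypothetical result of a conservative static pointer analysis:
# for each use site, every definition site that might reach it.
STATIC_CANDIDATES = {
    "use1": {"def1", "def2", "def3", "def4"},
    "use2": {"def2", "def5"},
}


def filtering_phase(observed_pairs):
    """Phase 1: keep only the (def, use) pairs actually exercised in a run."""
    exercised = {}
    for def_site, use_site in observed_pairs:
        if def_site in STATIC_CANDIDATES.get(use_site, set()):
            exercised.setdefault(use_site, set()).add(def_site)
    return exercised


def checks_to_insert(exercised):
    """Phase 2: plan disambiguation checks only for exercised pairs."""
    return sorted((use, d) for use, defs in exercised.items() for d in defs)
```

In this toy input the static analysis reports six candidate pairs; if only two are exercised, the profiling phase plans checks for just those two.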

12 Binary Search vs. Linear Search
- Track the last definition and instance of every write to a memory address.
- Search the address array using binary search instead of linear search.
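A minimal sketch of such a table, assuming it is kept as a sorted address array with a parallel array of last writers. The class `LastWriteTable` and the helper `linear_last_writer` are hypothetical; the talk does not describe the actual data layout.

```python
import bisect


class LastWriteTable:
    """Sorted address array plus the last (statement, instance) that wrote each address."""

    def __init__(self):
        self.addrs = []     # kept sorted so it can be binary searched
        self.writers = []   # writers[i] is the last writer of addrs[i]

    def record_write(self, addr, stmt, instance):
        i = bisect.bisect_left(self.addrs, addr)
        if i < len(self.addrs) and self.addrs[i] == addr:
            self.writers[i] = (stmt, instance)      # overwrite the previous writer
        else:
            self.addrs.insert(i, addr)
            self.writers.insert(i, (stmt, instance))

    def last_writer(self, addr):
        """Binary search: O(log n) address comparisons."""
        i = bisect.bisect_left(self.addrs, addr)
        if i < len(self.addrs) and self.addrs[i] == addr:
            return self.writers[i]
        return None


def linear_last_writer(table, addr):
    """Linear search baseline: O(n) address comparisons."""
    for a, w in zip(table.addrs, table.writers):
        if a == addr:
            return w
    return None
```

Both lookups return the same answer; the difference is only in how many address comparisons each performs as the table grows.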

13 Optimizing Trace Length and Compressibility

14 Experimental Results
- Implementation on the Microsoft Phoenix RDK.
- Spec 2K benchmark binaries were rewritten to obtain instrumented versions.
  - Easy to implement using Phoenix.
- The intermediate representation was the low-level x86 instruction set.
  - Dependences were split into register and memory dependences.
  - Register dependences are always recoverable from the control flow trace.
  - Memory dependences were recovered using our approach.

15 Register and Memory Dependences
- A significant fraction (76 %) of dependences (the register dependences) can be recovered from the control flow trace.

16 Uncompressed Trace Sizes
- The unified trace is 62 % of the size of the Control Flow + Dependence Trace.
[Table columns: Cont. + Dep., Unified, Ratio]

17 Sequitur Compressed
- The compressed unified trace is 4 % of the size of the compressed Control Flow + Dependence Trace.
[Table columns: Cont. + Dep., Unified, Ratio]

18 VPC Compressed
- The compressed unified trace is 21 % of the size of the compressed Control Flow + Dependence Trace.
[Table columns: Cont. + Dep., Unified, Ratio]

19 Memory Dependence Types
- 30 % of dependences can be recovered at no cost.

20 Address Comparisons
- Binary search reduces the address comparisons by 4 orders of magnitude.

21 Run-time Overhead
- Collecting the unified trace increases the run-time overhead by 20 %.

22 Conclusions
- We have designed an extended control flow trace that captures both control flow and data dependence history.
- The key to the unified trace is the ability to convert memory data dependences into control flow.
  - The resulting unified trace is smaller than the combined control flow + dependence trace.
  - The run-time overhead increases by 20 %.
Our thanks to Hoi Vo of Microsoft Corporation and the Phoenix Compiler Infrastructure Group.