Phoenix: a framework for Code Generation, Optimization and Program Analysis
Andrew Pardoe, Phoenix team


What is Phoenix?
• Phoenix is Microsoft’s next-generation, state-of-the-art infrastructure for program analysis and transformation
• We wanted to…
  • Develop an industry-leading compilation and tools framework
  • Foster a rich ecosystem for
    • Academic
    • Research
    • Industry
  • With an infrastructure that is robust, retargetable, extensible, configurable and scalable
• Phoenix is built on C++/CLI and compiles either as managed or native code

Building a program with C++/CLI
• Microsoft C++ compiler
  • Input: program source code
  • Output: COFF object file
• COFF files are linked with system libraries into PEs
[Diagram: the driver (CL) passes the C++ source to the frontend (C1) and then the backend (C2), which produces the Obj file]

Roles of C1 (C1xx) and C2
C1 or C1xx
• Preprocessing
• Tokenization
• Parsing
• Semantic processing
• CIL emission *
• Types and symbol debug info
• Metadata for managed code
C2
• * CIL reading
• Program analysis
• Optimization
• Lowering to target
• COFF emission
• Source level debug info

Why we built Phoenix
• Code generation technology now appears in many different forms
  • Large-scale optimizers (PreJIT or C++’s LTCG)
  • Fast code generation (.NET’s JIT, C++ debug mode, C#)
  • Custom code generators (fast conditional breakpoints, SQL expression optimizers)
• Code generators in Microsoft target many different computer architectures
  • PC platforms (x86, x64, IA64)
  • Game consoles (x86, PPC)
  • Handheld devices (ARM)

And another set of reasons…
• Microsoft builds sophisticated analysis tools
  • VS 2005’s C++ compiler contains an /analyze switch to perform static analysis for code defects
  • The .NET coding guidelines are enforced by FxCop
  • We have tools for defect, security and race detection
• These tools are often developed in a way that works for only one specific product. This limits…
  • Retargeting the tool for other applications
  • Ability to adopt the best-of-breed technology
  • Ability to move forward as technology changes

Why the rest of the world needs Phoenix
• Research
  • Research often spends too much time handling routine work instead of exploring the novel ideas that inspired the research
  • If research doesn’t build on a world-class framework it often cannot handle real-world problems
• Industry
  • Much effort is spent on deciphering poorly documented formats and interfaces (Microsoft’s CIL or PE file formats)
  • There is an inherent fragility in working without specifications or promises of future compatibility
  • Industry “mistakes” end up costing Microsoft as well
• Academic
  • Attempts to provide common infrastructures have had limited success in the past
  • By using Phoenix, educators can start with big problems and leave the routine work to us

Phoenix Infrastructure
• .NET CodeGen: Runtime JITs, Pre-JIT, OO and .NET optimizations
• Native CodeGen: Advanced C++/OO optimizations, FP optimizations, OpenMP
• Retargetable: “Machine Models”, ~3 months: -Od, ~3 months: -O2
• Chip Vendor CDK: ~6 month ports, Sample port + docs, Key ports (XScale) done at Microsoft
• Academic RDK: Full sources (future), Managed APIs, IP as DLLs, Docs
• MSR & Partner Tools: Built on Phoenix APIs, Both HL and LL APIs, Managed APIs, Program Analysis, Program Rewrite
• MSR Adv Lang: Language Research, Direct xfer to Phoenix, Research insulated from code generation
• AST Tools: Static Analysis Tools, Next Gen Front-Ends, R/W Global Program Views

Key features of Phoenix
• Written in C++ but usable by any .NET language
  • Samples provided in C# and C++/CLI
• Phase and Plug-In model for third-party extensions to:
  • C++ compiler backend, JIT/PreJIT
  • Static analysis tools, binary analysis and manipulation
  • Plug-Ins and extensions to the Phoenix architecture
• Single, strongly-typed, explicit dataflow/control flow IR used throughout all phases of the framework
  • IR and Type system are capable of processing native and managed code
  • Strong inter-phase consistency checking

[Diagram: Phoenix Core (AST, IR, Syms, Types, CFGraph, SSA, Dataflow, Alias, EH, Readers, Writers, Xlator, Phx APIs) sits between its inputs (C#, VB, C++, Delphi, Cobol, Eiffel, Tiger, Lex/Yacc, IL, .NET assembly, native image, C++ AST, Phx AST, profile) and two groups of clients. Compilers: HL Opts, LL Opts, Code Gen. Tools: Formatter, Browser, Profiler, Obfuscator, Visualizer, Security Checker, Refactor, Lint, PreFast.]

The Phoenix Building Blocks
[Diagram: clients (CLR JIT, CLR PreJITer, VC++, VC++ BE, Dynamic Tools, Static Tools) built from shared blocks (Core Structures and Utilities, High Level Optimizations, Low Level Optimizations, Machine Abstractions, Locality opts, Analysis)]

Phoenix Architecture
• Core set of extensible classes to represent
  • IR (intermediate representation of code stream)
  • Symbols, Types, Function units, Basic blocks, Graphs, Trees, Aliasing information
• Layered set of analysis and transformation components
  • Data flow analysis, Loop analysis, Alias analysis, Dead code removal, Redundant code detection
  • Global optimizations built on reusable analysis lattices
• Common input/output library for binary formats
  • PE, LIB, OBJ, CIL, MSIL, PDB
  • Phoenix both reads and writes binary formats

Simple example
#include <stdio.h>
void main (int argc, char** argv)
{
    char * message;
    if (argc > 1)
        message = "Hello, world!\n";
    else
        message = "Goodbye, world!\n";
    printf (message);
}

Resulting Phoenix IR

View inside a Phoenix-based C2
[Diagram: SOURCE → C1 → CIL → C2 → OBJECT. Inside C2 the IR moves through the states AST, HIR, MIR, LIR and EIR while these phases run: CIL Reader, Type Checker, MIR Lower, SSA Const, SSA Dest, Canon, Addr Modes, Lower, Reg Alloc, EH Lower, Stack Alloc, Frame Gen, Switch Lower, Block Layout, Flow Opts, Encode, Lister]

Types of IR
• High-level IR (HIR): Architecture and runtime independent. Object model instructions, array indices, full aliasing
• Mid-level IR (MIR): Architecture independent, runtime dependent. Lowered to calls and address arithmetic
• Low-level IR (LIR): Architecture and runtime dependent. Lowered to machine instructions
• Encoded IR (EIR): Binary format. Lowered to encoded data instructions
• IRs contain Instructions and Operands of various types at each IR level

IR states during compilation
• Phases transform IR either within a state or from one state to a contiguous state
• For example, the lower phase transforms MIR into LIR. Optimizations usually work within a single phase.
[Diagram: AST, HIR, MIR, LIR, EIR in order from abstract to concrete; lowering moves toward EIR, raising moves toward AST]
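The rule on this slide can be sketched in standard C++. This is only an illustration, not the Phoenix API: the names IrState, PhaseKind, classify and lowerOnce are hypothetical, and "contiguous" is read here as "adjacent in the AST, HIR, MIR, LIR, EIR order".

```cpp
#include <cassert>

// Hypothetical names: this enum only mirrors the state order on the slide.
enum class IrState { AST = 0, HIR, MIR, LIR, EIR };

enum class PhaseKind { WithinState, Lowering, Raising, Illegal };

// Classify a phase by the IR state it consumes and the state it produces.
// Assumption: a legal phase stays in one state or moves to an adjacent one.
inline PhaseKind classify(IrState from, IrState to) {
    const int delta = static_cast<int>(to) - static_cast<int>(from);
    if (delta == 0) return PhaseKind::WithinState;  // e.g. an optimization
    if (delta == 1) return PhaseKind::Lowering;     // toward concrete (EIR)
    if (delta == -1) return PhaseKind::Raising;     // toward abstract (AST)
    return PhaseKind::Illegal;                      // skips a state
}

// Next more concrete state; EIR has none.
inline IrState lowerOnce(IrState s) {
    assert(s != IrState::EIR && "EIR is already the most concrete state");
    return static_cast<IrState>(static_cast<int>(s) + 1);
}
```

Under this reading, classify(IrState::MIR, IrState::LIR) is a lowering phase, while a phase that went from HIR straight to LIR would be rejected.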

Extending a Phoenix-based compiler
• The VC++ optimizer is just a Phoenix client
• All Phoenix clients can host Plug-Ins
• Plug-Ins can
  • Add new components
  • Extend existing components
  • Reconfigure clients
• Extensibility relies upon
  • Reflection
  • Events and delegates
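A minimal sketch of the "add new components" case, written in standard C++ rather than against Phoenix. Phase, PhaseList, makeDemoPipeline and registerCheckPlugIn are hypothetical names; the phase names are borrowed from the C2 diagram, and the insertion point after "MIR Lower" is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiler phase: a name plus the work it does.
struct Phase {
    std::string name;
    std::function<void()> run;
};

// Hypothetical ordered phase list owned by the client (the host compiler).
class PhaseList {
public:
    void append(Phase phase) { phases_.push_back(std::move(phase)); }

    // Insert a phase right after the phase called `anchor`.
    // Returns false if the client has no phase with that name.
    bool insertAfter(const std::string& anchor, Phase phase) {
        auto it = std::find_if(phases_.begin(), phases_.end(),
                               [&](const Phase& p) { return p.name == anchor; });
        if (it == phases_.end()) return false;
        phases_.insert(it + 1, std::move(phase));
        return true;
    }

    void runAll() const {
        for (const Phase& p : phases_) {
            assert(p.run && "every phase needs a callable");
            p.run();
        }
    }

    std::vector<std::string> names() const {
        std::vector<std::string> result;
        for (const Phase& p : phases_) result.push_back(p.name);
        return result;
    }

private:
    std::vector<Phase> phases_;
};

// A tiny client pipeline; the phases do nothing in this sketch.
PhaseList makeDemoPipeline() {
    PhaseList list;
    for (const char* n : {"CIL Reader", "MIR Lower", "Reg Alloc", "Encode"})
        list.append(Phase{n, [] {}});
    return list;
}

// What a plug-in's registration hook might do: add its own phase.
bool registerCheckPlugIn(PhaseList& list) {
    return list.insertAfter("MIR Lower", Phase{"Uninitialized Local Check", [] {}});
}
```

The client never needs to know about the new phase at build time; it only exposes its phase list to the plug-in.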

Component extensibility
• Most objects in the system support observers by deriving from the Phoenix class Extensible Object
• Observer classes can register delegates so that they are notified when the host object undergoes certain events. For example, if the host object is copied it will notify registered delegates
• Phoenix provides a standard plug-in discovery and registration mechanism
• Plug-ins can reconfigure the client, such as replacing the register allocator
• Plug-ins can also use Phoenix’s analyses to do their own analyses and transformations
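The observer idea can be sketched in standard C++ with std::function in place of .NET delegates. The class below is a hypothetical stand-in, not the Phoenix class: its name, the onCopy and copy members and the notes field are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host object that notifies observers when it is copied.
class ExtensibleObject {
public:
    // Callback type standing in for a delegate: receives original and copy.
    using CopyHandler =
        std::function<void(const ExtensibleObject& original, ExtensibleObject& copy)>;

    explicit ExtensibleObject(std::string label) : label_(std::move(label)) {}

    // Register an observer for the "copied" event.
    void onCopy(CopyHandler handler) {
        assert(handler && "handler must be callable");
        copyHandlers_.push_back(std::move(handler));
    }

    // Copy the object, then notify every registered handler.
    // Design choice of this sketch: handlers stay with the original.
    ExtensibleObject copy() const {
        ExtensibleObject result(label_);
        for (const CopyHandler& handler : copyHandlers_) handler(*this, result);
        return result;
    }

    const std::string& label() const { return label_; }

    // Free-form data that observers may attach to an object.
    std::vector<std::string> notes;

private:
    std::string label_;
    std::vector<CopyHandler> copyHandlers_;
};
```

An observer that wants its annotations to follow copies would register a handler that writes into the copy's notes, which is the same pattern the birth-tracking example uses with extension objects.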

Extensibility example – birth tracking
// Attach a note to each instruction with the birth
// phase for reference later
public ref class InstructionBirthExtensionObject :
    public Phx::IR::InstructionExtensionObject
{
public:
    property Phx::Phases::Phase ^ BirthPhase;
    property System::String ^ BirthPhaseText
    {
        System::String ^ get ()
        {
            if (BirthPhase != nullptr)
            {
                return BirthPhase->NameString;
            }
            return "";
        }
    }
};

// Called from Instruction ctor
void
PlugIn::NewInstructionEventHandler ( Phx::IR::Instruction ^ instruction )
{
    InstructionBirthExtensionObject ^ extensionObject =
        gcnew InstructionBirthExtensionObject();
    extensionObject->BirthPhase = instruction->FunctionUnit->Phase;
    instruction->AddExtensionObject(extensionObject);
}

// Called from Instruction dtor
void
PlugIn::DeleteInstructionEventHandler ( Phx::IR::Instruction ^ instruction )
{
    InstructionBirthExtensionObject ^ extensionObject =
        InstructionBirthExtensionObject::Get(instruction);
    instruction->RemoveExtensionObject (extensionObject);
}

Plug-In VS Integration
• Plug-Ins can be created via Visual Studio Wizards
• RDK is downloadable and works with free VS Express Editions (though you probably want the VS Team System Edition for your work :)

Example: Uninitialized local detection
• We would like to warn the user that ‘x’ is not initialized before use
• To do this we need to perform dataflow analysis
• We’ll use a plug-in to add this phase to the existing Phoenix-based C2
int foo()
{
    int x;
    return x;
}

May and Must examples
message may be used before it is defined:
void main(…)
{
    char * message;
    if (…) message = "Hello";
    printf(message);
}
message must be used before it is defined:
void main(…)
{
    char * message;
    char * other;
    if (…) other = "Hello";
    printf(message);
}

IR for detecting uninitialized locals

Detecting an uninitialized use
• For each local variable v
  • Examine all paths from the entry of the method to each use of v
  • If on every path v is not initialized before the use
    • v must be used before it is defined
  • If there is some path where v is not initialized before the use
    • v may be used before it is defined
• Classic solution is to build a control flow graph and solve the data flow problem.
  • State is “unknown” at the start of each block. Transfer states between blocks and combine them as you traverse the control flow graph
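The classic solution can be sketched for a single variable in standard C++. This is a simplified illustration under stated assumptions, not the plug-in's code: State, Block, Verdict, classifyUse and the two example builders are hypothetical names, block 0 is taken to be the entry block, each block either assigns v or leaves it alone, and the use is assumed to happen at the start of the queried block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-variable lattice: Unknown is the starting value for every block.
enum class State { Unknown, Defined, Undefined, Mixed };

// Combine the states arriving from two paths.
State meet(State a, State b) {
    if (a == State::Unknown) return b;
    if (b == State::Unknown) return a;
    return a == b ? a : State::Mixed;
}

struct Block {
    std::vector<int> preds;  // indices of predecessor blocks
    bool definesV;           // true if this block assigns v
};

enum class Verdict { Ok, MayBeUsedBeforeDefined, MustBeUsedBeforeDefined };

// Iterate to a fixed point, then report the state of v on entry to useBlock.
Verdict classifyUse(const std::vector<Block>& cfg, std::size_t useBlock) {
    assert(useBlock < cfg.size());
    std::vector<State> out(cfg.size(), State::Unknown);
    State atUse = State::Unknown;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < cfg.size(); ++i) {
            // v is uninitialized when the function is entered (block 0).
            State in = (i == 0) ? State::Undefined : State::Unknown;
            for (int p : cfg[i].preds) in = meet(in, out[p]);
            if (i == useBlock) atUse = in;
            const State newOut = cfg[i].definesV ? State::Defined : in;
            if (newOut != out[i]) {
                out[i] = newOut;
                changed = true;
            }
        }
    }
    if (atUse == State::Undefined) return Verdict::MustBeUsedBeforeDefined;
    if (atUse == State::Mixed) return Verdict::MayBeUsedBeforeDefined;
    return Verdict::Ok;  // Defined, or Unknown for an unreachable block
}

// "May" example: block 1 (the if body) assigns message, block 2 prints it.
std::vector<Block> mayExample() {
    return {Block{{}, false}, Block{{0}, true}, Block{{0, 1}, false}};
}

// "Must" example: block 1 assigns a different variable, so v is untouched.
std::vector<Block> mustExample() {
    return {Block{{}, false}, Block{{0}, false}, Block{{0, 1}, false}};
}
```

Calling classifyUse(mayExample(), 2) meets an Undefined path with a Defined path and reports "may"; in mustExample() every path into block 2 is Undefined, so the verdict is "must".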

Code sketch using classic dataflow
// STATE, meet, destination and equals are assumed to be defined elsewhere
bool changed = true;
while (changed)
{
    changed = false;
    for each (Phx::Graphs::BasicBlock ^ block in function)
    {
        // Update input state
        STATE ^ inState = inStates[block];
        for each (Phx::Graphs::BasicBlock ^ predecessorBlock in block->Predecessors)
        {
            STATE ^ predecessorState = outStates[predecessorBlock];
            inState = meet(inState, predecessorState);
        }
        inStates[block] = inState;

        // Compute output state
        STATE ^ newOutState = gcnew STATE(inState);
        for each (Phx::IR::Instruction ^ instruction in block->Instructions)
        {
            for each (Phx::IR::Operand ^ operand in instruction->DestinationOperands)
            {
                Phx::Symbols::LocalVariableSymbol ^ localSymbol =
                    operand->Symbol->AsLocalVariableSymbol;
                newOutState[localSymbol] = destination(newOutState[localSymbol]);
            }
        }

        // Check for convergence
        STATE ^ outState = outStates[block];
        bool blockChanged = ! equals(newOutState, outState);
        if (blockChanged)
        {
            changed = true;
            outStates[block] = newOutState;
        }
    }
}

Can we make this easier?
• Dataflow solution computes the state for the entire graph, even at places where v is never referenced
• An alternate model is known as “Static Single Assignment” form, or SSA. It directly connects definitions and uses.
• Phoenix uses SSA and builds flow graphs when necessary
• We can rewrite this code letting Phoenix do most of the routine work

Code sketch using Phoenix
for each (Phx::IR::Operand ^ destinationOperand in
          Phx::IR::Operand::IteratorDestinations(firstInstruction))
{
    if (destinationOperand->IsMemoryModificationReference)
    {
        for each (Phx::IR::Operand ^ useOperand in
                  Phx::IR::Operand::IteratorUse(destinationOperand))
        {
            if (useOperand->Instruction->Opcode != Phx::Common::Opcode::Phi
                && useOperand->IsVariableOpnd)
            {
                Phx::Symbols::Symbol ^ symbolUse =
                    useOperand->AsVariableOpnd->Symbol;
                if (symbolUse != nullptr && !mustList.Contains(symbolUse))
                {
                    mustList.Add(symbolUse);
                }
            }
        }
    }
}
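The same walk can be sketched without Phoenix, in standard C++, to show why SSA makes it short. Everything here is hypothetical (Instr, Report, buildUseLists, findUninitializedUses and the example builders), and the classification is a simplification: a non-phi use of the entry definition is reported as "must", and any use reached through a phi is reported as "may", even if every phi input happened to be undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical SSA instruction: SSA names are small integers.
struct Instr {
    std::string text;       // printable form, used in the report
    bool isPhi;             // true for a phi (merge) instruction
    int defines;            // SSA name written, or -1 if none
    std::vector<int> uses;  // SSA names read
};

struct Report {
    std::vector<std::string> must;  // uses fed directly by the entry definition
    std::vector<std::string> may;   // uses fed through at least one phi
};

// Def-use chains: for each SSA name, the instructions that read it.
std::vector<std::vector<std::size_t>> buildUseLists(const std::vector<Instr>& code,
                                                    int nameCount) {
    std::vector<std::vector<std::size_t>> users(nameCount);
    for (std::size_t i = 0; i < code.size(); ++i) {
        for (int name : code[i].uses) {
            assert(name >= 0 && name < nameCount);
            users[name].push_back(i);
        }
    }
    return users;
}

// entryDef is the SSA name standing for "value of v at function entry".
Report findUninitializedUses(const std::vector<Instr>& code, int nameCount,
                             int entryDef) {
    const auto users = buildUseLists(code, nameCount);
    Report report;
    std::set<int> seen{entryDef};
    std::vector<int> worklist{entryDef};
    while (!worklist.empty()) {
        const int name = worklist.back();
        worklist.pop_back();
        for (std::size_t i : users[name]) {
            const Instr& instr = code[i];
            if (instr.isPhi) {
                // Follow the phi result instead of reporting the phi itself.
                assert(instr.defines >= 0);
                if (seen.insert(instr.defines).second) worklist.push_back(instr.defines);
            } else if (name == entryDef) {
                report.must.push_back(instr.text);
            } else {
                report.may.push_back(instr.text);
            }
        }
    }
    return report;
}

// Names: 0 = message.0 (entry), 1 = message.1, 2 = message.2 (phi result).
std::vector<Instr> mayExample() {
    return {Instr{"message.1 = \"Hello\"", false, 1, {}},
            Instr{"message.2 = phi(message.0, message.1)", true, 2, {0, 1}},
            Instr{"printf(message.2)", false, -1, {2}}};
}

// Names: 0 = message.0 (entry), 1 = other.1.
std::vector<Instr> mustExample() {
    return {Instr{"other.1 = \"Hello\"", false, 1, {}},
            Instr{"printf(message.0)", false, -1, {0}}};
}
```

Only instructions that actually read the entry definition, or a phi fed by it, are ever visited, which is the saving over the whole-graph dataflow solution.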

Uninitialized local plug-in
• Plug-in is loaded at runtime by Phoenix-based C2
[Diagram: UninitializedLocal.cpp → C++/CLI → UninitializedLocal.dll; Test.cpp → C1 → Phx-C2 (with the plug-in loaded) → Test.obj]

Phoenix C2 with our plug-in added
• This complete plug-in is provided as a sample in the Research Development Kit
• It is only ~400 lines of code to add a key warning to the C2 compiler
• Other types of checking can be added just as easily
• A demonstration of the warnings being emitted:

Phoenix PE Reading
• Phoenix can read and write PE files directly
  • You can implement your own compiler or linker
  • You can create post-link tools for analysis, instrumentation or optimization
• Binaries can be read in, raised into IR, changed and rewritten as new, working binaries
• Phoenix Explorer is only ~800 lines of code on top of the Phoenix binary reading-writing library

Phoenix explorer is like ILDasm to IR

Binary rewriting with Phoenix
• mtrace utility injects tracing code into managed applications
• You don’t need the source code to do this (you do need the PDB)
• mtrace shows functions being entered and exited

How do I get Phoenix?
• Early access RDKs are available to selected universities
  • Sample projects include aspect oriented programming, code obfuscation, profiling
  • Contact for Academic early access
• Early access CDK is available to selected industry partners
  • Contact for commercial early access
• New Phoenix RDKs and CDKs are released about every 6 months
• Phoenix will be the next MS compiler backend
• We build the next-generation Windows every night

More information