Pro64™: Performance Compilers For IA-64™
Jim Dehnert, Principal Engineer
5 June 2000


Outline
– IA-64™ Features
– Organization and infrastructure
– Components and technology
– Where we are going
– Opportunities for cooperation

IA-64 Features
– It is all about parallelism
  – at the process/thread level for programmer
  – at the instruction level for compiler
– Explicit parallel instruction semantics
– Predication and Control/Data Speculation
– Massive Resources (registers, memory)
– Register stack and its engine
– Software pipelining support
– Memory hierarchy management support

Structure
– Logical compilation model
– Base compilation model
– IPA compilation model
– DSO structure

Logical Compilation Model

Base Compilation Model

IPA Compilation Model

DSO Structure

Intermediate Representation
– IR is called WHIRL
– Common interface between components
– Multiple languages and multiple targets
– Same IR, 5 levels of representation
– Continuous lowering as compilation progresses
– Optimization strategy tied to level
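The idea of "continuous lowering" can be illustrated with a toy tree IR. This is only a sketch under stated assumptions: the `Node` class, the operator names (`ARRAY`, `LOAD`, `ADD`, `MUL`, `CONST`, `SYM`) and the 8-byte element size are hypothetical and are not taken from WHIRL. It shows one lowering step, where a high-level array reference is rewritten into explicit address arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str               # operator name (hypothetical, not a real WHIRL opcode)
    kids: tuple = ()      # child nodes
    value: object = None  # constant value or symbol name for leaves


def lower(node, elem_size=8):
    """Lower one level: rewrite ARRAY(base, index) into LOAD(base + index * size)."""
    kids = tuple(lower(k, elem_size) for k in node.kids)
    if node.op == "ARRAY":
        base, index = kids
        offset = Node("MUL", (index, Node("CONST", value=elem_size)))
        return Node("LOAD", (Node("ADD", (base, offset)),))
    return Node(node.op, kids, node.value)


def show(node):
    """Render a tree as a nested expression string."""
    if not node.kids:
        return str(node.value)
    return f"{node.op}({', '.join(show(k) for k in node.kids)})"


high = Node("ARRAY", (Node("SYM", value="a"), Node("SYM", value="i")))
low = lower(high)
print(show(high))  # ARRAY(a, i)
print(show(low))   # LOAD(ADD(a, MUL(i, 8)))
```

The high-level form keeps the array structure that dependence analysis wants to see, while the lowered form exposes the address arithmetic that later phases can optimize. This is one way to read "optimization strategy tied to level".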

Components
– Front ends
– Interprocedural analysis and optimization
– Loop nest optimization and parallelization
– Global optimization
– Code generation

Front ends
– C front end based on gcc
– C++ front end based on g++
– Fortran90/95 front end

IPA
– Two stage implementation
  – Local: gather local information at end of front end process
  – Main: analysis and optimization

IPA Main Stage
– Two phases in main stage
– Analysis:
  – PIC symbol analysis
  – Constant global identification
  – Scalar mod/ref
  – Array section
  – Code layout for locality
– Optimization:
  – Inlining
  – Intrinsic function library inlining
  – Cloning for constants, locality
  – Dead function, variable elimination
  – Constant propagation
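As a rough illustration of the analysis behind interprocedural constant propagation and cloning for constants, the sketch below finds formal parameters that receive the same constant at every call site. It is a simplification under assumptions: the `constant_formals` function and the call-site encoding (integers for known constants, `None` for unknown values) are hypothetical and do not describe the actual Pro64 implementation.

```python
def constant_formals(call_sites):
    """Return, per callee, the constant passed for each formal, or None if it varies.

    call_sites is a list of (callee_name, args), where each argument is an int
    when the caller passes a known constant and None otherwise.
    """
    result = {}
    for callee, args in call_sites:
        if callee not in result:
            result[callee] = list(args)
        else:
            # Keep a constant only if every call site agrees on it.
            result[callee] = [a if a == b else None
                              for a, b in zip(result[callee], args)]
    return result


calls = [
    ("scale", (4, None)),
    ("scale", (4, 7)),
    ("init", (0,)),
]
print(constant_formals(calls))  # {'scale': [4, None], 'init': [0]}
```

A formal that is constant at every call site can be propagated into the callee directly. A formal that is constant at only some call sites is a candidate for cloning, where a specialized copy of the callee serves those sites.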

IPA Engineering
– User transparent
– Additional command line option (-ipa)
– Object files (*.o) contain WHIRL
– IPA in ld invokes backend
– Integrated into compiler
– Provides information to loop nest optimizer, global optimizer, and code generator
– Not disabled by normal .o or DSO object
– Can analyze DSO objects

Loop Nest Optimizer/Parallelizer
– All languages
– Loop level dependence analysis
– Uniprocessor loop level transformations
– OpenMP
– Automatic parallelization

Loop Level Transformations
– Based on unified cost model
– Heuristics integrated with software pipelining
– Fission
– Fusion
– Unroll and jam
– Loop interchange
– Peeling
– Tiling
– Vector data prefetching
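Two of the transformations listed above, loop interchange and tiling, can be shown on a matrix multiply. This is a hand-written sketch, not compiler output: the function names and the tile size of 2 are assumptions chosen for illustration, and a real loop nest optimizer would pick the tile size from its cost model.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k loop nest."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=2):
    """Same computation after tiling all three loops and interchanging j and k."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile; min() handles the partial tile at the edge.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]  # invariant in the innermost loop
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c


n = 5
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[i - j for j in range(n)] for i in range(n)]
print(matmul_naive(a, b, n) == matmul_tiled(a, b, n))  # True
```

The transformed nest visits exactly the same (i, j, k) triples in a different order, so with integer data the results match exactly. In a compiled language the innermost loop would then walk rows of `b` and `c` contiguously, which is the locality benefit the transformation is after.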

Parallelization
– Automatic
  – Array privatization
  – Doacross parallelization
  – Array section analysis
– Directive based
  – OpenMP
  – Integrated with automatic methods

Global optimization
– Static Single Assignment is unifying technology
– Industrial strength SSA
– All traditional optimizations implemented
  – SSA-preserving transformations
  – Deals with aliasing and calls
  – Uniformly handles indirect loads/stores
– Benefits over bit vector techniques
  – More efficient: setup and use
  – More natural algorithms => robustness
  – Allows selective transformation

Code Generation
– Inner loops
  – IF conversion
  – Software pipelining
  – Recurrence breaking
  – Predication and rotating registers
– Elsewhere
  – Hyperblock formation
  – Frequency based block reordering
  – Global code motion
  – Peephole optimization
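IF conversion replaces a branch with straight-line code whose operations are guarded by predicates. The sketch below simulates that idea with a tiny interpreter. The instruction encoding, the register names and the `run_predicated` helper are hypothetical. The one detail borrowed from IA-64 is that predicate `p0` is always true.

```python
import operator


def run_predicated(instrs, regs):
    """Execute straight-line (predicate, dest, op, src1, src2) instructions.

    An instruction whose guarding predicate is false has no effect.
    """
    for pred, dest, op, s1, s2 in instrs:
        if regs[pred]:
            regs[dest] = op(regs[s1], regs[s2])
    return regs


# Branch-free form of:  if a > b: r = a - b  else: r = b - a
program = [
    ("p0", "p1", operator.gt, "a", "b"),   # p1 = (a > b)
    ("p0", "p2", operator.le, "a", "b"),   # p2 = complement of p1
    ("p1", "r", operator.sub, "a", "b"),   # (p1) r = a - b
    ("p2", "r", operator.sub, "b", "a"),   # (p2) r = b - a
]


def abs_diff(a, b):
    regs = {"p0": True, "a": a, "b": b}
    return run_predicated(program, regs)["r"]


print(abs_diff(7, 3), abs_diff(3, 7), abs_diff(5, 5))  # 4 4 0
```

With the branch gone, the loop body becomes a single block, which is what makes it a candidate for software pipelining.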

Technology
– Target description tables (targ_info)
– Feedback
– Parallelization
– Static Single Assignment
– Software pipelining
– Global code motion

Target description tables
– Isolate machine attributes from compiler code
– Resources: functional units, busses
– Literals: sizes, ranges, excluded bits
– Registers: classes, supported types
– Instructions: opcodes, operands, attributes, scheduling, assembly, object code
– Scheduling: resources, latencies
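The point of isolating machine attributes in tables is that the same compiler code can serve different targets. The sketch below shows that separation in miniature. The table layout, the unit names and every latency value are invented for illustration. They are not the targ_info format and not real IA-64 numbers.

```python
# Two hypothetical target descriptions with the same shape but different data.
TARGET_A = {
    "units": {"M": 2, "I": 2, "F": 2},
    "ops": {
        "ld":   {"unit": "M", "latency": 2},
        "add":  {"unit": "I", "latency": 1},
        "fmul": {"unit": "F", "latency": 4},
    },
}
TARGET_B = {
    "units": {"M": 1, "I": 1, "F": 1},
    "ops": {
        "ld":   {"unit": "M", "latency": 3},
        "add":  {"unit": "I", "latency": 1},
        "fmul": {"unit": "F", "latency": 5},
    },
}


def chain_latency(target, opcodes):
    """Latency of a chain of dependent operations, read entirely from the table."""
    return sum(target["ops"][op]["latency"] for op in opcodes)


def unit_pressure(target, opcodes):
    """Number of operations competing for each functional unit class."""
    pressure = {unit: 0 for unit in target["units"]}
    for op in opcodes:
        pressure[target["ops"][op]["unit"]] += 1
    return pressure


chain = ["ld", "fmul", "add"]
print(chain_latency(TARGET_A, chain))  # 7
print(chain_latency(TARGET_B, chain))  # 9
print(unit_pressure(TARGET_A, chain))  # {'M': 1, 'I': 1, 'F': 1}
```

Neither function mentions a specific machine, so retargeting in this sketch means supplying a new table and nothing else.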

Feedback
– Used throughout the compiler
– Instrumentation can be added at any stage
– Explicit instrumentation data incorporated where inserted
– Instrumentation data maintained and checked for consistency through program transformations

SSA Advantages
– Built-in use-def edges
– Sparse representation of data flow information
– Sparse data flow propagation based on SSA graph
– Linear or near-linear algorithms
– Every optimization is global
– Transform one construct at a time, customize to context
– Handle second order effects
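To make "built-in use-def edges" concrete, the sketch below renames a straight-line sequence of statements into single-assignment form and then propagates constants by following each use back to its one definition. It is a deliberately limited illustration: it handles straight-line code only, so no phi nodes are needed, whereas real SSA construction must place phi nodes at control-flow joins. The statement encoding and the helper names are hypothetical.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def to_ssa(stmts):
    """Rename (dest, op, args) statements so that every name is assigned once."""
    version = {}
    out = []
    for dest, op, args in stmts:
        # Uses refer to the latest version of each variable defined so far.
        new_args = tuple(f"{a}_{version[a]}" if a in version else a for a in args)
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", op, new_args))
    return out


def fold_constants(ssa):
    """Map each SSA name to its value when all of its operands are known constants."""
    known = {}
    for dest, op, args in ssa:
        vals = [known.get(a, a) if isinstance(a, str) else a for a in args]
        if all(isinstance(v, int) for v in vals):
            known[dest] = vals[0] if op == "copy" else OPS[op](*vals)
    return known


stmts = [
    ("x", "copy", (2,)),
    ("y", "add", ("x", 3)),
    ("x", "mul", ("y", "x")),   # redefines x
    ("z", "add", ("x", "n")),   # n is an unknown input
]
ssa = to_ssa(stmts)
print(ssa[2])               # ('x_2', 'mul', ('y_1', 'x_1'))
print(fold_constants(ssa))  # {'x_1': 2, 'y_1': 5, 'x_2': 10}
```

Because each name has exactly one definition, the lookup table `known` is all the data flow state the propagation needs. There are no per-block bit vectors to set up, which is the efficiency argument the slides make.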

SSA as IR for optimizer
– SSA constructed only once at set-up time
– Use-def info inherently part of SSA
– Use only optimization algorithms that preserve SSA form:
  – Transformations do not invalidate SSA info
  – Full set of SSA-preserving algorithms
– No SSA construction overhead between phases:
  – Can arbitrarily repeat a phase for newly exposed optimization opportunities
– Extended to uniformly handle indirect memory references

Software Pipelining
– Technology evolved from Cydra compilers
– Powerful preliminary loop processing
– Effective minimization of loop overhead code
– Highly efficient backtracking for scheduling
– Integrated register allocation, interface with CG
– Integrated with LNO loop nest transformations
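The slides do not describe the scheduling algorithm itself, but software pipeliners in the modulo scheduling tradition usually start from a lower bound on the initiation interval (II), the number of cycles between the starts of successive iterations. The sketch below computes that standard bound as the larger of a resource bound and a recurrence bound. The function names, the unit classes and the example numbers are assumptions made for illustration.

```python
from math import ceil


def res_mii(op_units, unit_counts):
    """Resource bound: the busiest unit class must fit all its uses into one II."""
    uses = {}
    for unit in op_units:
        uses[unit] = uses.get(unit, 0) + 1
    return max((ceil(n / unit_counts[u]) for u, n in uses.items()), default=1)


def rec_mii(recurrences):
    """Recurrence bound from (latency around the cycle, iteration distance) pairs."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)


def min_ii(op_units, unit_counts, recurrences):
    """Lower bound on the initiation interval of the pipelined loop."""
    return max(res_mii(op_units, unit_counts), rec_mii(recurrences))


units = {"M": 2, "I": 2, "F": 2}      # hypothetical machine: two units per class
body = ["M", "M", "M", "F", "I"]      # unit class needed by each op in the loop body
print(min_ii(body, units, []))        # 2
print(min_ii(body, units, [(4, 1)]))  # 4
print(min_ii(body, units, [(4, 2)]))  # 2
```

The last two calls suggest why recurrence breaking matters: spreading a 4-cycle dependence over two iterations instead of one brings the bound back down to the resource limit.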

Global Code Motion
– Moves instructions between basic blocks
– Purpose: balance resources, improve critical paths
– Uses program structure to guide motion
– Uses feedback or estimated frequency to prioritize motion
– No artificial barriers, no exclusively-optimized paths

Where are we going?
– Open source compiler suite
– Target description for IA-64
– Available via usual Linux distributions and Beta version in June
– MR version when Intel ships systems
– OpenMP for C/C++ (later)
– OpenMP extensions for NUMA (later)

Areas for collaboration
– Target descriptions for other ISAs
  – real or prototype
– Additional optimizations
– Generate information for performance analysis tools
– Extensions to OpenMP
– Surprise me

The solution is in sight. 