The SGI Pro64 Compiler Infrastructure

Slides:



Advertisements
Similar presentations
CSC 4181 Compiler Construction Code Generation & Optimization.
Advertisements

CMPUT Compiler Design and Optimization1 CMPUT680 - Fall 2003 Topic B: Open Research Compiler José Nelson Amaral
Optimizing Compilers for Modern Architectures Syllabus Allen and Kennedy, Preface Optimizing Compilers for Modern Architectures.
Course Outline Traditional Static Program Analysis Software Testing
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Kristof Beyls, Erik D’Hollander, Frederik Vandeputte ICCS 2005 – May 23 RDVIS: A Tool That Visualizes the Causes of Low Locality and Hints Program Optimizations.
The OpenUH Compiler: A Community Resource Barbara Chapman University of Houston March, 2007 High Performance Computing and Tools Group
The Jacquard Programming Environment Mike Stewart NUG User Training, 10/3/05.
The Last Lecture Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission.
Introduction to Advanced Topics Chapter 1 Mooly Sagiv Schrierber
University of Houston Extending Global Optimizations in the OpenUH Compiler for OpenMP Open64 Workshop, CGO ‘08.
Modern Compiler Internal Representations Silvius Rus 1/23/2002.
TM Pro64™: Performance Compilers For IA-64™ Jim Dehnert Principal Engineer 5 June 2000.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
© 2002 IBM Corporation IBM Toronto Software Lab October 6, 2004 | CASCON2004 Interprocedural Strength Reduction Shimin Cui Roch Archambault Raul Silvera.
Introduction to Program Optimizations Chapter 11 Mooly Sagiv.
CMPUT Compiler Design and Optimization1 CMPUT680 - Winter 2006 Topic H: SSA for Predicated Code José Nelson Amaral
Case Studies of Compilers and Future Trends Chapter 21 Mooly Sagiv.
Hardware-Software Interface Machine Program Performance = t cyc x CPI x code size X Available resources statically fixed Designed to support wide variety.
From Cooper & Torczon1 Implications Must recognize legal (and illegal) programs Must generate correct code Must manage storage of all variables (and code)
Loop Induction Variable Canonicalization. Motivation Background: Open64 Compilation Scheme Loop Induction Variable Canonicalization Project Tracing and.
Eliminating Memory References Joshua Dunfield Alina Oprea.
LLVM Developed by University of Illinois at Urbana-Champaign CIS dept Cisc 471 Matthew Warner.
SPL: A Language and Compiler for DSP Algorithms Jianxin Xiong 1, Jeremy Johnson 2 Robert Johnson 3, David Padua 1 1 Computer Science, University of Illinois.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
ICD-C Compiler Framework Dr. Heiko Falk  H. Falk, ICD/ES, 2008 ICD-C Compiler Framework 1.Highlights and Features 2.Basic Concepts 3.Extensions.
COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.
University of Maryland The DPCL Hybrid Project James Waskiewicz.
LIPO: Feedback Directed Cross-Module Optimization
Compiler course 1. Introduction. Outline Scope of the course Disciplines involved in it Abstract view for a compiler Front-end and back-end tasks Modules.
1 COMP 3438 – Part II-Lecture 1: Overview of Compiler Design Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Architecture for a Next-Generation GCC Chris Lattner Vikram Adve The First Annual GCC Developers'
1 Optimizing compiler tools and building blocks project Alexander Drozdov, PhD Sergey Novikov, PhD.
Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa, Toshiharu Nakaya, Masaki Kohama, Takeaki Fukuoka and Masahito Takahashi.
Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
Unified Parallel C at LBNL/UCB Compiler Optimizations in the Berkeley UPC Translator Wei Chen the Berkeley UPC Group.
PLC '06 Experience in Testing Compiler Optimizers Using Comparison Checking Masataka Sassa and Daijiro Sudo Dept. of Mathematical and Computing Sciences.
Open64 | The Open Research Compiler Ben Reinhardt and Cliff Piontek.
R-Verify: Deep Checking of Embedded Code James Ezick † Donald Nguyen † Richard Lethin † Rick Pancoast* (†) Reservoir Labs (*) Lockheed Martin The Eleventh.
Memory-Aware Compilation Philip Sweany 10/20/2011.
3/6/20161 WHIRL SSA: A New Optimization Infrastructure for Open64 Keqiao Yang, Zhemin Yang Parallel Processing Institute, Fudan University, Shanghai Hui.
1 ROGUE Dynamic Optimization Framework Using Pin Vijay Janapa Reddi PhD. Candidate - Electrical And Computer Engineering University of Colorado at Boulder.
Credible Compilation With Pointers Martin Rinard and Darko Marinov Laboratory for Computer Science Massachusetts Institute of Technology.
A Single Intermediate Language That Supports Multiple Implemtntation of Exceptions Delvin Defoe Washington University in Saint Louis Department of Computer.
Simone Campanoni LLVM Simone Campanoni
Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
Code Optimization Overview and Examples
Global Register Allocation Based on
High-level optimization Jakub Yaghob
Introduction to Advanced Topics Chapter 1 Text Book: Advanced compiler Design implementation By Steven S Muchnick (Elsevier)
Compiler Construction (CS-636)
Hierarchical Architecture
Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries Bin Ren, Gagan Agrawal 9/18/2018.
The HP OpenVMS Itanium® Calling Standard
The SGI Pro64 Compiler Infrastructure
Intermediate Representations
Performance Optimization for Embedded Software
Optimizing Transformations Hal Perkins Autumn 2011
Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.
A Practical Stride Prefetching Implementation in Global Optimizer
Intermediate Representations
Optimizing Transformations Hal Perkins Winter 2008
Compiler Code Optimizations
Topic 5a Partial Redundancy Elimination and SSA Form
Open64 Release 4.0 High Performance Compiler for Itanium and x86 Linux Shin-Ming Liu, Hewlett Packard PLDI, June 2007.
Compiler Construction
HPC User Forum: Back-End Compiler Technology Panel
Introduction to Optimization
Peter Oostema & Rajnish Aggarwal 6th March, 2019
Presentation transcript:

The SGI Pro64 Compiler Infrastructure - A Tutorial Guang R. Gao (U of Delaware) J. Dehnert (SGI) J. N. Amaral (U of Alberta) R. Towle (SGI)

Acknowledgement The SGI Compiler Development Teams The MIPSpro/Pro64 Development Team University of Delaware CAPSL Compiler Team These individuals contributed directly to this tutorial A. Douillet (Udel) F. Chow (Equator) S. Chan (Intel) W. Ho (Routefree) Z. Hu (Udel) K. Lesniak (SGI) S. Liu (HP) R. Lo (Routefree) S. Mantripragada (SGI) C. Murthy (SGI) M. Murphy (SGI) G. Pirocanac (SGI) D. Stephenson (SGI) D. Whitney (SGI) H. Yang (Udel)

What is Pro64? A suite of optimizing compiler tools for Linux/ Intel IA-64 systems C, C++ and Fortran90/95 compilers Conforming to the IA-64 Linux ABI and API standards Open to all researchers/developers in the community Compatible with HP Native User Environment

Who Might Want to Use Pro64? Researchers : test new compiler analysis and optimization algorithms Developers : retarget to another architecture/system Educators : a compiler teaching platform

Outline Background and Motivation Part I: An overview of the SGI Pro64 compiler infrastructure Part II: The Pro64 code generator design Part III: Using Pro64 in compiler research & development SGI Pro64 support Summary

PART I: Overview of the Pro64 Compiler

Outline Logical compilation model and component flow WHIRL Intermediate Representation Inter-Procedural Analysis (IPA) Loop Nest Optimizer (LNO) and Parallelization Global optimization (WOPT) Feedback Design for debugability and testability

Logical Compilation Model driver (sgicc/sgif90/sgiCC) front end + IPA (gfec/gfecc/mfef90) back end (be, as) linker (ld) Src (.c/.C/.f) WHIRL (.B/.I) obj (.o) a.out/.so Data Path Fork and Exec

Components of Pro64 Code Generation Front end Interprocedural Analysis and Optimization Loop Nest Optimization and Parallelization Global Optimization Code Generation

Data Flow Relationship Between Modules -IPA Local IPA Main IPA LNO Lower to High W. .B Inliner gfec .I lower I/O gfecc (only for f90) WHIRL C .w2c.c f90 .w2c.h WHIRL fortran .w2f.f Take either path -O0 Lower all CG Very high WHIRL -phase: w=off High WHIRL Lower Mid W Main opt Mid WHIRL -O2/O3 Low WHIRL

Front Ends C front end based on gcc C++ front end based on g++ Fortran90/95 front end from MIPSpro

Intermediate Representation IR is called WHIRL Tree structured, with references to symbol table Maps used for local or sparse annotation Common interface between components Multiple languages, multiple targets Same IR, 5 levels of representation Continuous lowering during compilation Optimization strategy tied to level

IPA Main Stage Analysis Optimization (fully integrated) alias analysis array section code layout Optimization (fully integrated) inlining cloning dead function and variable elimination constant propagation

IPA Design Features User transparent No makefile changes Handles DSOs, unanalyzed objects Provide info (e.g. alias analysis, procedure properties) smoothly to: loop nest optimizer main optimizer code generator

Loop Nest Optimizer/Parallelizer All languages (including OpenMP) Loop level dependence analysis Uniprocessor loop level transformations Automatic parallelization

Loop Level Transformations Based on unified cost model Heuristics integrated with software pipelining Loop vector dependency info passed to CG Loop Fission Loop Fusion Loop Unroll and Jam Loop Interchange Loop Peeling Loop Tiling Vector Data Prefetching

Parallelization Automatic Directive based Array privatization Doacross parallelization Array section analysis Directive based OpenMP Integrated with automatic methods

Global Optimization Phase SSA is unifying technology Use only SSA as program representation All traditional global optimizations implemented Every optimization preserves SSA form Can reapply each optimization as needed

Pro64 Extensions to SSA Representing aliases and indirect memory operations (Chow et al, CC 96) Integrated partial redundancy elimination (Chow et al, PLDI 97; Kennedy et al, CC 98, TOPLAS 99) Support for speculative code motion Register promotion via load and store placement (Lo et al, PLDI 98)

Feedback Used throughout the compiler Instrumentation can be added at any stage Explicit instrumentation data incorporated where inserted Instrumentation data maintained and checked for consistency through program transformations.

Design for Debugability (DFD) and Testability (DFT) DFD and DFT built-in from start Can build with extra validity checks Simple option specification used to: Substitute components known to be good Enable/disable full components or specific optimizations Invoke alternative heuristics Trace individual phases

Where to Obtain Pro64 Compiler and its Support SGI Source download http://oss.sgi.com/projects/Pro64/ University of Delaware Pro64 Support Group http://www.capsl.udel.edu/~pro64 pro64@capsl.udel.edu